Analog recording continues to recede into the past as digital formats make capturing audio easier, more accurate and more efficient. What are these formats, and what do you need to know about them?
In many ways digital audio is handled exactly as its analog counterpart was, whether recorded separately on a dedicated audio recorder or with video on videotape. Consideration number one is your budget. If pristine sound is critical to your project and you have the dollars to assign to this area, securing the services of a location sound recordist with dedicated gear is the way to go. These folks know how to "shoot" sound (the term comes directly from its visual counterpart), and you can pretty much forget about the quality aspect of your audio once you have a professional on board.
It Still Starts As Analog
Microphones are still analog devices, converting sound waves into electrical energy. Even the new digital microphones start with an analog signal and perform the analog-to-digital conversion within the microphone body. As the first link in the audio chain, microphones play a key role in the quality of the end product. In the past, videographers did not have to be overly concerned with using the "very best" microphones; analog recording and transmission degradation would wipe out any improvement higher-quality gear might offer. With the advent of digital recording, post and transmission, this is no longer the case. Keep in mind that digital offers videographers CD-quality sound, and viewers will be expecting it. Microphones should be of the highest quality available, with wide frequency response and extremely low inherent self-noise, to help ensure highly accurate sound reproduction.
Dedicated Sound
A sound recordist will ask you some questions. The answers will depend on the nature of your project, the importance of sound, and your budget. He or she will need to know how many microphones to bring on the shoot, and whether you will be using a mixing console and sending a stereo pair of your entire mix out to the inputs on a camcorder or to tape recorders. This sending function, called bussing, is quite important. If your audio budget is limited, you may not have the luxury of sending your individual microphone outputs to multitrack tape recorders, where the audio can be mixed at leisure in the post process. In this case, your sound recordist will execute an on-site mix and buss a stereo pair to a camcorder, two-track recorder or DAT. Many mixing functions will be executed in the post process, but having a trained sound recordist working on location with a multiple-mic setup and mixing console can be critical in ensuring that the sound which makes it to post is clean and well balanced.
Videographer as Audiographer
Many documentaries, news shoots, and even some low-budget features use more modest audio setups. You may decide to forgo the services of an audio professional altogether and rely on the sound you get from the microphone on your camera. If you're shooting with a digital format camcorder, you're getting digital audio as well as picture. What do you need to know about recording digital sound in this case?
Actually very little, but one concern is of paramount importance: audio levels. Analog recording handles tape saturation or overmodulation--"hitting" tape at louder-than-desirable levels--in a relatively forgiving manner. Going too far into the red on your VU meter (recording at levels higher than 0 dB) will ultimately result in distortion, but the price you pay for inattention to levels comes by degree. On a shoot that uses analog tape, you'd almost have to be trying in order to end up with audio that is unusable because of level-induced distortion. Automatic gain control and limiters help to keep analog audio levels where they should be.
Not so with any of the digital media, which include digital quarter-inch tape (found on the Nagra D--a very expensive machine used on most major film shoots), RDAT and digital mini disc recorders, and the digital audio portion of the tape you pop into your digital camcorder or VTR. Add to this list of digital recording devices the hard disk audio recorders that recently entered the market, and expect others to follow. All of the digital recording media offer a clarity and accuracy that far surpass analog technology. But watch out. There is danger lurking in the digital stream as well.
Digital Danger
Digital audio--this point cannot be underscored too heavily--distorts immediately when the 0 dB level is passed. This distortion is hideous and no recording that has it is usable. If you're familiar with audio terminology and want to impress the audiophiles on your crew, throw them a question like, "What is headroom and should you handle it differently when recording digitally instead of in the analog realm?"
Headroom refers to the difference, measured in decibels, between the level of the recorded sound and the 0 dB mark. Analog tape has a hiss associated with it. As you get closer to the 0 dB point this unpleasant sound gets less noticeable, masked in effect by the signal being recorded. Consequently, engineers have made it their business to monitor the level of the incoming signal and make sure that the signal-to-noise ratio is as high as possible, always being careful to leave a little headroom, or extra decibel room, for the sudden level burst that generally comes when you least expect it. Knowing how much headroom to leave is part of an engineer's skill.
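In digital terms, the arithmetic is simple: express your peak as a fraction of full scale, convert to decibels, and whatever remains up to 0 dB is your headroom. A quick illustrative sketch (ours, not from any particular meter spec):

    import math

    def headroom_db(peak, full_scale=1.0):
        """Headroom in dB between a peak level and the 0 dB (full scale) mark."""
        level_db = 20 * math.log10(peak / full_scale)  # 0 or negative for legal peaks
        return -level_db  # decibels left before clipping

    print(round(headroom_db(0.5), 1))  # a signal peaking at half scale has ~6 dB left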
Here's the good news. You can leave lots more headroom when you record audio digitally, thus minimizing the chance that you'll pass the dreaded 0 dB point. Why? Signal-to-noise is still a factor, especially since your location sound will almost certainly have to pass through analog-to-digital converters (microphones are still analog devices), which color the sound to one degree or another. Theoretically, then, you'd still want to get as close to the 0 dB point as possible, wouldn't you?
No, for several reasons. One is the lack of tape hiss associated with any of the digital media. Another point to consider: When your digital audio is taken to post it can be normalized. Sounds good, you say, especially when you know what it means.
Normalization
In a nutshell, normalization lets you take signals recorded at low levels and boost them until they come to within a fraction of the 0 dB mark. And there is absolutely no generational loss associated with this process, only zeros and ones rewritten so that they peak out just where you want them. This means that on your shoot you can give extra headroom to your sound recordings without worrying about the signal-to-noise ratio.
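Under the hood, normalization is one gain computation applied uniformly to every sample, which is why the relative balance of the recording is untouched. A minimal sketch (illustrative only; a real DAW works on audio files, not Python lists):

    def normalize(samples, target_peak=0.99):
        """Scale a recording so its loudest sample lands just under full scale (1.0)."""
        peak = max(abs(s) for s in samples)
        gain = target_peak / peak  # a single multiplier for the whole recording
        return [s * gain for s in samples]

    quiet_take = [0.05, -0.12, 0.20, -0.08]
    print(normalize(quiet_take))  # now peaks at 0.99; relative levels are unchanged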
Specs
Perhaps you've heard of the debate within the record industry that surrounds digital recording. Many record producers still favor recording to analog multitrack tape, claiming that it's "warmer" and has more of a "live" feel to it. The hoopla that attended CDs when they first appeared was met with disdain by many within the recording business. Is the quality of digital recordings fixed, or is it an evolving medium? Is it "better" than analog recording? What specs should the videographer be aware of?
CDs recorded today sound better than the early ones, although the frequency range is the same now as it was in the late 1980s for most recordings and all of the playback devices. The formula to remember here is that digital sound must be recorded, or sampled, at twice the frequency of the highest pitch you wish to capture. Leaving aside the issue of the uppermost harmonics--the highest elements of a ringing cymbal, for example--that you perceive more than hear, human beings distinguish sounds up to about 20 kHz. The CDs you have are generally recorded at 44.1 kHz, giving an accurate representation of nearly all of the frequency spectrum that we can hear. Most field recorders record at 44.1 kHz, or even 48 kHz, but it doesn't hurt to ask about the unit you'll be working with.
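The arithmetic behind that formula is easy to check for yourself (a quick sketch using the figures above):

    def minimum_sample_rate(highest_pitch_hz):
        """Nyquist rule: sample at twice the highest frequency you want to capture."""
        return 2 * highest_pitch_hz

    print(minimum_sample_rate(20_000))  # 40,000 Hz for the ~20 kHz limit of hearing
    # CD's 44.1 kHz and professional video's 48 kHz both clear that bar with margin.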
As we speak a debate is taking place within the recording industry. Some audiophiles insist that sampling at 96 kHz--double the current specs--is necessary to achieve the full sound that humans can perceive. At this high sampling frequency, many contend that the "warmth" advantage of analog recording will disappear. In other words, these people feel that the slightly harsh sound that is sometimes associated with digital recordings results from our not getting an accurate representation of the full audio spectrum when sampling at 44.1 kHz or 48 kHz.
Others argue that such a spec, while desirable in theory, does not have the practical value that would justify the memory requirements that would be required in the recording studio and in the field. Television and video professionals shooting digital sound will be one of the constituent groups that ultimately help decide, with their pocketbooks, whether a 96 kHz specification is adopted.
A second spec that you will hear about is bit depth (often loosely called "bit rate," though strictly speaking that term means bits per second). It is not synonymous with "volume." The word volume is ubiquitous in the recording industry, but in fact it is neither an engineering nor an electrical term. Rather, the word has been borrowed as a measuring term, but unlike a quart or gallon it has no specific definition when used to describe sound.
What you need to know is that more bits in a digital medium mean greater detail in representing amplitude variation in your recording. If all we had, for example, were recordings that divided signals into two levels, VERY LOUD and really soft, the sound would be unrealistic and quite disturbing to our highly refined and cultured ears. This is not the case, of course, with any of the digital media. The lowest bit depth you'll find is 8 bits, and the mathematical relationship we're dealing with here is always two to the power of n. In this case that means 2 to the 8th power, which yields the number 256. You might think that having 256 amplitude steps for every sample that you record would be sufficient, but our ears are quite sensitive, and 8-bit recordings actually sound rough and unrealistic to us.
Sixteen-bit is the current standard for recording and mastering records and films.
That means that every sample in a 16-bit recording may be resolved, or rounded off, to one of 65,536 discrete amplitude steps. 16-bit recordings sound quite good to most of us. Still, purists within the industry argue that 20-bit, or ultimately even 24-bit, technology needs to become standard before we have recordings that match the sensitivity of human hearing. As with high frequency sampling, purists argue that anything less than 20-bit or 24-bit recording yields results that are unsatisfactory when compared to analog recordings made under optimal conditions.
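To see where these step counts come from, raise two to the power of the bit depth; as a rule of thumb (our addition, not the article's), each bit also buys roughly 6 dB of theoretical dynamic range:

    for bits in (8, 16, 20, 24):
        steps = 2 ** bits       # discrete amplitude steps per sample
        range_db = 6.02 * bits  # rule of thumb: ~6 dB of dynamic range per bit
        print(f"{bits}-bit: {steps:,} steps, ~{range_db:.0f} dB theoretical range")
    # 8-bit yields 256 steps (~48 dB); 16-bit yields 65,536 steps (~96 dB).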
Recording Sound
Now that you know a little about the basics of digital sound, you'll be able to approach recording it with some intelligence. If hiring a sound recordist is not in the cards for your current project, you'll probably be using a camcorder that has built-in digital audio. What makes and models are there for you to choose from?
Several years ago, some of the main players in the video industry agreed on a standard for their units. These manufacturers, which include Sony, Panasonic and Philips, have digital video recorders that offer digital audio as well. Typically, the specs on the digital audio channels of a professional digital camcorder are 48 kHz/16-bit. But beware: consumer formats such as DV may have lower specs, or specs that are user-selectable.
Regardless of your budget, there is certainly a digital recorder out there for you. It may be part of a digital camcorder, or it could be a separate digital audio recorder--a Nagra D, RDAT or mini disc recorder, or a digital multitrack unit such as the Tascam DA-88 or Alesis ADAT. At some point your shoot will be over. If you've watched levels and not passed the 0 dB mark, you're in good shape. It's now time for you to post audio.
By Gary Eskow
Now that your shoot is over, it's time to post audio. Here's where you'll take the sound you've recorded on location--whether analog or digital--clean it up, add music and sound effects where necessary, and make sure that your final audio matches picture perfectly.
Before we examine the audio post process, there's one last element of location recording you need to know about--how the sound recorded during your shoot is referenced to timecode. As we mentioned in Digital Audio Production, you may be working in a variety of ways. Budget is generally the factor that determines whether sound is recorded independently from picture. You may, in fact, be shooting audio directly onto a digital camcorder. In this case, both sound and picture will likely be cut together in post, and the audio that you've recorded on location will carry through the entire post procedure with no alterations. It is possible to separate digital sound from picture later on, but the cost is high, and if your budget allows, you'd be much better off recording audio to one of the other digital media formats discussed earlier.
Sync Timing
What do you need to know about timecode and how audio is synced to video in post? If you're shooting on film (a very viable HDTV acquisition format) your camera's internal crystal provides the locking mechanism. Timecode is added only when you have your dailies transferred to tape for editing purposes. Audio, on the other hand, is referenced to SMPTE timecode on location.
By the late 1960s videotape had replaced film in many broadcast applications. Tape has advantages over film--cost being one--but it posed a sync problem. Positional information is printed visibly along the edge of film stock, but videotape frames are recorded magnetically, and that information is invisible to the eye. The Society of Motion Picture and Television Engineers devised a method of identifying frames and subframes that is commonly referred to as SMPTE timecode. Practically speaking, how does SMPTE affect your audio work?
On a film shoot, each take is slated with a clap stick while a timecode generator sends a SMPTE number stream to your digital audio recorder. The numbers that correspond to the beginning of each take can be determined by stopping the film at the clap and noting the SMPTE location. Later, when a new SMPTE stripe is laid against your dailies, an offset is computed that allows for an exact match of picture and audio.
Let's assume that you're shooting on film and using a Nagra D to record digital audio. A SMPTE generator feeds timecode to the Nagra, and a clap stick marks the exact starting point of each take. Dailies are then transferred to 3/4-inch tape for editing with fresh SMPTE laid onto them. What happens if the first several takes of Scene One are duds and you don't bother to transfer them? In this case the first of your dailies, take three for example, will start with a SMPTE location of 01:01:00:00 (SMPTE is generally laid down with a starting point of one hour and one minute). The audio that goes with that picture will have entirely different numbers, say 01:04:22:11 as determined by checking the clap stick number for that take. Those numbers are part of an edit decision list (EDL) that the transfer house uses. They will compute the offset needed to start the audio at the exact point where the corresponding picture begins and transfer both elements so that they are in sync with each other.
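The offset itself is plain frame arithmetic. A hypothetical sketch (assuming 30 fps non-drop timecode for simplicity; real transfer houses also contend with drop-frame and other rates):

    FPS = 30  # non-drop, for simplicity

    def to_frames(tc):
        """Convert an 'HH:MM:SS:FF' SMPTE number to an absolute frame count."""
        h, m, s, f = (int(x) for x in tc.split(":"))
        return ((h * 60 + m) * 60 + s) * FPS + f

    picture_start = to_frames("01:01:00:00")  # fresh stripe on the dailies
    audio_start = to_frames("01:04:22:11")    # clap-stick number on the audio tape
    print(audio_start - picture_start)        # 6071 frames to slide audio into sync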
If the transfer house does its job properly, the dailies that you view will have temporary audio that is locked to picture (sync sound). In the real world, transfer houses are sometimes less careful with audio than picture, so make sure that the sound you get is clean. Once you've determined which takes are keepers, it's time to get down to the business of posting audio.
Posting
Following the example that has you recording digital audio onto a Nagra D machine in the field, we now need to ask ourselves whether those original tracks will be used in post. Not likely. In general, when a transfer house ports over the sound from the master tape to a work print video (remember, we're only using the Nagra D as an example; timecoded RDAT machines are often used, and complicated shoots may require a multitrack format such as the Tascam DA-88), it will make a digital copy for the audio house, leaving the originals as safety masters. If a two-track format has been used, the digital clone will most often be a SMPTE encoded RDAT. The SMPTE on this tape will reference the numbers that are on the work print, not those found on the original audio tape. From here on out the EDL associated with picture is all that counts.
How has the digital revolution changed the way audio post is conducted? Radically, in a word. Digital Audio Workstations (see To DAW Or Not To DAW?) have made it possible to work faster and more efficiently than ever before. In general though, don't count on spending less time in audio post than you might have a decade ago. DAWs also let you experiment with sound in exciting ways, and when you start playing around with sound effects, foley and music, you may find it hard to stop.
What work stations are out there, and what do you need to know about them? The first work station adopted by the audio post community was built by a company called New England Digital. The Post Pro was sophisticated for its time, but by today's standards the power it possesses is relatively tame, and the hefty price associated with this work station relegated it to museum status when a music company decided to get into the game.
Digidesign founder Peter Gotcher originally made chips for samplers--devices that play back digital recordings of sounds--which the music industry found quite appealing. Several years ago, Digidesign came out with its first ProTools system. ProTools, which synchronizes multiple tracks of digital audio to picture using Macintosh hardware, took the audio post market by storm.
ProTools is currently the work station of choice for many high-end audio post professionals around the world. Work stations built around the PC platform have also made great inroads, and systems based on proprietary hardware and software continue to be used to post audio as well. What do these systems offer to you, and what should you know about them as you prepare to post your audio?
Digital work stations combine the flexibility of computer functionality--the copy, cut and paste features that you're aware of--with digital signal processing (DSP) power that previously was found only on standalone processing devices. Much of this power comes from software plug-ins that augment the functionality of the core DAW system. Work stations can be configured in a variety of ways, and the amount of money spent on a system will determine how complex a set of tasks can be executed at one time.
Don't go to a studio that has more power than you need. Here's where a little knowledge can help save you a lot of money. If your project has a stereo mix of location sound, and all you need to add is some stock music and end up with a stereo master, why pay for a house that has three or four rooms with expensive DAWs? Cheaper facilities can be found with operators who can execute a four-track mix and buss to a stereo master at a fraction of the price. So what if you have to work in an apartment or basement studio?
You don't need a prestigious location to get good, quality work, and often the lower overhead of a project studio can mean bigger savings to you in the long run.
If, on the other hand, you need to record foley (recording footsteps against picture, for example), replace unusable dialogue with clean speech recorded after the fact (automatic dialogue replacement, or ADR), create effects that involve extensive sound design and add music, you're better off going to a larger facility.
What should you expect in terms of digital power when you're paying several hundred dollars or more an hour?
To start, at least 16 tracks of digital recording, more likely 32. Remember, the DAW acts like a tape recorder. Your location audio is digitally transferred to two or more of these tracks, and subsequent tracks are built around them. Who will you be working with, and what exactly does he or she do?
More often than not the person assigned to help you with audio will carry the title sound designer. You don't need a degree to hold this title. What does a sound designer do, and how much of what can be done do you really need?
Designing Sound
Once again, a little preparation can save you a lot of money. Don't rely on a sound designer to tell you what you need to do with your tracks. Sound designers can take bits of sound from a variety of sources and create wonderful effects, but you might not need all of their creative skills. They like to experiment. It's fun to add robotic DSP to the voice of that mean old man down the street in your film, but did you really want anything other than the level of this character's main monologue pulled up? Be careful.
A good DAW operator can clean up poorly recorded audio to a remarkable extent. Let's say you didn't notice that an air conditioning unit was humming away in the background while you shot an interview. Some software, like Digidesign's DINR plug-in, lets you "teach" the system the contour of the unwanted sound and reduce it to a nearly inaudible level. After-the-fact remedies are never a substitute for careful field recording, but removing unwanted noise is an area in which sound designers specialize. If your tracks cannot be cleaned up with the noise-removing software that comes with a system like ProTools, you may have to go to a very expensive work station like Sonic Solutions, which will add dollars to your budget. Clean audio captured in the field should make this step unnecessary.
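DINR's algorithm is proprietary, but the general idea--learn a "fingerprint" of the noise from a gap in the dialogue, then subtract it across the take--can be sketched with textbook spectral subtraction (a crude illustration; production tools add overlapping windows and far smarter estimation):

    import numpy as np

    def spectral_subtract(noisy, noise_only, frame=1024):
        """Learn a noise fingerprint, then subtract it frame by frame."""
        noisy = np.asarray(noisy, dtype=float)
        profile = np.abs(np.fft.rfft(noise_only[:frame]))  # e.g., AC hum by itself
        cleaned = np.zeros_like(noisy)
        for start in range(0, len(noisy) - frame + 1, frame):
            spectrum = np.fft.rfft(noisy[start:start + frame])
            mag = np.maximum(np.abs(spectrum) - profile, 0.0)  # floor at silence
            cleaned[start:start + frame] = np.fft.irfft(
                mag * np.exp(1j * np.angle(spectrum)))  # keep the original phase
        return cleaned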
Are you incorporating music into your project? Whether you've commissioned an original score or are licensing stock music, a good sound designer can be of great assistance in this area. Cutting and pasting cues, taking bits from several musical cues and cross fading them (making smooth transitions from one to the other), even changing the pitch or duration of a cue are things that can add emotional depth to your project, and a good sound designer is well experienced in this area.
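A cross fade, for instance, is nothing exotic underneath: the outgoing cue ramps down while the incoming cue ramps up across the overlap. A toy sketch (equal-power curves are a common choice, not the only one):

    import math

    def equal_power_crossfade(tail_of_cue_a, head_of_cue_b):
        """Blend the end of one music cue into the start of the next."""
        n = min(len(tail_of_cue_a), len(head_of_cue_b))
        out = []
        for i in range(n):
            t = i / max(n - 1, 1)               # 0.0 -> 1.0 across the fade
            gain_a = math.cos(t * math.pi / 2)  # outgoing cue fades out
            gain_b = math.sin(t * math.pi / 2)  # incoming cue fades in
            out.append(tail_of_cue_a[i] * gain_a + head_of_cue_b[i] * gain_b)
        return out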
Whereas analog track recording continues to have adherents, no one questions the superiority of posting audio in the digital realm. For one thing, the fact that your original audio can be copied and pasted, processed and reprocessed with no generational loss means that you can experiment to the full extent of your budget with no degradation of your original material.
The flexibility of disk-based file management also means that you can move from any point in your audio to any other without waiting for tape to rewind or fast forward.
Your last concern involves format. Will the post house that is completing your project require a simple stereo master? In that case, they'll probably want you to deliver an RDAT master with SMPTE. If you've executed a more complicated mix, with dialogue, effects and music broken out as separate stereo pairs, they'll require a multitrack tape from you.
These days the Tascam DA-88 is the machine of choice for many, but check before you complete your post work and save yourself a possible hassle.
Be aware that there is a move in studios away from tape toward central storage and digital dubbers using hard drives and MO disks, thanks to their high quality and random-access ability.
Getting the most out of your budget and maximizing your creative options during the audio post process are your two fundamental goals. Thinking one step ahead while recording in the field and during post means that you won't be caught short anywhere. Remember, there are three critical phases in your audio work. First comes location recording, next audio post, and the peak of the triangle arrives when you hand off completed audio tracks to the company you've hired to execute your final mix. Know where you are throughout the journey, and the path that your audio takes you down will be interesting and fun.
By Matt Charles
While the advantages of nonlinear editing may seem obvious to most of you, there are probably still a few "die-hards" out there using tape-based editing systems and forgoing the power of computer-based editing. There are probably even more of you who view the audio portion of video editing as a necessary evil that is best taken care of as quickly and painlessly as possible. However, with the emergence of 5.1 surround sound, DVD and HDTV broadcasts, the need to ensure a better-sounding finished product becomes more critical. Enter the next generation of the Digital Audio Workstation (DAW).
Although DAWs have been around for many years, the ever-increasing "bang for the buck" ratio of personal computers has elevated the level of performance of PC- and Mac-based DAWs from a non-professional audio "sketch pad" to a serious, professional audio recording and editing tool. In the recent past, such features as non-destructive editing, snapshot recall, waveform editing and mild effects processing were common in almost all DAWs. However, the lack of timecode synchronization, latency issues and non-intuitive user interfaces limited the effectiveness of some DAWs as professional video tools. Today, timecode sync is available on almost all DAWs, latency problems have been reduced to the point where they're not really an issue, and products like the Mackie HUI and CM Automation's Motor Mix provide an easy-to-use, familiar work surface and interface.
Some of the latest DAWs derive their increased horsepower from having processor chips built onto the plug-in card itself. This design allows the plug-in card to handle the audio signal processing and frees up the computer's CPU, allowing it to focus on screen graphics and other normal computer functions. This design approach allows for a higher number of discrete channels of audio per card and ensures smoother and quicker operation of the system.
Some DAWs, like Mark of the Unicorn's 2408 systems, take this concept even further by connecting an external "Breakout Box" to the sound card via a special computer cable. By moving the input and output jacks to an external box, they facilitate the patching of audio cables and help reduce the "rat's nest" of wires often found in the back of most computers. This design makes using outboard signal processors and external A/D and D/A converters much easier, which ultimately should help improve the overall sound of the audio. Since most of these types of systems are modular, they can be "stacked" to add more discrete channels of audio, depending on the availability of bus slots on your computer.
Many DAWs, like Digidesign's ProTools system, incorporate various digital input and output jacks (Lightpipe, AES/EBU and S/PDIF) that allow for direct transfer of digital data from DAT machines, Modular Digital Multitracks, and other digital recording devices. This feature not only makes the transfer of data quicker and easier, it also bypasses the converter circuitry that can add distortion and degrade the quality of your signal.
Another contributing factor to the power of these new DAWs is the quality and availability of software "plug-ins" that offer features like time-based effects (delay, reverb, pitch shifting, flanging, chorusing, etc.) and dynamics processing (compression, EQ, limiting and noise gating). Sonic Foundry and Waves are just two manufacturers that offer software plug-ins that actually become part of the DAW by placing the buttons and menus for the effects right on the DAW's interface screen. These features let you make use of the processing power of the computer without having to toggle back and forth between the workstation and outboard effects processors, which saves time by keeping you in your creative working environment.
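To give a flavor of what a time-based plug-in does internally, here is a minimal feedback-delay sketch (purely illustrative; commercial plug-ins are vastly more sophisticated):

    def feedback_delay(dry, rate=48_000, delay_ms=250, feedback=0.4, mix=0.3):
        """Classic delay line: echoes are delayed, attenuated copies of the signal."""
        d = int(rate * delay_ms / 1000)  # delay time in samples
        wet = [0.0] * len(dry)
        for i in range(len(dry)):
            if i >= d:
                wet[i] = dry[i - d] + feedback * wet[i - d]  # recirculating echoes
        return [dry[i] + mix * wet[i] for i in range(len(dry))]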
Of course there are companies that sell complete "turnkey" systems that incorporate a computer, breakout boxes and software into a perfectly matched package that is dedicated solely to recording, editing and processing audio and video. Avid offers a complete editing system that utilizes Digidesign's ProTools as its DAW. This is a particularly powerful combination, because both video and audio workstations are designed from the ground up to work together and offer enhanced functionality that can only be found in integrated systems. Some features include realtime, simultaneous "scrubbing" of video and audio and 5.1 surround sound functions incorporated right into the workscreen.
Obviously, the type of DAW you buy depends on your needs and budget, but the capabilities and affordability of the current crop of workstations is unparalleled. It's hard to imagine an excuse for not taking advantage of the power and ease of use they offer for video and audio editing.
One warning: the file format used in DAWs differs from manufacturer to manufacturer. If you know that you may be going from one brand of DAW to another, make sure they either support each other's file formats or use a third intermediate format for exchange, such as Open Media Framework Interchange (OMFI), Direct File Exchange Initiative or Advanced Authoring Format (AAF).
The SMPTE 272M standard specifies the formatting method for digital audio to allow a maximum of 16 audio channels to be carried in the ancillary data space of the SMPTE 259M serial digital video standard. Further, SMPTE 299M specifies the architecture of up to 16 audio channels that can be incorporated within the SMPTE 292M standard for the high definition television serial digital interface. Embedding audio within the video signal offers tremendous advantages over traditional methods of running separate audio/video systems, particularly in broadcast facilities where limited audio breakaway is required. Utilizing embedded audio offers some very attractive benefits: simplified system design, reduced cable requirements and distribution amplifier count, a single routing system and, of course, excellent cost savings.
In the past there have been problems that the industry has been forced to accept with embedding and disembedding. One of these is switching errors. When a switch is made between two video sources that contain embedded audio data, it is difficult to achieve a clean audio transition at the receiving end. Figure 1 shows the timing relationships between PAL, NTSC, and AES. At 48 kHz, there are five AES blocks during each PAL video field and 4.170833 AES blocks per NTSC field. Therefore, if a video signal is used as a genlock source for AES signals, the frame alignment and phase relationship between audio signals is arbitrary. In this circumstance, regardless of whether a signal is embedded or not, clean audio transitions are difficult to achieve.
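Those block counts are easy to verify, assuming the standard 192-frame AES channel-status block:

    SAMPLE_RATE = 48_000
    AES_BLOCK = 192  # audio sample frames per AES channel-status block

    for name, field_rate in (("PAL", 50.0), ("NTSC", 60.0 / 1.001)):
        blocks_per_field = SAMPLE_RATE / field_rate / AES_BLOCK
        print(f"{name}: {blocks_per_field:.6f} AES blocks per field")
    # PAL: 5.000000 -- an exact fit, so alignment repeats cleanly every field.
    # NTSC: 4.170833 -- never lands evenly, so block alignment drifts field to field.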
Some manufacturers have provided solutions to the switch error problem by including audio sample rate converters in the embedder design. This has the potential of minimizing the problem, but removes any possibility of controlling audio phase.
Then there are the multi-channel difficulties. When more than four channels are required, the normal technique has been to cascade embedders and disembedders (see figure 2). Cascading is expensive and relies on the ability of the embedder to determine if current ancillary data content exists and where to allocate new data. The more channels inserted, the more difficult it becomes to determine channel location at the receiving end. Plus, with the advent of surround sound in general usage, phase alignment becomes all but impossible with cascading. Multi-channel embedding and disembedding creates indeterminate phasing across channel groups, possibly destroying the effect of surround sound.
A few manufacturers are now offering innovative embedder and disembedder modules that solve these problems. All audio inputs are accurately timed to the house AES reference (which is locked to the master video clock), to create error-free switching if the NTSC video path lengths are the same (PAL is not a problem). Should NTSC path lengths differ, a re-framing (re-timing) ASIC in the disembedder comes to the rescue. If the AES framing is disturbed by a video switch, this ASIC will continue to provide a constant AES output, thus eliminating the possibilities of receivers losing lock and requiring a finite (and audible) recovery period.
With these new designs, cascading of modules is no longer required to support multi-channel expansion. An expansion module can add an additional 12 audio channels (for a total of 16) to these novel embedding and disembedding products (see figures 3a and 3b). Not only does this save money, it also solves the previous problems of identifying channel location when utilizing cascaded modules.
All data allocation difficulties are resolved by selectable group assignment. Further, the proprietary AES re-framing on all inputs ensures accurate sample alignment across all four groups. Transport for uncompressed multi-channel audio and surround sound mixes is now reliable, with total phase control.
There are still many things that need to be considered when designing a system that utilizes audio embedding. Once audio and video are combined, audio and/or video insertion and mixing may no longer be possible without first de-embedding the audio. Even inserting a station logo on video via a master control switcher might disturb the embedded audio data, therefore limiting your ability to manipulate any signal without first routing it to the appropriate disembedding and re-embedding devices.
Embedded audio has its place; it can be a valuable cost-saver in new systems and it's a great way to distribute multi-channel audio. Serious consideration should be given during system design to how the new features and functionality provided by embedders and disembedders will be utilized, and any product under consideration should be thoroughly tested for the specific application. It's probably no surprise to hear that there are good products and not-so-good ones available; however, the errors that can occur are often very subtle. So please, if you decide to embed, choose carefully.
Five-point-one audio is here, and it's no joke. So what's the point of mixing in 5.1? Visceral impact. If you really want to experience audio, park yourself in the center of a 5.1 system and let it rip. You'll hear an emotional intensity that's sorely lacking in most recorded music. And as artists and engineers, that's supposed to be the point of our exercise--to touch people. In talking with producers and engineers, the one thing that was mentioned time and again was that 5.1 makes the listener a participant in the experience, not just an observer. And studios don't have to panic over equipment considerations for 5.1 mixing, because it can be done using any console from a Yamaha 03D to a Euphonix CS2000 to an AMS Neve Capricorn.
If you're set to take the 5.1 plunge, here are some quick tips:
- ...full-range channels.
- ...discretely to a single channel.
- ...recording took place.
For the first time since the development of stereo, our industry is being offered a new tool with which to deliver our message. Distinct creative opportunities are opened up in the 5.1 mix platform, not only for recording and mixing, but for composing as well. The possibilities are virtually endless.
Surround Sound. Everybody wants to hear it. Everybody wants to do it. But while setting up a normal stereo studio is pretty straightforward because there's lots of experience to draw from, setting up a surround system can be quite a different challenge. Well, we're here to cut through some of the mystery, explain just how to do it and, most importantly, make it fit into almost any budget.
We're talking about six-channel "5.1" surround setup in this article (see figure 5), which means three speakers across the front (Left, Center, Right), stereo rear (or side) surround speakers, and a subwoofer (the ".1" of the system). Unless you're specifically planning on doing some work in the four-channel Dolby Pro Logic format (popular in broadcast and home theater, but quickly being overtaken by 5.1) or the eight-channel (7.1) Sony SDDS film format, there's little reason to go to those formats, since they carry an additional cost of equipment (encoder/decoders and speakers/amplifiers) that's not required in 5.1. By and large, 5.1 seems to be the most popular surround configuration now and for the foreseeable future, so that's what we'll refer to. When we're setting up a surround system, we have to address the issues of monitors, level control, panning, outboard gear, and acoustics. Let's examine each one.
Monitors
More so than with stereo monitors, all surround monitors are not created equal, which means there are a number of things to consider before installing speaker components for a system.
Types of Surround Speakers
There are three distinct types of speakers available for use as the rear surround monitors, and great care should be taken in choosing the one best suited to your needs.
Direct Radiator--This is a speaker where the sound shoots directly from the front of the cabinet, as in the majority of stereo monitors (see figure 6). The advantage of using these as surround speakers is that you get a fair amount of efficiency and level, which is necessary if you're going to be sending a lot of source material to the rear speakers. In many cases these speakers are smaller than the front speakers, but in a perfect world they would be identical to your front speakers, since you may be sending some full-range source material there.
Bipoles--In this case, the sound emanates from the sides of the monitors (see figure 6 again). You get the advantage of additional coverage area here, which works well for ambiance material, but not so well for source material.
Tripoles--A trademark held by M&K, this design incorporates the best of both worlds, combining direct radiators and bipoles in the same cabinet (see figure 6).
Speakers
For anything other than jazz or classical (meaning rock, techno, R&B, etc.), I like all speakers to be identical and of the direct radiator variety. Classical and jazz come from a different place, where mostly ambiance is panned to the surrounds with very little source material. The same applies to film or television sound, with only ambiance and effects in the rears. With most other music, however, the mixers I've known have liked to use the surrounds for some heavy-hitting source material--and with very good results, I might add. I had good luck the one time I used Tripoles, and they seem to be the best of both worlds, especially if you do a wide variety of music. Although I've seen some people just press into service whatever extra speakers they had lying around in order to get a poor man's surround system, just as in stereo, it probably won't translate well to the consumer.
Center Channel
The center channel is important in that it anchors the sound and decreases the "phantom images" that we get with two-channel stereo. It's most important that the center channel be identical to the left and right front speakers in order to get smooth panning across the front. That being said, many home theater systems actually use a center speaker, different from the front right and left, that sits horizontally on top of the television. Theaters, however, use identical center speakers. Play it safe and make it identical.
Subwoofer
The subwoofer in a 5.1 system receives a special audio channel called an LFE (for Low Frequency Effects) channel. The LFE, as the name implies, was originally designed specifically for special effects like earthquakes and explosions in the movies and has an additional 10 dB of headroom built in. Although some of the low-frequency information from the main system can be automatically folded into the LFE, most engineers take advantage of the channel and use it for additional kick and bass information.
During playback of a 5.1 program, a special "bass management" circuit is employed to route the proper frequencies to the subwoofer. This circuitry comes after the Dolby Digital (AC-3) decoder in a consumer receiver and will do one of three things (the third option is sketched in code just after this list):
1. Send the LFE channel only to the subwoofer.
2. Sum the low end (from 80 Hz down) from the five main channels and send it to the subwoofer.
3. Both of the above at the same time.
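As an illustration of that third option (a rough sketch, not a description of any particular product), bass management boils down to low-pass filtering the five mains and summing the result with the LFE feed:

    import numpy as np
    from scipy.signal import butter, lfilter

    def bass_manage(mains, lfe, rate=48_000, crossover_hz=80):
        """Option 3: lows from all five main channels, plus the LFE channel."""
        b, a = butter(4, crossover_hz / (rate / 2))       # 4th-order low-pass
        low_sum = sum(lfilter(b, a, ch) for ch in mains)  # lows from L, C, R, Ls, Rs
        return low_sum + lfe                              # the subwoofer feed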
There is only one box on the market designed for bass management in a professional environment: the M&K HP80LFE, which is rack mountable and retails for $300. It is possible to use a consumer receiver to do the job, but then you must put up with all of the attendant hassles that come with using a semi-pro device such as RCA connectors and lack of rack mounting. This means that you have three choices when it comes to bass management:
1. Do nothing. Just send your LFE info to the subwoofer.
2. Use the M&K box.
3. Adapt a consumer receiver.
This is a tough call, because the consumer ultimately decides what signal will be sent to the subwoofer. The majority of the projects I've done have used the LFE-to-sub-only method with good results, but only because the M&K box was not available at the time. I'd personally get a bass-management box, if only to be able to check the low end in all situations.
Panning And Level Control
The biggest problem with surround sound is controlling level and panning, which go hand in hand. In stereo, when we want to change the volume, we're used to just grabbing the control room level control; but when we're dealing with six channels instead of two, it's just not that easy. The same goes for panning, which is taken for granted in stereo but becomes far more complicated in any surround scenario. As in most aspects of life, things can be done cheaply or expensively--level and panning are no exception. I've broken this down into four financial categories.
High Priced
Buy a new console with surround panning and monitoring built in. Although a few years ago that would have meant a film-dubbing console, there's now a proliferation of consoles on the market in a very wide variety of price ranges (from $6,000 to more than $600,000) that come equipped with surround panning/monitoring as standard. These include consoles by Neve, SSL, Euphonix, D&R, LaFont, Otari, and more, right down to the relatively inexpensive Yamaha 03D. This is the fastest and easiest way to get surround panning and monitoring, although you've got to lay out some cash to do so.
Medium-High Priced
But let's say that you just can't afford a new console and you just want to add on a product to your existing console to give you surround panning/monitoring capabilities. Otari has a brilliant two-piece add-on called the PicMix that will give you both monitoring and panning in any of the previously mentioned surround formats. The Monitor Controller gives you multichannel monitoring in any of the popular formats as well as preset and calibrated level control for $5,225. The PicMix Panner gives you four channels of panning control (two on joysticks) for $3,600.
Low Priced
An even cheaper alternative is the TMH Panner: a true 5.1 add-on panner available for less than $500 per channel designed by the father of the format, Tom Holman.
No Priced
Okay, so you're poor, or you just don't want to commit to any investment until there's a ready market for your efforts, but you still want to play around with surround. There is a poor man's way to do surround panning and monitoring utilizing the busses on your current desk, only in an unusual way. To do it well, you'll need an English-style split desk with the input channels on one side and the subgroups and monitor section on the other.
Here's how to do it. First, set up busses 1 and 4 to go to the front left and right speakers and busses 2 and 3 to go to the left and right rear (see figure 7). As you pan from 1 to 4 you will be panning from left to right. When you pan from 1 to 2, you'll be panning from left front to back and 4 to 3 from right front to back. Now set up an aux send to your center channel and another aux to your subwoofer (LFE). Although not perfect, this method allows you to do at least some limited surround panning. Now take the output of bus 1 into track 1 of your DA-88, aux 1 (or center channel) into track 2, bus 4 into track 3, bus 2 into track 4, bus 3 into track 5, and aux 2 into track 6 (see figure 7). This is the de facto standard track configuration (but not the only one--DTS uses LF, RF, LR, RR, C, S). Now take the six outputs of the DA-88 into the insert returns of six subgroups and the outputs of those groups into your amps/speakers. Your busses and auxes control the level to tape, and the groups control the control room level. It's complicated, but it works.
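Written out as a patch map, the routing described above looks like this (same assignments, just tabulated):

    # Bus/aux output  ->  DA-88 track  (speaker feed)
    patch = {
        "bus 1": ("track 1", "left front"),
        "aux 1": ("track 2", "center"),
        "bus 4": ("track 3", "right front"),
        "bus 2": ("track 4", "left rear"),
        "bus 3": ("track 5", "right rear"),
        "aux 2": ("track 6", "LFE/subwoofer"),
    }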
Other Gear
Mixdown Machine--The de facto standard mixdown machine is the DA-88, although any format with six channels will do. In many cases, the addition of a Rane PacRat allows for 20-bit recording (while taking up the additional two channels), and additional outboard A/D and D/A converters would be nice, depending on the budget.
Dolby Digital Encoder--You might assume you need one on hand to monitor your mix through. Surprisingly enough, it's not necessary, or even practical, to do so (due to the high cost), since you can't actually change anything on the encoder anyway, and Dolby Digital is already pretty benevolent with the mix. This is not true with Pro Logic (a four-channel playback format encoded on two channels), which does some pretty serious signal steering and absolutely requires a codec on hand through which to listen.
Surround mixing will also require a new generation of multichannel effects processors. Unfortunately, none are available yet, though they may be on the way soon. I've already gotten a taste of the value such a device would offer: in some of the surround mixes I've done, I used three Lexicons (two PCM90s and an 80) for the five channels (utilizing a custom program for a decorrelated center channel), and the results were far deeper, wider, and much more usable in the surround format than a normal stereo reverb.
Acoustics
No discussion of surround sound setup would be complete without presenting at least a couple of acoustic considerations. Without getting too deeply into a subject that's worth a chapter all its own, here are a couple of things to think about.
In stereo, you can have an asymmetrical room with asymmetrical diffusion (such as Live End Dead End), but in surround, diffusion must be used symmetrically. In other words, once you start to spread speakers around the room, then some traditional stereo acoustic concepts (like LEDE) might be rendered not only ineffective, but counterproductive.
Also, in surround you must keep in mind the old inverse square law. Since the level drops by 6 dB (a factor of four in intensity) each time you double your distance from a speaker, it doesn't take much of a position change to upset your system balance. This is why side speakers sometimes work better than rear ones: the level doesn't change as much as you move backwards.
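The law is one line of arithmetic for sound pressure level in a free field:

    import math

    def spl_change_db(d1, d2):
        """Level change when the listening distance moves from d1 to d2 (free field)."""
        return -20 * math.log10(d2 / d1)

    print(spl_change_db(1.0, 2.0))  # about -6.02 dB: double the distance, lose ~6 dB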
Surround sound is a brave new world, and many of the concepts that we've lived with for so many years must now be rethought. The only way forward is to get in there and do it, make some mistakes and some discoveries, and be sure to tell the rest of us.
By Nigel Spratling
Dolby Digital (AC-3) is the chosen audio compression scheme for the new digital television transmission system. This technology allows up to 5.1 channels of discrete digital audio to be compressed into a single 640 Kbps data stream. The system is quite flexible in that it allows an AC-3 surround sound signal to be decoded as mono, stereo and Pro Logic, as well as the full 5.1 channels.
This scheme was chosen because it provides decoding flexibility as well as a very high compression ratio with excellent quality retention. However, it does have a few drawbacks for those considering production, distribution and re-transmission.
In production, the movie industry has been producing multichannel surround sound for a considerable time with movie theaters now routinely being equipped with 5.1 or 7.1 channel systems. Consumers have added Pro Logic Surround Sound systems to try to achieve a movie theater atmosphere in their homes, and recent technology has begun to make AC-3 5.1 available through laserdiscs, DVD and special amplifiers and receivers. For television broadcasters it has been a struggle to introduce stereo production and transmission, and in fact, many of the smaller stations are still only capable of transmitting a monaural audio signal. To date, almost one-half of all cable headends still do not pass whatever stereo signal they receive from broadcasters.
By FCC edict, it is now necessary for broadcasters to get on the fast track to all-digital transmission of standard definition TV, high definition TV and Dolby Digital sound. So what new problems will face designers attempting to build new systems for this application?
AC-3 is a perceptual coding scheme that breaks the original signals into spectral components and, in simple terms, realigns the audio data to maximize the use of the 'gaps' and 'imperceptible' information found in the original audio recording spectrum. This spectral 'realignment' allows for significant bit rate reduction, while retaining sufficient data to allow very high quality playback once decoded.
However, this scheme does not come without a price. AC-3 was never designed to be anything but a distribution technique. It is not meant to be decoded then re-encoded.
In broadcasting, it is normal practice for a television station to receive programming material via satellite feed, landline, or videotape. This programming material is then inserted into the station playout schedule; more often than not, the received material is edited to allow station identification, announcements and local commercial insertion. If the received material has audio coded as AC-3 data, it can be decoded to allow audio editing. However, once decoded it is almost impossible to re-encode to AC-3 properly, due to the spectral realignment that took place during the first encoding. Double encoding AC-3 will, at best, result in poor-quality audio and, at worst, no discernible audio signal at all.
For this reason, all audio signals, prior to the final transmission, must arrive either as baseband audio or via some mezzanine compression scheme that does not interfere with the Dolby technique. Of course, if the signals are mono or stereo (possibly with Pro Logic coding), then they can be distributed via a single AES audio connection and delivered to the AC-3 encoder at the transmitter.
Today's Problems
If true surround sound is to be transmitted, some problems appear. The first is channel count: 5.1 channels means six discrete audio channels (add a SAP, or separate audio program, such as a second-language or description stereo track, and you're up to eight discrete audio channels).
Overcoming these issues will take planning. If you're designing or considering the design of a new facility, be aware that the audio systems may need to be much more complex, and designed with much more care and upward expandability, than you had previously considered.
What you decide to do in audio post today will affect the value of your product tomorrow. You may post in mono, stereo, Pro Logic (with a left total and right total audio track), or true 5.1 surround sound depending on the project and its anticipated value in the future.
For further information on digital audio referencing and timing issues, read THE BOOK, An Engineer's Guide To The Digital Transition (and its sequel, THE BOOK II), freely available from ADC/NVISION (1-800-719-1900) or at www.nvision1.com.
Television audio has successfully made one major transition--from mono to two-channel stereo. That transition required changes in the broadcast infrastructure and stimulated a number of new technical developments in distribution and emission technology.
While the new digital TV systems now being put into place for emission are inherently capable of delivering Dolby Digital (AC-3) 5.1 channel audio to the home, the infrastructure required to convey multichannel audio from the post production studio to the emission transmitter is not yet in place.
Also missing is the infrastructure to carry important audio metadata. Many new technical developments are required in order to make the delivery of multichannel audio routine and practical.
The Dolby Digital coding system allows very high quality audio to be delivered to the listener at a very low bit rate. A number of important user features are also provided through elements of "metadata" (data about the audio essence), including dialog normalization, a form of level uniformity based on matching the level of speech across all programs, and dynamic range control, which allows the simultaneous delivery of both wide and narrow dynamic range audio, with the choice made by the listener.
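For a feel for dialog normalization, here is a simplified sketch, assuming the decoder pulls every program's dialog toward the AC-3 reference level of -31 dBFS (a simplification of real decoder behavior):

    REFERENCE_DB = -31  # dialog level every AC-3 program is normalized toward

    def dialnorm_gain_db(program_dialog_db):
        """Gain a decoder applies so dialog lands at the -31 dBFS reference."""
        return REFERENCE_DB - program_dialog_db

    print(dialnorm_gain_db(-24))  # -7: a hot program gets pulled down 7 dB
    print(dialnorm_gain_db(-31))  #  0: a program at reference passes unchanged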
Fundamental to the design of the Dolby Digital system is the concept that a single encoded stream, steered by its metadata, can serve listeners in very different situations. The listener can choose, for instance, whether to listen to a typical narrow dynamic range presentation, or to a very wide dynamic range presentation equivalent to what would be obtained in the cinema--or something in between. It is critical to proper usage that both multichannel audio and metadata pathways exist between the output of the post production studio and the input to the Dolby Digital encoder that feeds the DTV transmitter.
The use of low bit rate coded audio within a new infrastructure is attractive, as it can allow much of the current equipment (VTRs, AES/EBU distribution, etc.) to carry both multichannel audio and metadata. A new architecture employing audio coding should add multichannel capability without removing any important functionality which currently exists.
Unfortunately, existing audio coding systems are simply not designed to be video friendly. In the chain of video contribution, production, post production and distribution, a number of cascades of coding will be required. Any coder employed must be capable of multiple generations of concatenation without significant audible degradation, and must have a time delay which can be managed so that A/V sync can be maintained.
In order to avoid unnecessary concatenations, it is also important to be able to perform video synchronous switching between coded audio streams without affecting A/V sync and to edit encoded audio streams on existing VTRs, both without introducing audible glitches.
Currently available audio coding technologies can achieve good performance at reasonable bit rates and with reasonable equipment cost. However, when trying to apply these existing coders to the task of increasing audio channel capacity in existing video and broadcast facilities, the problems described above become apparent.
A major problem is that there is no alignment between video and encoded audio frames. When switches or edits are performed on an A/V signal, the edit points will occur at video frame boundaries but will not, in general, occur on audio frame boundaries. This will lead to damaged audio frames, or a need to move the audio edit point relative to the video edit point in order to find a suitable edit point for encoded audio.
A search for a workable solution has led this author to the conclusion that an entirely new coding system with a number of desirable properties is required. These properties include manageable coding delays, editability on video frame boundaries, metadata carriage, and satisfactory quality when a number of generations are concatenated.
A new coding system designed specifically for use with video is available from Dolby Laboratories. First demonstrated privately at the NAB '97 convention, the system is referred to as "Dolby E." With the Dolby E coder, the audio framing is matched to the video framing, which allows synchronous and seamless switching or editing of audio and video without the introduction of gaps or A/V sync slips. All of the common video frame rates, including 30/29.97, 25, and 24/23.976, can be supported with matched Dolby E audio frame sizes.
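The motivation for matched frame sizes falls straight out of the arithmetic (a quick check at 48 kHz; the 1602/1601 pattern is the standard workaround for 29.97 fps, stated here as background rather than as a Dolby E detail from this article):

    SAMPLE_RATE = 48_000

    for name, fps in (("24", 24.0), ("23.976", 24 / 1.001),
                      ("25 (PAL)", 25.0), ("29.97 (NTSC)", 30 / 1.001)):
        print(f"{name} fps: {SAMPLE_RATE / fps:.1f} audio samples per video frame")
    # 25 fps divides evenly (1920 samples); 29.97 yields 1601.6, so frames must
    # alternate 1602/1601 samples in a repeating pattern to stay in sync.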
The Dolby E coding technology is intended to provide approximately 4:1 reduction in bit rate. The reduction ratio is intentionally limited so that the quality of the audio may be kept very high even after a number of encode-decode generations. The fact that operations such as editing and switching can be performed seamlessly in the coded domain allows many coding generations to be avoided, further increasing quality.
A primary carrier for the Dolby E data will be the AES/EBU signal. The Dolby E coding will allow the two PCM audio channels to be replaced with eight encoded audio channels. A VTR PCM track pair will become capable of carrying eight independent audio channels, plus the accompanying metadata. The system is also intended to be applied on servers and satellite links.
A time delay when encoding or decoding Dolby E is unavoidable. In order to facilitate the provision of a compensating video delay, the audio encoding and decoding delay have been fixed at exactly one frame. When applied with video recording formats which incorporate frame based video encoding, it can be relatively easy to provide for equal video and audio coding delays. When applied with uncoded video, it may be necessary to provide a compensating one-frame video delay.
There are two philosophies as to where Dolby E encoders and decoders should be placed: the point of constriction versus point of use. The first is to place the coding equipment at the points where bandwidth is limited, such as around VTRs or satellite links.
The second, and perhaps preferred, philosophy is to place the coding equipment at only those points where the audio signal is created, processed, or consumed. This point of use method places encoders and decoders in studios, and not at tape machines or satellite terminals (except for lower-cost confidence monitoring decoders), and requires routing of encoded audio. The benefits are that fewer coding units may be required (a cost savings), metadata carriage is assured, routing costs are reduced and unnecessary decode/re-encode generations can be avoided.
More information on Dolby Digital and Dolby E can be found on the Dolby Laboratories TV Audio Web Page at www.dolby.com/tvaudio/.
Editors' Note: Other manufacturers, including Techniche and ADC/NVISION, provide compressed multichannel audio options as well.