How We Hear
Take a look at what the keyboard shortcuts were at the end of our preceding section. A selects the Arrow tool. B selects the single-clip Blade tool, and Shift+B, which is what I was blanking on at that moment, cuts all clips with the Blade tool; so B cuts a single clip and Shift+B cuts all clips. T selects the Trim tool. Comma or Period moves the selected edit point, or the selected clip, left one frame with Comma or right one frame with Period; Shift+Comma or Shift+Period moves the selected edit point or clip ten frames to the left or right. Command+B cuts all selected clips at the playhead, or, if you have the skimmer active, it cuts everything at the skimmer position, which is why I hate having to skim a timeline: you end up with sliced clips and you didn't know what you were doing. Control+D changes the clip duration, and Shift+X moves the selected edit point to the position of the playhead. Now there's one that I don't have here, which is Command+G, which turns a connected clip into a connected storyline.

What I want to talk about in this session is how we hear. We'll talk about some audio basics, use the audio meters, adjust audio levels and pan, use keyframes to create audio animation, work with multichannel audio, discuss Roles, and work with dual system sound. I think I've got this ninety-minute section down to about four and a half hours, so I think to get it all to fit inside ninety minutes I'm just going to drop every other word. It's easy to say and hard to do. Anyway, we'll try to squeeze in as much as we can; that's why questions are so important.

Let's define a few terms first. Mono is a single channel of audio that plays back equally on both left and right speakers, giving the illusion the sound is centered between them. A mono sound does not come out of one speaker; it's the same sound coming out of both speakers, the left speaker and the right speaker, which means that when the sound is equal in volume coming out of both speakers it gives the illusion of coming from the center between the two. Mono is a really, really important concept for audio mixing; we'll see that more in a minute.

Stereo is two channels of audio, one for the left-hand speaker and one for the right-hand speaker. That gives the illusion of sounds moving between the two speakers, like panning from left to right or from right to left. Stereo is the default audio output for Final Cut Pro X.

Surround: I was having a conversation with some audio people at dinner last night, and surround is now up to 10.2 channels, but normally we think of surround as 5.1, six channels of audio playing on speakers that surround the listener horizontally, giving the illusion that sounds are moving in a three-hundred-sixty-degree horizontal circle around the listener.

Multichannel is a special form of mono audio.
A multichannel audio clip contains more than one channel of audio. The most common number of channels is two, and this leads to what's called dual-channel mono audio. This is probably the most important audio format for most of us to use when we're recording our projects. Dual-channel mono is a clip where one sound, like an interviewer, is on one channel and a second sound, like a guest, is on a second channel. The two channels are in sync, but they are not related. For instance, just compare Jim to myself. If I were doing an interview with Jim, finding out what it's like to be a world-class art director, I would have Jim's audio on one channel and my audio on the other channel. That way I'm able to edit Jim's audio and adjust his levels, from his very soft and dulcet tones next to my very loud tones, because I have just learned to speak louder. So I want to have separate level control over my volume and over Jim's volume. I can't do that with a stereo pair; I can do that with dual-channel mono. We'll talk more about that as we go farther into this session. Remember that with dual-channel mono each audio source is monaural, meaning it comes equally out of the left and right speakers, giving the appearance of being in the center, and it's recorded to two separate tracks. Recording interviews, or recording each actor to its own audio channel, makes a great deal of sense because it simplifies editing.

So, at great personal expense, we have added a very high-quality demonstration segment, because this is the kind of show that pulls out all the stops. I want to talk about how we hear. Let's go over to the whiteboard. Imagine, if you will, that I can draw. This is going to be so much fun for me and so much pain for you. Human hearing is considered a range of sounds that goes from roughly twenty cycles per second at the low end to twenty thousand cycles per second at the high end. Twenty thousand cycles per second is such a high pitch it sounds more like wind going through the pine trees, where twenty
cycles per second is such a deep pitch that it feels more like a vibration than a specific tone. The lowest note on the piano, all the way to the left when you push that deep bass key, is twenty-seven and a half cycles per second, and the highest note on the piano, farthest to the right, is four thousand one hundred eighty-six cycles per second. So the range of a piano starts pretty close to the bottom, but not all the way down, and ends about in the middle. Everything that we hear, whether it's music or speech or sound or noise, whatever it happens to be, every sound we hear is essentially a range of frequencies from twenty cycles per second to twenty thousand cycles per second, and that assumes that you're eighteen years old. A three-year-old hears far below and far above that, and a sixty-year-old is restricted in terms of what they can hear, because as we grow older, our ability to hear that wide a frequency range diminishes.

Well, frequencies are essentially pressure waves that look something like this. We have a wave that flows through the air from whatever source originated the sound to our eardrums. In fact, the way that sound is created is we take a big lungful of air, and the muscles in our chest compress the air and force it out in short
blasts across our vocal cords, which causes our vocal cords to vibrate, setting up a pressure wave which flows through the air until it slams up against the side of our eardrum, causing the eardrum to vibrate. Now, there are nerves inside the ear that sense the vibration and convert the vibration of the eardrum into electrical signals, and those neurons, those electrical pipes, go from our ear into the brain. About a year and a half ago we finally discovered the part of the brain that actually turns those electrical signals into the sound sensations that our brain interprets as sound, whether it's noise or music or speech. What's interesting is that the bundle of nerves that goes from the ear to the brain is composed of neurons, and those neurons are biochemical devices that only fire up to five hundred times a second. To this day we have no idea how we hear frequencies higher than five hundred cycles per second, because biochemically the neurons don't fire any faster than that; but clearly we hear far beyond five hundred cycles per second, up to twenty thousand cycles per second, and we're still not exactly sure how that actually works.

So we now know that we're dealing with a pressure wave. Well, let's take a closer look at this pressure wave, because as Jim so beautifully pointed out yesterday, computers don't have ears, so we've got to find some way of converting a pressure wave, which looks like this, into something the computer can hear. Now, this is a pure sine wave, and the actual sounds that we hear are much more irregular than that, but if we can explain it for a sine wave we can explain it for everything else. Here is the problem: this is what's called a smooth analog curve. There's an infinite variation in changes as we move from an area of high pressure, either positive high pressure or negative high pressure, to an area of zero pressure. And engineers love to measure stuff, so they measure this as a series of voltages, where
the maximum pressure is plus one volt, the minimum pressure is minus one volt, and that line across the center is called the zero crossing line. The zero crossing line is where there is frequency information but no volume information; essentially, this is dead quiet. As for positive and negative, it doesn't make any difference if a sound is positive or negative: it's equally loud regardless of which side of that zero crossing line it's on. Well, the computer hates smooth curves, because the computer likes storing numbers in binary form. Oh Jim, wake up! This is the really important, exciting stuff. Do you know that there are ten types of people in the world? The first is those who understand binary, and the second is those who don't. I will wait. No, I'm not counting that up to ten, please. Okay, so in decimal we have zero, one, two, three, four, five, six, seven, eight, nine, but everything inside the computer is stored as a one or a zero. So when we count to ten, that's actually one-zero-one-zero, because all we have to store inside the computer are ones and zeros. So when we're storing information inside the computer, we have to reduce that information to binary numbers. There's a bucket; there's not a one and a half, there's not a ten and three quarters. I've got these individual buckets which store stuff, and what those buckets mean is that I can't have a smooth curve. So what we did, from a computer point of view, is we took this perfectly smooth curve and we sliced it into time slices called samples, and we measured the average voltage across each one of those samples. And then we store the average voltage in the computer as a single number that says: for that duration of time, the average voltage, the amount of pressure, is that number. And we end up with a stair-step look to our audio. Now, assuming that I can draw, which is not a safe assumption, this stair step
starts to approximate the shape of that curve, and we can get it to be more accurate by making the samples closer and closer and closer together. So we end up with this pointillist drawing where each sample is like a dot, and, again assuming that I can draw, I could draw a series of dots which would be close to, but not exactly the same shape as, that analog curve. Now, as I know you remember from high school physics, the Nyquist theorem states that if I take the sample rate and divide it by two, that equals the frequency response. You do remember the Nyquist theorem from high school physics? Yes, Nyquist theorem: N-Y-Q-U-I-S-T. Well, the Nyquist theorem states that if you take the sample rate and divide it by two, that equals the frequency response, which gets to this interesting point: if we record audio at a sample rate of forty-eight thousand samples per second, and we divide that by two, that yields a frequency response of twenty-four thousand cycles per second, which exceeds human hearing. So the reason that we're recording audio for video at forty-eight thousand samples per second is that it gives us a frequency response that exceeds human hearing, which means that everything we record, we can hear.
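As a quick illustration of the sampling and Nyquist arithmetic just described, here is a minimal Python sketch. The one-kilohertz test tone and the tiny sample count are values I've picked purely for illustration; this is not Final Cut Pro code, just the whiteboard idea in numbers.

```python
import math

SAMPLE_RATE = 48_000        # samples per second, the video standard
NYQUIST = SAMPLE_RATE / 2   # Nyquist theorem: sample rate / 2 = frequency response

def sample_sine(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Slice a 1-volt sine wave into time slices, one number per sample."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

# One millisecond of a hypothetical 1 kHz tone: 48 stair steps.
samples = sample_sine(1_000, 48)

print(NYQUIST)              # 24000.0 -- exceeds the ~20,000-cycle limit of hearing
print(max(samples) <= 1.0)  # True: pressure stays between -1 and +1 volt
```

Making the samples closer together, that is, raising the sample rate, raises the Nyquist limit in lockstep: at forty-eight thousand samples per second, the twenty-four-thousand-cycle ceiling comfortably clears human hearing.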
So now you're wondering why we picked a sample rate of forty-eight thousand. That's because it represents the full range of human hearing plus just a little bit extra, just to be safe. So sample rate controls and determines the frequency response of the audio that we hear. Audio, however, is much more complex than that. Audio is not linear; audio is logarithmic. What that means is it's not a straight line, it's a hockey stick. If we go back to frequency, remember the frequency response that we have is twenty cycles to twenty thousand cycles, and for those of you who studied music, every time the frequency doubles we go up an octave. So if I start at twenty cycles per second and go up to forty, that doubling of frequency increases the pitch by an octave; from forty to eighty, eighty to one-sixty, one-sixty to three-twenty, three-twenty to six-forty, then, rounding slightly, to twelve-fifty, twenty-five hundred, five thousand, ten thousand, twenty thousand. There is as much difference in pitch in those ten thousand cycles up here, from ten thousand to twenty thousand, as there is in those twenty cycles down there, from twenty cycles to forty cycles. Human hearing is a ten-octave range, and human speech is roughly in the middle of it, from about two hundred cycles to about seven thousand cycles, with a little bit of rounding in there. There are about two and a half octaves of sound below the deepest human voice, and there's about an octave and a half of sound above the human voice. Guys' voices are on the lower side; girls' voices are on the higher side. The reason I mention this is that human speech itself is divided into two categories: vowels and consonants. Vowels are all low-frequency sounds. They provide the voice its warmth, its richness, its sexiness, its identifiability. The low sounds, a, e, i, o, and u, are what make the voice sound unique to Jim or to Ed or to Philip or to myself or to you. But that doesn't provide clarity.
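The octave-doubling arithmetic on the whiteboard lends itself to a quick sanity check in Python. Note that the whiteboard ladder rounds in the middle (640 becomes 1,250), so this exact-doubling ladder lands at 20,480 rather than 20,000; the point, that hearing spans roughly ten octaves, survives the rounding.

```python
import math

def octaves_between(low_hz, high_hz):
    """How many doublings (octaves) separate two frequencies."""
    return math.log2(high_hz / low_hz)

# Walk up from 20 Hz, one octave (one doubling) at a time.
freq, ladder = 20, [20]
while freq < 20_000:
    freq *= 2
    ladder.append(freq)

print(ladder)  # [20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10240, 20480]
print(round(octaves_between(20, 20_000), 2))  # 9.97 -- roughly ten octaves
print(round(octaves_between(200, 7_000), 1))  # 5.1 -- the span of human speech
```

The same function shows why the top half of the range is perceptually no "bigger" than the bottom sliver: 10,000 to 20,000 cycles and 20 to 40 cycles are each exactly one octave.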
Clarity is provided through diction. Diction is provided through consonants, and consonants are all high-frequency sounds: t and p and k. Jim, I'm going to embarrass you for just a second, but I need you to do an experiment. Okay, sure. Okay, I want you to say the letter f, as in Frank. "F." And I'll say the letter s, as in Sam. "S." Now hear the difference between them: f, s, f, s. Both of those are fricatives, and both of those are formed by having air whistle across the tip of your tongue and the roof of your mouth. There's a hissing sound with the letter s, and there's no hissing sound with the letter f. Otherwise they are exactly the same letter; they're formed exactly the same way. If the hiss is there, it's an s; if the hiss is not there, it's an f. And the frequency of that hiss, the thing that lets us perceive it or not, is roughly sixty-one hundred cycles for a guy and roughly eight thousand cycles for a girl. So if you're talking to me on the phone and you say, "Let's go meet at S Street," if I can't hear the hiss, I don't know if you're talking about the street as in f, Frank, or the street as in s, Sam. And the problem is the telephone doesn't pass frequencies as high as six thousand cycles; the phone stops at about thirty-five hundred. So even if you had perfect hearing, you'd be unable to tell the difference between the letter f as in Frank and the letter s as in Sam, because the actual frequencies required to distinguish that particular letter are missing: the telephone doesn't carry that frequency. The reason this is important is, let's say, and Ed, I'll pick on you for just a second, let's say that you're doing an audio mix for a program for Sesame Street.
If you're mixing for three-year-olds, they've got the hearing of bats; you can put any kind of sound in there you want and they'll be able to hear it. But if you're mixing for sixty-year-olds, you want to boost the high frequencies to make sure that those high-frequency sounds, the consonants, are clearly understandable, so that the people who are actually listening to your program understand what's being said, not simply hear that somebody is talking but are able to understand what they're saying. That ability to emphasize the high frequencies makes it possible for adults to clearly hear your program. So being able to distinguish low frequencies, which give a voice its richness and its warmth, versus high frequencies, which give a voice its clarity, is critical.

Okay, well, we've seen that frequencies are logarithmic, but equally important is volume, and if you remember only one thing from this whiteboard speech, I need you to remember this next statement, and that is that audio levels must not exceed zero dB. Not once, not ever, not for a little bit, not for a fraction of a second, not because it sounds really good, not because you want to. Not once, not ever, not at all. Period. There are three fireable offenses for an editor. Audio levels that exceed zero dB is fireable offense number one. Light levels that exceed one-hundred-percent white is offense number two. And chroma levels that oversaturate is offense number three. We're going to talk about audio today; we'll talk about white levels and chroma oversaturation tomorrow. There's no excuse for audio which exceeds zero dB, and the reason is, remember we talked before about how the computer stores information as binary numbers? There's a fixed range of numbers in which we can store audio volume, and eventually that range of numbers is filled.
When your volume exceeds zero, there are no buckets left to store the information, so it gets thrown out the back of your computer, little feet kicking in the air, dying slowly on the carpet, while the audio sounds crackly and poppy and distorted and awful, and there's not a technology on the planet that can fix it. Audio levels must never exceed zero dB. But audio levels, like audio frequencies, are also logarithmic. When my level hits zero dB, my audio gain is at one hundred percent. Every time I drop the audio gain by six dB, my volume is cut in half: negative six dB is a fifty-percent gain, negative twelve dB is a twenty-five-percent gain, negative eighteen dB is a twelve-point-five-percent gain. Every time my gain drops six dB, the perceived audio volume is cut in half. So I want to have my audio be as close to one hundred percent as possible and yet never go over one hundred percent, which gets to the last point that I want to make before we go back into the software. When we're recording people on set, not for a live event but when we're recording people on set, an audio engineer will always record audio a little soft, because they don't want to run the risk that audio recorded on set is distorted, meaning the audio can't be used in production, which would require that audio engineer to be fired and never work again. Employment is a good thing. So audio on set is always recorded around minus twelve to minus eighteen dB when the actors are talking normally, but in post we want to boost it up, because negative twelve to negative eighteen leaves way too much gain on the table; we're not able to take advantage of the full power of the human voice. Music, on the other hand, is recorded entirely differently. Music is recorded as close to zero as possible. Most heavy metal music is mastered
at zero, most loud rock music is between zero and one-tenth of a dB below zero, and most acoustic music is bouncing around negative six to negative three. It's all loud, which means that we've got to pull the gain of the music down and pull the gain of the actors up, which is the whole process of adjusting audio inside our video editing application: I need to make our actors louder and make the music softer, and how we do that is what we'll work with the software to do. So let's go back to the computer. Is that not cool stuff? I mean, now you know what samples are: samples simply determine frequency response. This has a direct implication when we do video compression and audio compression, because by playing with the sample rate, we can reduce the file size without materially affecting quality.
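The level arithmetic from the whiteboard, that every six-dB drop roughly halves the gain and that zero dB is a hard ceiling, can be sketched in a few lines of Python. The `clip` helper is just an illustration of what a digital converter does to out-of-range samples, not anything from Final Cut Pro:

```python
def db_to_gain(db):
    """Convert a dB level to linear gain (1.0 == 100% == 0 dB)."""
    return 10 ** (db / 20)

# Every -6 dB step cuts the gain roughly in half, just as on the whiteboard.
for level in (0, -6, -12, -18):
    print(f"{level:+d} dB -> {db_to_gain(level):.1%}")
# +0 dB -> 100.0%, -6 dB -> 50.1%, -12 dB -> 25.1%, -18 dB -> 12.6%

def clip(sample):
    """Digital audio has a hard ceiling: a sample past full scale has no
    bucket to land in, so the overshoot is flattened -- the crackly
    distortion no technology can repair."""
    return max(-1.0, min(1.0, sample))

print(clip(1.4))   # 1.0 -- the part above full scale is simply thrown away
```

This is also why on-set dialogue recorded at minus twelve to minus eighteen dB sits at only a quarter to an eighth of available gain, and why we boost it in post while pulling loud music down.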