1. I'm putting together my film's audio, including the camera audio and the lav mics, and have read and watched a lot of tutorials. Many recommend, for example, normalizing the audio so all peaks hit -3 dB, while some say that for TV the audio should peak at -12 dB, and others say -24 dB. So which is it?
2. Anyway, my common sense tells me not to normalize the audio to a fixed number like -3 dB, because that would mean normal quiet dialogue would peak at the same loudness as somebody shouting, which would not make sense, correct? If that's the case, would the solution be to go over each audio clip manually and make sure that, for example, normal dialogue peaks consistently around -10 dB and loud scenes peak around -3 dB?
3. I recorded the actors' audio with XLR lav mics in stereo (not mono) to a Tascam audio recorder, but for some reason on the Adobe Audition timeline it comes out of only one speaker. I can fix that quite easily with the Fill Left from Right effect, or perhaps even better by right-clicking, choosing Audio Channels, and putting a checkmark in both the Left and Right checkboxes, one above the other. I guess that's acceptable as well?
In strict terms, particularly in Audition, Normalisation does not make all your peaks the same level. Provided all your audio is in one file, Normalisation to -3dB will only amplify the audio enough to make the loudest peak equal -3dB. It won't squash up lesser peaks to the same level. Normalisation measures the highest peak within a particular audio file and adjusts the gain of all the audio by the same amount to bring just that peak up to the selected level.
If the audio you are editing in Audition is recorded as separate takes, some of which are loud dialogue and some quieter, then you certainly don't want to individually Normalise all the files to a fixed level because, as you rightly say, you will lose the dynamics of the performances. So, provided you recorded at a fixed level and didn't alter the recording levels between scenes, you can just apply a fixed Amplification to all the audio files to bring the highest peaks up to around -3dB.
After you have finished editing your video you may find that there is too much level difference between the quiet and loud parts. If under normal listening conditions you can't hear some of the dialogue under your added effects or music or if the loudest bits are too loud then you can apply some Compression to the whole Dialogue track to slightly reduce the dynamic range.
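To make the distinction concrete, here is a minimal Python sketch (hypothetical helper, not Audition's actual code) of what peak normalization does: one gain value for the whole clip, chosen from the single loudest sample.

```python
import math

def peak_normalize(samples, target_db=-3.0):
    """Apply ONE gain factor to the whole clip so its single loudest
    peak lands at target_db dBFS. Relative levels inside the clip are
    untouched -- this is uniform gain, not compression."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)
    target_lin = 10 ** (target_db / 20.0)   # -3 dBFS is roughly 0.708
    gain = target_lin / peak
    return [s * gain for s in samples]

quiet_take = [0.05, -0.10, 0.20]   # peaks around -14 dBFS
loud_take  = [0.30, -0.90, 0.50]   # peaks around -0.9 dBFS
# Normalized separately, BOTH takes now peak at -3 dBFS. That is
# exactly how per-file normalization flattens the level differences
# between takes, while leaving the dynamics within each take alone.
q = peak_normalize(quiet_take)
l = peak_normalize(loud_take)
```

This is why the advice above is to normalize (or amplify) the whole set of takes by one amount, rather than each file individually.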
This is actually quite a complicated issue. I think that the place to come at it from is to assume that viewers don't want to have to keep adjusting the volume control whilst they're watching a programme. Where it gets more complicated is that, obviously, not all dialogue comes in at the same level, and also some speakers manage to use a much greater dynamic range than others - so the soft parts of their speech are much harder to hear.
This is where compression comes into play, especially with mixed sources. It's perfectly possible to reduce the peaks of spoken dialogue (aka 'limiting') by anything up to 9dB without anybody seriously noticing, and that on its own can make level setting a whole heap easier, sometimes obviating the need for any other compression at all.
As far as levels are concerned, there's a difference between the level you make the original recording at, and the level you process it at. Normally you'd make the original recording with peaks up to around -12dB, just to give you enough 'headroom' if somebody shouts, or whatever. But you wouldn't process it at this level - that's far too low.
All compressors only work at their intended settings when they're presented with signals that peak at 0dB, so the first thing to do with any dialogue recorded at -12dB is normalize it to just under 0dB, and then attend to whatever you're going to do with it (like running it through Audition's Dynamics Processing). Rather than going through all of the Dynamics Processing options myself, I can point you at a video which explains all this pretty well - Dynamics processing - YouTube
As far as the mix itself is concerned, almost certainly you want all these sources as mono anyway, and you just pan them where you need to in the stereo field.
Thanks for your answers. So regarding normalizing: I would normalize all clip peaks to about -3 dB, for example, but then I would also have to run it through Dynamics Processing if needed. -3 dB peaks sound quite high; I thought peaks of about -10 or -12 dB are more what the TV channels etc. want?
They don't measure it like that - they use LUFS, and the actual requirements vary a bit. But no, you wouldn't want dialogue down at that level, in any event. If you google LUFS or LKFS, you'll find loads of stuff about it. It is a subject that confuses many an editor...
They talk about 1 LU equaling 1 dB and -23 LUFS being the broadcast standard, so the quick conclusion would be -23 dB, but I guess many other factors go into it. Would an easier shortcut be to just normalize all peaks for the clips to -3 dB? Or how about listening to a movie from Amazon and comparing levels, which by ear gives me roughly equal levels after normalizing all peaks to about -10 dB? I understand that different buyers have different audio requirements, but certain benchmarks for this should exist.
julianm44443758 wrote
They talk about 1 LU equaling 1 dB and -23 LUFS being the broadcast standard, so the quick conclusion would be -23 dB, but I guess many other factors go into it. Would an easier shortcut be to just normalize all peaks for the clips to -3 dB? Or how about listening to a movie from Amazon and comparing levels, which by ear gives me roughly equal levels after normalizing all peaks to about -10 dB? I understand that different buyers have different audio requirements, but certain benchmarks for this should exist.
I wish it was as simple as that; unfortunately it isn't. For a start, LUFS is a time-based measurement based on a loudness estimate (like Leq) and doesn't relate directly to dB at all - whoever said that doesn't understand the concept. That said, a LUFS measurement will vary with the overall level - by about 1 LU per dB - and that's how you 'cheat' the system. You can still have a programme running up to 0dB with an overall LUFS figure much lower than that, though!
Ultimately, if your programme sounds balanced, and you don't find that you have either to keep turning it down, or up whilst listening to it, it won't be far out anyway, and should only need small adjustments to hit this somewhat arbitrary -23 LUFS figure. Audition will help you with this, as it can measure the overall result and tell you directly what it is.
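The '1 LU per dB' behaviour can be illustrated with a sketch. This uses plain RMS as a crude stand-in for the K-weighted, gated BS.1770 measurement (so the absolute numbers differ from a real LUFS meter), but the shift property is the same: applying g dB of gain moves an energy-averaged loudness figure by exactly g.

```python
import math

def rms_db(samples):
    """RMS level in dB -- a crude stand-in for a loudness meter
    (no K-weighting, no gating, for illustration only)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

# 100 ms of a 440 Hz tone at 48 kHz
tone = [0.25 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
gained = [s * 10 ** (6.0 / 20.0) for s in tone]   # +6 dB of gain

shift = rms_db(gained) - rms_db(tone)
# shift comes out at 6.0: gain moves the measurement 1-for-1.
# But the absolute figure still depends on the programme content,
# which is why you can peak near 0 dB and still measure well
# below -23 LUFS.
```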
Thanks, I'll look into this and into your video more. I realized that some clips with similar dialogue sound much louder than others, although the max peak is set to the same level on both clips, -12 dB for example. Btw, I just normalized All Peaks to -12 dB for a clip by right-clicking Audio Gain etc., but after normalizing the clip's All Peaks to -12 dB and reopening Audio Gain, it says at the bottom of the Audio Gain box, below Normalize All Peaks: Peak Amplitude -7.5 dB instead of -12 dB?
Second question: in order to have the audio come from both speakers, can I just right-click on Audio Channels and put a checkmark in the boxes right on top of each other on L and R? It seems to work.
Here are the stages of audio processing, from ITU-R BS.1770-2:
https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-2-201103-S%21%21PDF-E.pdf
-----------------------------
So, finally:
-23 LUFS for broadcast
-14 LUFS for streaming (Spotify, YouTube, etc.)
'DialNorm' got replaced by BS.1770-2, thank goodness.
Btw, if you want to try another technique, try RMS. It's super useful for pre-editing because RMS takes into account all the loudness data in the waveform, not just a section. First apply it to get all the audio to the same volume, then do your audio editing. Once you're happy with the edit, apply LUFS matching for the 'standard'.
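For anyone wanting to see what RMS matching means in practice, here is a minimal sketch (plain Python, hypothetical helper names): it gains the clip so its average level, rather than its peak, hits the target.

```python
import math

def rms_db(samples):
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

def rms_normalize(samples, target_db=-23.0):
    """Gain the clip so its AVERAGE (RMS) level hits target_db.
    Unlike peak normalization, this uses all the loudness data in
    the waveform, so dissimilar takes land at a similar perceived
    working level."""
    gain = 10 ** ((target_db - rms_db(samples)) / 20.0)
    return [s * gain for s in samples]

take = [0.4 * math.sin(2 * math.pi * n / 100) for n in range(1000)]
matched = rms_normalize(take)   # now averages -23 dB RMS
```

One caveat: RMS matching can push peaks above 0 dBFS on very dynamic material, which is one reason to do it in a 32-bit float environment and leave the final limiting/loudness pass for last.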
Thank you. We have the lav mic audio as well, but sometimes we need to use a voice recorded by the camera audio, which would be OK except for the relatively loud sound of circulating air. I've tried different effects - DeHummer etc., including Noise Reduction - but as it removes the air/hum sound it makes the voice sound robotic. Is there any solution?
julianm44443758 wrote
Thank you. We have the lav mic audio as well, but sometimes we need to use a voice recorded by the camera audio, which would be OK except for the relatively loud sound of circulating air. I've tried different effects - DeHummer etc., including Noise Reduction - but as it removes the air/hum sound it makes the voice sound robotic. Is there any solution?
If you can get a clean sample of the noise and use it as a noise print, then the best approach is the process-based Noise Reduction tool, but in several passes, each pass with a different FFT setting (which means re-sampling the noise, unfortunately) and only taking a few dB off at each pass. If it's air noise, you'll probably get a better result with higher FFT settings anyway. The whole process admittedly takes longer with multiple passes, but it's always worked better this way.
Ultimately it is one's ears that decide how loud something sounds. As an audio/video editor you need a decent monitoring system in your studio/edit suite and a calibrated volume level for the speaker control that you always come back to. With experience, monitoring at that set level will tell you more about the 'loudness' of your dialogue than any software or metering system, IMHO.
RX 6 Advanced uses artificial intelligence to remove noise.
chrisw44157881 wrote
RX 6 Advanced uses artificial intelligence to remove noise.
Actually it uses more or less the same process that Audition does, but it has better system management of it - for instance, making multi-FFT passes much easier to implement - and it also lets you choose what level of processing you want to do. And it's quite expensive...
1. I read some articles saying movie dialogue loudness (dBFS peaks) should be around -10 to -12 dB, which roughly matches my experience comparing my audio to movie trailers online. I'm on a deadline for this indie film, so unfortunately I don't have the luxury of going too deep into the nuances. However, if I normalize the All Peaks level under Audio Gain to just under 0 dB instead of, say, -11 dB, that would seem to contradict the advice to normalize to about -11 dB? I understand that dynamics processing will level out the overall volume.
2. I assume the fast way to do this would be to just listen to the audio and, if needed, adjust the dB peaks much lower if the audio overall sounds too loud, if there is no time for compressing or mixing?
3. My lav mic recorded audio properties state:
Source Audio Format: 44100 Hz - 16 bit - Mono
Project Audio Format: 44100 Hz - 32 bit floating point - Mono
I assume they are fine? Thanks.
julianm44443758 wrote
1. I read some articles saying movie dialogue loudness (dBFS peaks) should be around -10 to -12 dB, which roughly matches my experience comparing my audio to movie trailers online. I'm on a deadline for this indie film, so unfortunately I don't have the luxury of going too deep into the nuances. However, if I normalize the All Peaks level under Audio Gain to just under 0 dB instead of, say, -11 dB, that would seem to contradict the advice to normalize to about -11 dB? I understand that dynamics processing will level out the overall volume.
Yes, I've read some articles saying this too. Thing is, they don't tell you that you can't actually stipulate dialogue levels just like that - they depend entirely upon what else is in the background. For instance, if there's any sort of music playing in the background, then you almost always want the dialogue levels to be 10-12dB higher than the music, otherwise a lot of people don't hear them properly (especially as they get older - look up the 'cocktail party' effect). The other thing they don't take account of is where things are panned in the soundfield. You'll get away with lower levels of dialogue if all of it's in the centre and everything else is panned to one side or the other.
But quite frankly, I wouldn't trust any feature film sound editor to get this right; I've heard all sorts of appalling film mixes that I would have sent straight back - and mainly because they'd screwed up the dialogue levels. Ultimately, the best thing to do is to get your mix so that all of the parts of it sound balanced with each other, and don't leave you leaping for the volume control (and get somebody else to listen to it too, if you can), and then just normalize the whole shebang to about -2dB. Whatever happens, that won't be far out.
A slight aside from the levels discussion. I notice that you specify your audio files as being 44.1k sample rate. For normal video/film work they should really be at 48k.
Thanks, so I would need to somehow change the sample rate of all my lav mic audio etc. from 44.1 kHz to 48 kHz?
Any problem will only arise when you come to add your final edited, mixed audio file to your video. Yes, you should really start all your recording at 48k if it's intended for video. However, you can leave it as it is for now and only convert your final mixed-down audio file to 48k, which can be done as a Save As from within Audition.
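For what it's worth, 44.1k to 48k is a clean rational resampling, which is part of why the conversion is painless. A quick sketch of the arithmetic only (Audition's Save As handles the actual filtering for you):

```python
import math

src, dst = 44100, 48000
g = math.gcd(src, dst)            # 300
up, down = dst // g, src // g     # 160 and 147
# A polyphase resampler upsamples by 160, low-pass filters, then
# decimates by 147: every 147 input samples become exactly 160
# output samples, with no drift, because the ratio is exact.
print(up, down)   # 160 147
```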
I was listening to different movie trailers through my computer's speakers and realized that one of the loudest, best-sounding ones is the Wonder Woman trailer; based on what I hear, its dialogue would be closer to -2 dB than -10 dB. The voices in the opening dialogue of that trailer sound really good, smooth and pleasant. I'm wondering what they do differently from others?
julianm44443758 wrote
The voices in the opening dialogue of that trailer sound really good, smooth and pleasant. I'm wondering what they do differently from others?
That will be a good recording in the first place, and then limiting, compression and possibly gating to get rid of breath noise. Oh, and a fair bit of experimenting to get it right (which is the clue to all of this...)
This is the order I do audio film work in. I list the effects in order, least destructive first.
1. DC offset - offset/power errors affect dynamic range, so this goes first; a critical pre-alignment step
2. Match volume - normalize 'Peak Amplitude' on everything to -6dB, to make serious errors like clipping, offsets and phasing easier to see
3. Declip - run the DeClipper effect, remove buzzing with Dynamics Processing, notch-filter the rest
4. Equalize - run the Parametric Equalizer to reduce wind and hiss via frequency cuts, and give vocal clarity with a 'D' curve
5. Phase correct - once tracks are laid, detect any phasing errors
6. Noise removal - declick, de-reverb, remove hiss and wind, learn a sound model/Noise Reduction; everything else, the healing brush
7. Studio reverb - to add vocal weight
8. RMS -23dB - to roughly match vocals; begin basic audio editing with levels (vocals centred and 12dB higher than music, etc.) while keeping dynamic range
9. Compression - multiband compress SFX at 4:1 and vocals at 2:1 to give a pleasing dB range. Always the second-to-last step, because any other process affects dynamic range. It should soft-limit for you; hard limits are audible and not recommended.
10. Quality control - match loudness to ITU-R BS.1770-3: -23 LUFS for film, -16 LUFS for YouTube - for the standard
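Step 1 in the list above is worth a quick illustration: a DC offset eats headroom on one side of the waveform before any signal does, which is why it comes before everything else. A minimal sketch of the fix:

```python
def remove_dc(samples):
    """Subtract the mean so the waveform is centred on zero.
    An offset wastes headroom asymmetrically: a clip sitting at
    +0.1 hits full scale 0.1 earlier on positive peaks than on
    negative ones."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

offset_clip = [0.1 + x for x in (0.5, -0.5, 0.3, -0.3)]
centred = remove_dc(offset_clip)
# centred sums to zero: positive and negative headroom are equal again
```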
Interesting. I'm going to look into these.
Peak amplitude normalization is non-destructive in a 32-bit environment and retains dynamic range. It just makes noise easier to see and work on. And matching mics would be step 8b, 'matching shots', not listed.
For example, I use RMS and tweak the mixer to decide if something needs compression/limiting - i.e., if it's too close to 0dB. -23 RMS works extremely well for audio editing because it keeps the voices in a 'talking range', so you don't spend years micro-managing each clip. And natural voices will usually peak around -10, so it actually creates a perfect setup for whispering, normal talking and yelling, all while dynamic range is preserved.
Usually if something goes from -30 to 0dB, it's going to be compressed 4:1, because human ears get tired after a few hours in a theatre with a large dynamic range.
The Dynamics Processing effect works well to remove low-dB noise as your 'ducking limit'.
Crossfades would only be necessary if there was still noise in the clips. Once reverb is matched, either by de-reverb or added reverb, use the RX Ozone equalizer to match EQ across clips, or do it manually in Audition with the Frequency Analysis view and the Parametric Equalizer effect.
Add room tone manually via Dynamics Processing, or use RX 6 Advanced's ambience match to do it automatically. If noise is completely removed, reverb matched and EQ matched, then a simple RMS -23dB should make all your clips roughly match across the whole timeline. Finally, lay in the room tone track.