Audiobook voice sample - Feedback welcome

Report · Jan 02, 2020

Hi all...

I've been producing audiobooks since 2018 using Adobe Audition CC 2015 with Ozone Plugins.

I've been tweaking EQ and mastering settings as I go along, and I've reached a stage now where I'm feeling reasonably happy with everything and seem to be getting consistent results. That said, the one thing I could do with is a "second opinion" from some of the more seasoned folks on here.

Feel free to check out this sample: https://soundcloud.com/user-357226645/davidsweeneybear-theebonyframe-ghost-rp-straight-read/s-yusVE

I'd welcome any pointers or opinions on how I might do things better.

Many thanks! DSB

~ David Sweeney-Bear ~

Report · Jan 03, 2020

I've taken the liberty of running this past three people as a read (one of them is me) and the comments are broadly similar, but the technical comments are mine alone; I can't get on with such a low background level, and this has an impact on how the gaps are perceived. If you include the comments from the others - well, they noticed the lack of dynamics. Everything comes out at pretty much the same level, and this tends to make the delivery sound more monotonous than perhaps it really is. The silence thing and the gaps: gaps are where people assimilate what has just been said, and some of them seem to be a little too long, especially when there appears to be dead silence in them - it's almost like it's gone wrong!

Light and shade in the dynamics is important, especially if it's going to be a long read. As it is, I don't know how long I could listen to this in one go. I've been told before that my observations about this are 'not what people want these days in an audiobook', but quite frankly I don't believe this - I think natural background and dynamics are exactly what people want. If somebody reads to you in real life, there's both natural background and natural dynamics in the speech. Certainly the background doesn't need to be intrusive and the dynamics don't need to leave you struggling to hear some words, but I think you've overdone it a bit here, I'm afraid. It needs to sound more 'real'.

Well, you asked...

Report · Jan 03, 2020

Hi Steve, thanks so much for your feedback... no worries - I like constructive criticism!

I must say you have a forensic ear!

Maybe I was a bit hasty in providing this particular sample since it's not from one of my produced works; rather, it's a sample I was asked for. And, to boot, I happened to be asked for a "straight read" rather than characterisation, so maybe it sounds a bit monotonous for that reason.

I wonder if I might try you with another sample (no rush, whenever you have the time or inclination!) - and see if the same comments apply.

I'm very interested in the "background" issue you brought up. When I edit I often find I have to take out mouth noises, the odd background creak or tinkle or whatever, but of course this leaves a totally blank space with a noise floor of something like -100 dB. I'm not sure whether, when you say background, you mean these sorts of incidental noises should be left in, or whether you'd prefer to hear some general background hiss (I usually take out any hiss with careful noise reduction, but it happens I didn't run any NR on the sample you've listened to).

See what you make of this: https://soundcloud.com/user-357226645/destinys-war-part-1-saladins-secret-audio-sample/s-EeUhM

And here it is pre-mastering: https://soundcloud.com/user-357226645/destinys1-audio-sample-pm-nr-pp/s-uUbdj

Thanks for taking the time and for passing it on for other opinions, much appreciated!

~ David Sweeney-Bear ~

Report · Jan 04, 2020

The pre-'mastering' sound is a heck of a lot easier to listen to (it's less harsh, definitely), but that complete lack of background is disturbing, and technically wrong.

The solution is quite simple; do what the rest of us do when recording anything like this, and capture the real background on its own. So what you need is a nice long recording of the sound of nobody in your recording space, using the same mic as for the narration, and run that at about -60dB in the background of the speech. After you've recorded a few minutes of it, you can loop it in Multitrack for as long as you need it.
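For anyone who would rather see the idea as arithmetic than as Multitrack clips, here is a minimal Python sketch of looping a short room-tone recording under the narration. It assumes samples are floats with full scale at 1.0, and it reads "about -60dB" as the RMS level of the looped bed, which is one interpretation and not something stated above. The function names are made up for illustration; none of this is an Audition feature.

```python
import math

def rms_db(samples):
    """Total RMS of float samples (full scale = 1.0), in dBFS."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)

def add_room_tone(speech, room_tone, target_db=-60.0):
    """Loop room_tone under speech with the bed scaled to target_db RMS.

    target_db = -60 is an assumed reading of "about -60dB".
    """
    tone_level = rms_db(room_tone)
    if tone_level == float("-inf"):
        raise ValueError("room tone is digital silence; record the real room")
    gain = 10 ** ((target_db - tone_level) / 20)
    # i % len(room_tone) repeats the short recording for the whole narration
    return [s + gain * room_tone[i % len(room_tone)] for i, s in enumerate(speech)]
```

With this, a gap that was edited down to digital silence ends up sitting at the level of the room instead of at nothing.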

I said technically wrong - here's what ACX have to say about it; it's a submission requirement (please note that this does not mean digital silence):

"Each file must have 0.5 to 1 second of room tone at its beginning and 1 to 5 seconds of room tone at its end.

This space is required to ensure titles are successfully encoded in the many formats made available to customers. It also gives listeners an audio cue that they have reached the beginning or end of a section."

One of the reasons that they say this is that many encoders have a lot of difficulty with absolute silence, and will produce unpredictable results, often substituting their own noise floor instead of yours. That's something that you really don't want to happen. Background from the studio with no extraneous noise is ideal, though.
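As a rough illustration of the quoted requirement, the sketch below measures how much quiet audio sits before the first and after the last "loud" sample, and flags edges that are pure digital silence. The -45 dB cut-off between room tone and speech is an assumed value chosen for the example, and the function name is hypothetical; this is not how ACX checks files.

```python
def check_acx_edges(samples, sample_rate, speech_threshold_db=-45.0):
    """Check for 0.5-1 s of room tone at the start and 1-5 s at the end.

    speech_threshold_db is an assumed cut-off between room tone and speech.
    Samples are floats with full scale at 1.0.
    """
    threshold = 10 ** (speech_threshold_db / 20)
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        raise ValueError("no speech found above the threshold")
    head = samples[:loud[0]]        # everything before the first loud sample
    tail = samples[loud[-1] + 1:]   # everything after the last loud sample
    head_s = len(head) / sample_rate
    tail_s = len(tail) / sample_rate
    return {
        "head_seconds": head_s,
        "tail_seconds": tail_s,
        "head_ok": 0.5 <= head_s <= 1.0,
        "tail_ok": 1.0 <= tail_s <= 5.0,
        # an edge made only of zeros is digital silence, not room tone
        "digital_silence": (bool(head) and not any(head))
                           or (bool(tail) and not any(tail)),
    }
```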

Report · Jan 04, 2020

Steve, that's brilliant! Thanks so much, I'll definitely implement the "background" in future recordings.

As for mastering and EQ, I think it's time to go back to the drawing board and see if I can do things a bit more subtly...

I'll be back some day soon with a new sample 🙂

~ David Sweeney-Bear ~

Report · Feb 22, 2020

Steve, this is really helpful, thanks.

Does this mean that if there's a slight room tone which isn't bothersome, falls below the -60 dB noise floor spec, and is way low, it's fine to leave it in?

I'm recording in a studio booth on a street. So I noise-reduce slight bass rumble which is visible in the spectrogram but not audible to me. I leave the rest.

Does this sound like a reasonable tactic for room tone?

Am thinking it's better to take some out in case encoding exaggerates it further down the chain?

Thanks,
Jules

Report · Jan 04, 2020

Ok... I'm back with a couple of remixes.

I'd love to get opinions on both - which do you prefer? I used a slightly different approach for mastering on each so it would be interesting to know (I have my own preference but I won't say just yet!)

SAMPLE 1 https://soundcloud.com/user-357226645/destinys-sample-remaster-mixdown-pm-m2020/s-ws6j8

SAMPLE 2 https://soundcloud.com/user-357226645/destinys-sample-remaster-mixdown-pm-m2020sansmbc-tmc/s-IifPB

~ David Sweeney-Bear ~

Report · Jan 09, 2020

A bit of an update...

I've gone back to the drawing board a second time... I'm now using dynamics processing, EQ and a gentle maximizer in the effects rack, but I've rolled back the wet/dry mix to 80% instead of 100%.

The thing about ACX specification is that on the one hand it's a -3 dB peak max and on the other a -23 dB total RMS minimum - a bit of a squeeze for dynamic range, although presumably this is for audio clarity at lower bit/sample rates for streaming.

I find that once I've applied the rack, the total RMS comes out OK but the peak is too high. Reducing the peak of course lowers the overall RMS unless I use hard limiting.
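The squeeze can be put in numbers. Plain gain moves peak and total RMS by the same number of dB, so the gap between them (the crest factor) stays fixed, and a take with one loud spike cannot reach -23 dB RMS by gain alone without the spike overshooting -3 dB. A minimal sketch, assuming float samples with full scale at 1.0 and a non-silent take; the function names are illustrative and are not Audition's:

```python
import math

def peak_db(samples):
    """Highest absolute sample value, in dBFS."""
    return 20 * math.log10(max(abs(s) for s in samples))

def total_rms_db(samples):
    """RMS over the whole take, in dBFS."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def apply_gain(samples, gain_db):
    """Scale every sample by gain_db decibels."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

def gain_to_hit_rms(samples, target_rms_db=-23.0):
    """Gain in dB that brings the total RMS to the target."""
    return target_rms_db - total_rms_db(samples)

def peak_after_rms_match(samples, target_rms_db=-23.0):
    """Peak you end up with if plain gain alone brings RMS to the target."""
    return peak_db(samples) + gain_to_hit_rms(samples, target_rms_db)
```

If `peak_after_rms_match(take)` comes out above -3.0, something other than gain has to give: limiting, or lowering the loud phrases by hand.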

The only compromise I've found is to find the peak areas and manually lower those phrases, which is a bit fiddly and time-consuming.

I think in future, if I'm going to be recording a loud phrase, it might be an idea to hit the -10 dB pad on my interface while recording; then at least I can bring the level up during editing rather than having to lower peaks during mastering. I think that way would be more time-efficient.

Anyway... here's my last effort: https://soundcloud.com/user-357226645/destinys-war-sample-final-remaster/s-CaAhM

There's still a bit of hard limiting going on in some of the louder phrases but it's the best I can do with this one... I'll take the lessons learned on into future projects.

Thanks for all the advice 🙂

~ David Sweeney-Bear ~

Report · Feb 23, 2020

You said "The thing about ACX specification is that on the one hand it's a -3 dB peak max and on the other a -23 dB total RMS minimum - a bit of a squeeze for dynamic range, although presumably this is for audio clarity at lower bit/sample rates for streaming."

I don't think it makes much difference to what happens at lower sample rates. What it does make a difference to though is whether people can hear all of the words while they are travelling on a noisy public transport system!

Report · Feb 23, 2020

Hi Steve, I'm a bit further down the rabbit hole at this stage and have significantly "toned down" my EQs and dynamics processing.

Here's my latest effort: https://soundcloud.com/user-357226645/alt-sample-mixdown

I made a chance discovery when I submitted a project outside of ACX. It was through Spoken Realms, and they provide a piece of software "2nd Opinion" that analyses the audio for suitable specs.

It seems the way Audition calculates average RMS is different to the method used by Spoken Realms and, by extension, the one used by ACX, since the Spoken Realms specs are identical and they publish to ACX anyway.

Here's what Audition shows for the above sample:

and here's what 2nd Opinion has to say:

For practical purposes, I work with the Total RMS in Audition since that has a direct correlation with adding or subtracting dB of amplitude. As you can see from the stats, I'm bringing it to -25 dB, which results in a -22.07 dB average according to 2nd Opinion - well within the tolerance for ACX.

This seems to preserve the dynamic range fairly well while keeping things nicely audible.

Previously I was bringing levels up to -22.9 dB Total RMS in Audition, thinking this had a direct correlation with ACX specs (they're a little vague on the point of whether they are looking for average or total RMS). Hence my comment about "a bit of a squeeze".
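How either tool actually computes its figure isn't documented here, so the following is only a sketch of one way a "total" and an "average" RMS can disagree on the same audio: total RMS takes every sample, pauses included, while an average of short-window RMS values that skips quiet windows only reflects the spoken parts. The window length and the -50 dB gate are assumptions for the example, and the function names are hypothetical.

```python
import math

def rms_db(samples):
    """RMS of float samples (full scale = 1.0), in dBFS."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)

def total_rms_db(samples):
    """RMS over every sample in the file, pauses included."""
    return rms_db(samples)

def gated_average_rms_db(samples, window, gate_db=-50.0):
    """Mean of per-window RMS levels (in dB), skipping windows below gate_db.

    window (in samples) and gate_db are assumed values, not tool settings.
    """
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        level = rms_db(samples[start:start + window])
        if level > gate_db:  # pauses and digital silence are ignored
            levels.append(level)
    if not levels:
        raise ValueError("every window fell below the gate")
    return sum(levels) / len(levels)
```

On a read that is half speech and half pause, the gated average sits about 3 dB above the total figure, which is the same direction as the gap between the two readings above.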

~ David Sweeney-Bear ~

Report · Feb 29, 2020

Update on what I said above...

I now have it confirmed that ACX use TOTAL RMS, not average.

So I'm back to mastering to -23 dB total RMS in Audition - that's the most dynamic range possible when juxtaposed with the -3 dB peak limit.

Unfortunately "2nd Opinion" is not going to guarantee approval by ACX's system. I can only assume Spoken Realms check their uploads themselves before submitting to ACX; I don't know if other platforms have different criteria.

~ David Sweeney-Bear ~

Report · Mar 19, 2020

Mr Sweeney,

In case this helps:

I found this tutorial on YouTube explaining how to meet ACX requirements with Audacity.

https://www.youtube.com/watch?v=wnutKoBzmpA

I followed the instructions:

1. Installed Audacity and the two plug-ins: ACX Check and RMS Normalize.

2. Followed the three simple steps explained in the clip (the instructions are on https://wiki.audacityteam.org/wiki/Audiobook_mastering):

Effect > Filter curve... > Manage > Factory Presets > : Low roll-off for speech > OK.

Effect > RMS Normalize: Target RMS Level -20dB > OK.

Effect > Limiter: Soft Limit, 0.00, 0.00, -3.50dB, 10.00, No > OK.

3. Ran the plug-in: Analyze > ACX Check.

Boom! Diagnosis: my file meets the requirements!

DONE! PERIOD!
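For readers outside Audacity, the level-setting part of those steps can be sketched in a few lines of Python. This is a simplification under stated assumptions: the low roll-off filter is left out, and a plain hard clip stands in for Audacity's soft limiter, so it is not equivalent to the plug-in chain. The -23 to -18 dB RMS window and -3 dB peak used in the check are the ACX figures; noise floor is not checked. All function names are illustrative.

```python
import math

def total_rms_db(samples):
    """RMS over the whole file, in dBFS (float samples, full scale = 1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def peak_db(samples):
    """Highest absolute sample value, in dBFS."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_normalize(samples, target_db=-20.0):
    """Apply one gain so the total RMS lands on target_db."""
    gain = 10 ** ((target_db - total_rms_db(samples)) / 20)
    return [s * gain for s in samples]

def hard_limit(samples, ceiling_db=-3.5):
    """Clip anything above the ceiling (a crude stand-in for a soft limiter)."""
    ceiling = 10 ** (ceiling_db / 20)
    return [max(-ceiling, min(ceiling, s)) for s in samples]

def acx_levels_ok(samples):
    """RMS between -23 and -18 dB and peak no higher than -3 dB."""
    return -23.0 <= total_rms_db(samples) <= -18.0 and peak_db(samples) <= -3.0
```

Normalising to -20 dB first leaves room for the limiter to pull the RMS down slightly and still stay inside the window, which is presumably why the recipe aims at the middle of the range.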

Report · Jul 16, 2023

Hey,

Is it a good idea to use text-to-voice tools to create an audiobook, as described below?

https://community.adobe.com/t5/audition-discussions/using-text-to-voice-to-create-audiobook

I will be very grateful to you.

Thanks in advance

Report · Jul 16, 2023

No, like it says in that thread, it isn't.
