I've been producing audiobooks for Audible for 3+ years now.
Mostly, I don't get a chance to listen back to what customers are hearing through the Audible app, but just recently I had to check a production to make sure a change had gone live.
I was quite shocked by the audio quality - or rather lack of!
I already knew that audio is down-sampled by Audible for streaming, and was given to understand this was 64kbps @ 22050 Hz. I always check my finished audio at this rate to make sure it sounds OK.
Here's an example: https://drive.google.com/file/d/1pjFmCfS4234eijvzMFsWveaPQmKSt18w/view?usp=sharing
Here's the same passage, stream recorded from Audible's web-based app: https://drive.google.com/file/d/1k5ZmZ_4_DMS_MOtDlwHoztg3WEGjPLNP/view?usp=sharing
My main concern is the high-frequency artifacts in the Audible stream... I'm not quite sure how to describe them, but it's like a gated hiss that corresponds to the speech.
I'm fairly sure this is a result of the audio compression codecs used for the Audible stream. From what little I know, I think Audible uses their own .aa format for streaming.
Here's where it gets interesting...
It seems not all streams are "equal" as far as Audible are concerned.
Here's a sample of another narrator/producer who I would loosely class as a contemporary of mine, again recorded from the Audible stream: https://drive.google.com/file/d/1JFPyIPGfZa48NUPwQLk5YrwbHZxWC2PF/view?usp=sharing
- similar quality/artifacts
Here's John Banks reading from "The Time Machine" by H G Wells, which you might say is more "premium" content from Audible's point of view: https://drive.google.com/file/d/1fAkhMOMbWbXOsf_18BjJFqYv9cUTw0YZ/view?usp=sharing
-none of the high-frequency artifacts, but still clearly downsampled (quite close to my first sample).
And finally, here's a sample from the audio stream of Stephen Fry's "Victorian Secrets" series - "front-page" content on Audible: https://drive.google.com/file/d/1ueak4dpILjpM-pYqoKeiB2PF1kgzPUFp/view?usp=sharing
-crystal clear! (incidentally, this content took longer to buffer than the other examples)
So... it looks like I'm stuck with the lowest-quality stream for my content on Audible, which leads me to wonder if there's anything I can do technically within Audition when mastering/encoding the files in order to maybe reduce if not eliminate those nasty high-frequency artifacts?
My files are submitted as 44100 Hz, 16-bit, 192kbps MP3s as per the Audible/ACX specification. They also happen to be stereo, due to my occasional use of sound effects and/or music, and because the mastering plugins I use sound nicer in stereo anyway.
Would appreciate any advice!
just thought of a description of the unwanted artifact... it's like "static" from a radio signal!
This is what I think might be going on...
It's difficult with MP3 files to get a clue, but there is one. And that's the noise floor. In your example, it's at about -48dB and in the Stephen Fry example, the only bit I could find was at a much lower level, at about -78dB.
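For anyone wanting to reproduce this kind of noise-floor reading outside an audio editor, here's a minimal sketch of the arithmetic in Python. It assumes you have already decoded a silent gap to floating-point samples where full scale is 1.0, and that 0 dB refers to a full-scale square wave; some tools reference a full-scale sine wave instead, which reads about 3 dB higher. The function name `rms_dbfs` is just for illustration.

```python
import math

def rms_dbfs(samples):
    """RMS level of float samples (full scale = 1.0) in dBFS.

    Assumption: 0 dBFS is the RMS of a full-scale square wave.
    """
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # pure digital silence
    # 10*log10 of the mean square equals 20*log10 of the RMS amplitude
    return 10.0 * math.log10(mean_square)

# Hypothetical "gap" whose samples sit at +/-0.001 of full scale
gap = [0.001, -0.001] * 500
print(round(rms_dbfs(gap), 2))
```

A gap at one thousandth of full scale comes out at about -60 dB, which is a handy sanity check against whatever your editor reports for the same selection.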
And indeed in your original sample, the noise floor is higher. This on its own may well account for the static-like sounds - the noise floor is getting much closer to 'clearly audible'. I did wonder whether they were resampling, but I don't think so - that would introduce other artifacts and generally make things a lot worse.
So my guess is that you need to rearrange your signal processing so that the resultant noise floor is more like the Stephen Fry example, whilst remaining within the ACX gamut. The fact that they are stereo files should make no difference, incidentally. In order not to introduce other problems, you might need to be a bit subtle about this - you don't want to downward-expand the noise floor within the dynamic range of your audio, because that will make everything appear to be disappearing into a bottomless pit. But expanding out the bottom part of your signal from the level of the quietest point down to a similar level as the Stephen Fry example might go a long way. The ultimate noise floor of a 16-bit file is much lower - at about -93dB by the time you've taken dithering into account, but that is far too low a value to aim for with speech; you need to preserve a natural-sounding environment.
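To make the "expand only the bottom part" idea concrete, here's a rough sketch of the static gain curve of a downward expander. This is not how any particular Audition effect is implemented; the threshold of -50 dB and ratio of 3:1 are purely hypothetical values chosen for illustration, and a real expander would also need attack and release smoothing.

```python
def expander_gain_db(level_db, threshold_db=-50.0, ratio=3.0):
    """Gain in dB applied by a downward expander at a given input level.

    Levels at or above the threshold pass unchanged. Below it, every
    1 dB of input drop becomes `ratio` dB of output drop.
    threshold_db and ratio are hypothetical example settings.
    """
    if level_db >= threshold_db:
        return 0.0
    return (level_db - threshold_db) * (ratio - 1.0)

def expanded_level_db(level_db, threshold_db=-50.0, ratio=3.0):
    """Output level in dB after the expander's static curve."""
    return level_db + expander_gain_db(level_db, threshold_db, ratio)

# Speech well above the threshold is untouched...
print(expanded_level_db(-30.0))
# ...while a -60 dB noise floor, 10 dB under the threshold, is pushed down
print(expanded_level_db(-60.0))
```

With these example settings a -60 dB gap ends up at -80 dB while anything louder than -50 dB is left alone, so the threshold needs to sit below the quietest wanted speech for the reason given above.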
Incidentally I don't think that meeting the ACX requirements will be affected too much by where the actual noise floor is - although I'm also slightly surprised that they've let your files through - the stipulation according to them is a noise floor no higher than -60dB RMS, and your noise floor is higher than that, at about -48dB, as I mentioned earlier. The Stephen Fry example looks like a good compromise to me, and is the sort of silence level you should be looking for. At -48dB the noise will sound like a digital version of cassette tape hiss - which is pretty much what it is! And it's not helped by being an MP3 file either - just creating that will add a bit of noise - it always does. Which is why you need your file to have a significantly lower noise floor (that is, a higher SNR) before you convert it to the MP3 you're going to submit.
Hi Steve, thanks for that...
I'm well aware of the ACX -60 noise floor requirements - this brings up an interesting observation, something I hadn't really noticed before - downsampling seems to bring up the noise floor:
Here's a bit of the sample as it was submitted to ACX (192kbps mp3) : https://drive.google.com/file/d/1-UQVZhYXvRMWbczYjrplrxKc5PRtcKLz/view?usp=sharing
Measuring the long gap just before the end, the noise floor on this is -59.94 / -60.21 (left and right channels)
The "original" sample in my opening post was downsampled to 64kbps (22050 Hz), which is something I do to "get an idea" of how my masters will sound streaming on Audible. The same gap measured on this file comes out at around -55. It's still 16-bit, but presumably the lower sample rate and bitrate are bringing up the noise floor?
Then, the same gap in the recorded Audible stream of my file measures -44.3
If there has been no resampling by Audible, then this would mean that some form of compression/signal boost + limiting must have been added at some point.
As it happens, I also took a recording from the stream via Audible's mobile app to compare it with their browser-based stream. Interestingly, the noise floor of the same "gap" on that comes out around -50
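As a quick back-of-envelope check on the figures above, here's a small Python sketch that turns the change in measured noise floor into an implied gain. It assumes the whole rise comes from a plain level boost, which may well not be true, since codec noise would raise the floor too. The function names are only for illustration.

```python
def implied_gain_db(floor_before_db, floor_after_db):
    """Gain in dB that would explain a rise in noise floor,
    assuming (hypothetically) the rise is pure gain and nothing else."""
    return floor_after_db - floor_before_db

def db_to_amplitude_ratio(db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)

submitted = -60.0   # approx. floor of the 192kbps file sent to ACX
web_stream = -44.3  # same gap recorded from the browser stream
mobile = -50.0      # same gap recorded from the mobile app

for name, floor in (("web", web_stream), ("mobile", mobile)):
    gain = implied_gain_db(submitted, floor)
    print(name, round(gain, 1), "dB,", round(db_to_amplitude_ratio(gain), 2), "x")
```

On those numbers the browser stream would need roughly 15.7 dB of boost and the mobile app about 10 dB, if gain alone were responsible.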
I have a pet theory that Audible are adding more signal boost to their browser-based app because they imagine the user to be listening through computer speakers, whereas a mobile listener is more likely using headphones! Probably not true though - I think it's actually a different process on the mobile app. Rather than streaming, it starts downloading the entire chapter to the device when you tap play, then after a little while says "ready to play" while it continues downloading in the background... this is likely to be an .aax format download rather than an .aa stream. Rather annoying, I imagine, for anyone who's concerned about storage space on their device.
Anyway, the upshot of all this is (referring to my 192kbps original submission): I am within the correct specification as regards noise floor. I could certainly reduce the noise floor to around -70 or so and see if that helps. The trouble is, I won't have a way of checking the results of that until my next production has been published. A very long-winded process of trial and error...
At the moment, I'm trying to track down what codecs and/or processes Audible use for their streaming platforms by emailing various people at Audible/ACX. If I knew this for sure, I could possibly adjust my mastering process accordingly. However, I'm not holding my breath, as I wouldn't be surprised if they regard this as "sensitive" information... plus I doubt I'll ever get them to admit that they have a hierarchy of streaming quality for different titles!