Typically, ACX don't supply anything like an adequate definition of what they want, but I think that it's what the Audition Amplitude Statistics (the key to all of this) reports as Total RMS amplitude:

In this screengrab, you'll see that it indicates a value of -29.62dB on that file, which I normalized to -3dB, and that's what you have to do, basically: normalize your file to -3dB and scan the amplitude statistics. When it doesn't come out right, almost invariably it's because there are peaks in the signal (you can see them clearly), and these are what you need to reduce, as they're keeping the average signal much lower than it needs to be. Yes, you use compression or limiting to do this, but after you've done it, it's important to normalize again (and each time you do it, in fact), otherwise the peak amplitude will be wrong. As long as you keep your file and processing at this stage as 32-bit floating point, you'll lose nothing by doing this.
Here's what happens when I knock all the peaks off this file, renormalize it and measure it again:

You'll notice that the average value of the signal has now risen considerably, and the Total RMS value is now -20.9dB - quite a jump! But as far as I can tell, this is pretty much what they want your signals to look like, as it appears to meet most of their criteria. What doesn't meet them in this recording, though, is the noise floor, which, as you rightly surmise, is down to the background noise in your room. You can vary this to an extent during the recording process; the closer you are to the mic, the less room noise you record - but you shouldn't overdo this, as it's quite disturbing for the listener.
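For the noise floor, one way to put a number on it is to measure the RMS of a stretch where nobody is speaking and compare it with the RMS of the speech. The sketch below is only an illustration. The "recording" is synthetic (random noise standing in for room tone, a tone standing in for speech), the one-second split point is an assumption, and in a real file you'd have to pick the silent stretch yourself.

```python
import math
import random

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 20 * math.log10(math.sqrt(mean_square))

rate = 44100
rng = random.Random(0)  # fixed seed so the numbers are repeatable

# Hypothetical recording: one second of room tone, then one second of "speech".
room_tone = [rng.gauss(0, 0.001) for _ in range(rate)]
speech = [0.1 * math.sin(2 * math.pi * 220 * n / rate) + rng.gauss(0, 0.001)
          for n in range(rate)]
recording = room_tone + speech

noise_floor_db = rms_db(recording[:rate])   # the silent stretch only
speech_rms_db = rms_db(recording[rate:])    # the spoken stretch only
gap_db = speech_rms_db - noise_floor_db

# Any gain applied to the file lifts the noise floor by the same amount.
gain = 4.0
boosted_floor_db = rms_db([s * gain for s in recording[:rate]])

print(f"noise floor: {noise_floor_db:.1f} dB, speech: {speech_rms_db:.1f} dB")
print(f"gap: {gap_db:.1f} dB, floor after x{gain:g} gain: {boosted_floor_db:.1f} dB")
```

The last measurement is the thing to watch: whatever gain the renormalizing adds to the voice, it adds to the room noise as well, so the gap between the two is set at recording time.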
I used the Dynamics Processor as a limiter at -15dB to achieve the second shot above, but you can use anything available, as long as it works, sounds good and measures okay. With your first submission though, make sure you save the 32-bit edited original before processing - if ACX tell you that they don't like it, it's then relatively easy to reprocess that.
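To show why knocking the peaks off and renormalizing raises the RMS, here's a small Python sketch of the limit-then-normalize sequence. Please note the assumptions: the limiter here is a crude hard clip at -15dB, which is NOT how the Dynamics Processor works (a real limiter smooths its gain changes), and the function names and test signal are made up for the example.

```python
import math

def db_to_linear(db):
    return 10 ** (db / 20)

def peak_db(samples):
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_db(samples):
    mean_square = sum(s * s for s in samples) / len(samples)
    return 20 * math.log10(math.sqrt(mean_square))

def normalize(samples, target_db=-3.0):
    """Scale the signal so that its largest peak sits at target_db."""
    gain = db_to_linear(target_db) / max(abs(s) for s in samples)
    return [s * gain for s in samples]

def hard_limit(samples, ceiling_db=-15.0):
    """Crude stand-in for a limiter: clip anything above the ceiling."""
    ceiling = db_to_linear(ceiling_db)
    return [max(-ceiling, min(ceiling, s)) for s in samples]

# Hypothetical test signal: quiet 220 Hz tone with ten loud spikes.
rate = 44100
signal = [0.05 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]
for i in range(0, rate, 4410):
    signal[i] = 0.9

before = normalize(signal)                        # first pass: peaks at -3 dB
after = normalize(hard_limit(before, -15.0))      # limit, then normalize again

print(f"RMS before: {rms_db(before):.2f} dB, after: {rms_db(after):.2f} dB")
print(f"peak after: {peak_db(after):.2f} dB")
```

Once the spikes are held at the ceiling, the second normalize has much more room to add gain, and all of that gain goes into the average level. Skip the second normalize and the peak is left sitting at the limiter ceiling instead of at -3dB.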