Participant
February 5, 2017
Question

Match pitch and feel of two speech audio files of the same person

I recorded a video project (a woman speaking on camera) and we then realized a section was missing: some audio content we needed to insert. So we recorded the additional audio and I'm inserting it into the original audio. It's the same person speaking, but the pitch of the original recording is just a bit above that of the newly recorded audio, and the "feel" isn't the same (different mics, different room, etc.). I'm wondering if Audition has a way of helping me do some digital magic to match these two audio recordings and make them sound like the whole thing is just one long take. Does anyone know how I might do this?

Before you say it, of course we could just re-shoot the entire thing, but this would mean re-renting the cameras, setting up the shot, paying for the person to come back in, etc. Before we go that far I thought I'd ask the question. And in case you're wondering, the video shows the person talking, then the shot cuts away to other images while the voice continues over them. It's in this "other images" section that we're inserting the new audio file. After this, the scene goes back to the person speaking and she finishes what she's saying.

Any help? I'm guessing I can use some EQ adjustments to "match" the feel as best I can, lay in some noise to "feather" it, and possibly pitch-bend the original video's higher audio down and the new recording up so the pitches meet, but I'm not sure this is the best way to go about it. Is there an easier or more accurate way?

Thank you.
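
For anyone who wants to put a number on the pitch gap described above before reaching for a pitch shifter, here is a minimal sketch in Python. It assumes you have already measured an average fundamental frequency for each recording by some other means; the 210 Hz and 200 Hz figures and the function names are hypothetical and not taken from the recordings in question. It only does the arithmetic for "bend one down and the other up"; it does not process audio and says nothing about whether the result will sound natural.

```python
import math


def semitone_offset(source_hz: float, target_hz: float) -> float:
    """Semitones needed to move a pitch from source_hz to target_hz.

    Positive means shift up, negative means shift down.
    """
    if source_hz <= 0 or target_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(target_hz / source_hz)


def meet_in_the_middle(original_hz: float, insert_hz: float) -> tuple[float, float]:
    """Split the pitch gap evenly between the two recordings.

    Returns (shift_for_original, shift_for_insert) in semitones, so that
    both end up at the geometric mean of the two frequencies.
    """
    midpoint_hz = math.sqrt(original_hz * insert_hz)
    return (
        semitone_offset(original_hz, midpoint_hz),
        semitone_offset(insert_hz, midpoint_hz),
    )


if __name__ == "__main__":
    # Hypothetical measurements: original take slightly higher than the insert.
    original_hz, insert_hz = 210.0, 200.0
    gap = semitone_offset(insert_hz, original_hz)
    down, up = meet_in_the_middle(original_hz, insert_hz)
    print(f"gap: {gap:.2f} semitones ({gap * 100:.0f} cents)")
    print(f"shift original by {down:.2f}, shift insert by {up:.2f}")
```

With those example figures the gap works out to about 0.84 semitones, so each recording would move by roughly 0.42 semitones in opposite directions. Shifts that small can then be entered by hand in whatever pitch tool is used.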

    SteveG_AudioMasters_
    Community Expert
    February 5, 2017

    havenc63783602 wrote:

    I'm wondering if Audition has a way of helping me do some digital magic to match these two audio recordings and make them sound like the whole thing is just one long take. Does anyone know how I might do this?

    I'm afraid that this is a very common request, and that essentially, the answer is no. Even with voice-overs recorded before and after a lunch break using identical equipment, people can hear a difference, and even that can't be fixed. The problem here is that human hearing is very sensitive to even the slightest change of inflexion, etc., in voices, especially if you run two phrases, recorded at different times and places, together.

    But it's that last bit that gives you some potential wriggle room. There are a few things you can do which will inevitably be cheaper than a reshoot, and relieve the immediate texture jump at the transition. The first is to impose a bit of narrative from a different voice - even if on the face of it it isn't necessary at that point. The second is to impose a musical or sound effect transition, with a slight break in the narrative. If you really think that you can't do either of those, then consider running either as an underscore - this will soften the effect on the ears, but generally I don't think that this is enough.

    I suppose in a way that is 'digital magic'. Well, it is when you compare how easy it is now with what it took to do the same thing when I was a video editor, which involved a lot of tape, a huge pile of machinery and things that are now lost in the mists of time - like pre-rolls on every single edit...

    ryclark
    Participating Frequently
    February 5, 2017

    The other way that you might be able to cover the problem is to record the whole voice-over section again as audio only. That way any quality change happens at the same moment as the cut away from the speaker, and again at the cut back to her at the end of the cutaway. It probably won't be so jarring then, as long as the levels match and maybe a bit of EQ is added to bring the new recording as near as possible to the original. After all, this is what happens all the time in news reports, for instance.
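
As a rough illustration of the "levels match" part of the reply above, the sketch below computes the gain, in dB, that would bring a re-recorded clip to the same RMS level as the original. It is a simplified assumption-laden example: the function names are made up for illustration, the clips are plain lists of floating-point samples, and plain RMS is only a crude stand-in for perceived loudness. It is not how Audition does its own level matching.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of a clip given as float samples."""
    if not samples:
        raise ValueError("clip is empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def matching_gain_db(reference: list[float], insert: list[float]) -> float:
    """Gain in dB to apply to `insert` so its RMS equals that of `reference`."""
    ref_level, ins_level = rms(reference), rms(insert)
    if ref_level == 0 or ins_level == 0:
        raise ValueError("cannot match against a silent clip")
    return 20.0 * math.log10(ref_level / ins_level)


def apply_gain(samples: list[float], gain_db: float) -> list[float]:
    """Scale every sample by the linear factor equivalent to gain_db."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


if __name__ == "__main__":
    # Toy data: the new recording is at half the amplitude of the original.
    original = [0.5, -0.5, 0.5, -0.5]
    new_take = [0.25, -0.25, 0.25, -0.25]
    gain = matching_gain_db(original, new_take)
    matched = apply_gain(new_take, gain)
    print(f"apply {gain:+.2f} dB; matched RMS = {rms(matched):.3f}")
```

In practice the measurement would be taken over speech passages only, since long pauses drag an RMS figure down, and the final balance is still best judged by ear at the cut points.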

    stefan_gru
    Inspiring
    February 9, 2017

    I would agree that the cutaway might distract people from the change in voice quality, or make audiences more forgiving of the perceived change. Once you have tried using the software to fix things, it ends up being a matter of perception: will audiences even notice? Another possible trick would be to use music to hide the voice quality change, assuming your project has any music bed at all.