This is about as good a result as you're going to get out of Adobe Audition without any additional third-party plugins. Even most third-party plugins won't give you a shift so flawless that you couldn't tell whether it was shifted or just someone with an authentically deep voice. The human voice is one of those things we are so uniquely familiar with that the slightest unnatural variation triggers in our brains a phenomenon you've probably heard mentioned in passing conversation, especially now that AI has become so prevalent: the "Uncanny Valley". Shifting most any sound effect or instrument or really anything else can be done fairly easily, to the point that most people couldn't tell there was any manipulation. But vocal pitching is generally used stylistically in some way, rarely intended to be a convincing re-creation. If you use Waves Vocal Bender or Soundtoys Alterboy, or even Melodyne or Auto-Tune's manual pitch adjustments, you'll still hear an unnatural warble to it. With all that being said, Adobe's pitch adjusting software is quite honestly some of the best I've heard to come as stock plugins in a mixing application.
My best advice is to make sure you keep a backup copy, because this will be destructive editing, and to apply the pitch adjustments inside the waveform editor and render them to the track.
Pitch shifting tends to be pretty processor-intensive, and most often the WORST-sounding results you end up hearing come from trying to use it as a live adjustment. It cannot reach its full potential for sound quality as an actively calculating instance. Find the settings that get you closest to the result you want, make sure it's on the highest quality setting, then render it to the track and save so that it is baked in. That baked-in version is gonna be the best sound quality you can get with what you have to work with.
Also, don't hesitate to try and record FOR your needs. The quality starts at the performance, not the post production. So if you want a quality deep pitch, start by trying to go as deep as you can in the original recording, and that will set you in the right direction.
Most of that doesn't really apply to using a computer-generated voice - you don't exactly have a lot of choice about what you get. But there is one thing in there that you should be doing anyway, regardless of what you are editing, and it's something we mention frequently. Editing Rule 1: Always edit a copy, not the original.
As for the best way to go about the process - well, you will have to experiment. Time shifting generally brings out the metallic ringing, and this is inherent in the method that has to be used to achieve it, which is basically to slice up all of the audio and duplicate some of the slices. Or remove some of the slices, depending on which way you're going (this is a very simplified explanation of what's actually a pretty complicated process).
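To make the "slice it up and duplicate or remove slices" idea concrete, here is a deliberately crude Python sketch. It is not what Audition does internally; the function name `naive_stretch` and its parameters are made up for illustration. It cuts the samples into fixed-size slices and repeats or drops whole slices to change the length. Real stretch algorithms overlap and crossfade the slices rather than butting them together, but the regular repetition you can see here gives a feel for why ringing tends to appear.

```python
def naive_stretch(samples, slice_len, factor):
    """Change the length of `samples` by repeating or dropping whole slices.

    Illustration only: hard cuts between slices, no overlap or crossfade.
    factor > 1 makes the audio longer, factor < 1 makes it shorter.
    """
    if slice_len <= 0 or factor <= 0:
        raise ValueError("slice_len and factor must be positive")

    # Cut the audio into consecutive fixed-size slices.
    slices = [samples[i:i + slice_len]
              for i in range(0, len(samples), slice_len)]
    if not slices:
        return []

    out = []
    n_out = round(len(slices) * factor)  # how many slices the result holds
    for k in range(n_out):
        # Map each output slot back to a source slice. When stretching,
        # several slots map to the same slice (duplication); when
        # shrinking, some source slices are never picked (removal).
        src = min(int(k / factor), len(slices) - 1)
        out.extend(slices[src])
    return out


if __name__ == "__main__":
    audio = list(range(8))              # stand-in for real sample values
    print(naive_stretch(audio, 2, 2))   # twice as long
    print(naive_stretch(audio, 2, 0.5)) # half as long
```

Using small integers instead of real audio makes it easy to see which slices were repeated and which were dropped.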
You may find, for instance, that you get better results if you do the process more than once, and only shift a small amount at each pass. And treat stretch and pitch separately. It's always worth looking at the advanced controls as well - generally it works better in single voice mode, with less ringing. But no two sources behave the same; experimenting is the way to go. Just note what works best for you, and you can use the settings again for the same voice easily if you save them as a custom preset.
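If you do try several small passes, the arithmetic for splitting the shift is simple. The sketch below assumes the usual equal-tempered relationship, where one semitone is a frequency ratio of 2^(1/12); the helper names are hypothetical and have nothing to do with Audition's own controls. It only shows that equal small steps add up to the same total shift. Whether they actually sound better on your source is something to judge by ear.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a shift in semitones (equal temperament assumed)."""
    return 2.0 ** (semitones / 12.0)


def split_shift(total_semitones, passes):
    """Split one large shift into `passes` equal smaller shifts."""
    if passes < 1:
        raise ValueError("passes must be at least 1")
    return [total_semitones / passes] * passes


if __name__ == "__main__":
    steps = split_shift(-6, 3)          # six semitones down in three passes
    combined = 1.0
    for s in steps:
        combined *= semitones_to_ratio(s)
    print(steps)
    print(combined, semitones_to_ratio(-6))  # the two ratios should agree
```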