My usual workflow for this kind of project is to make the composition I am working on about one or two seconds longer than the sentence or phrase I am animating, then set the work area to end where I want the match cut to happen.
With the comp set up, I select all of the layers that will continue into the next composition, move the CTI (current time indicator) to the out point of the work area, and press Ctrl+Shift+D (Cmd+Shift+D on a Mac) to split all of those layers at the current time. This includes the audio track.
Once the layers have been split, select all of the pieces that start at the split point, pre-compose them using "Move all attributes into the new composition," and open the new comp. If you trim the new comp to the layer length, it will be only one or two seconds long, so open Composition Settings and make it longer.
If you go back to the original comp and make any changes, just set a keyframe at the end of each layer, copy those keyframes, and paste them onto the matching layers in the new comp. The whole procedure takes about two minutes, and you end up with a new comp that has no extra layers and action that matches perfectly across the cut.
Markers on the audio track make this easier, and they are very easy to add in Adobe Audition.
When each comp is finalized, render it with a visually lossless, intraframe (production-format) codec and save the render to a dedicated folder. When all the parts are rendered, open a new Premiere Pro project, import the properly named clips (part-01.mov, part-02.mov, part-03.mov...), select them all in the Project panel, right-click, and choose "New Sequence From Clip." Your new sequence will be created with all the rendered parts in order. Mute the audio, import your original audio, then start finalizing the soundtrack with the music and sound effects you want to add. Finalize the color grade and, if needed, clean up the edit.
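The zero padding in names like part-01.mov matters because file names are compared character by character, so without it "part-10" sorts before "part-2". Here is a small sketch, assuming you name the renders yourself and rely on a plain alphabetical sort (for example, a bin sorted by name) to keep the shots in order. The helper names part_name and names_for are hypothetical and not part of any Adobe tool.

```python
from __future__ import annotations


def part_name(index: int, width: int = 2, ext: str = "mov") -> str:
    """Hypothetical helper: return e.g. 'part-01.mov' for index 1 with two-digit padding."""
    return f"part-{index:0{width}d}.{ext}"


def names_for(count: int) -> list[str]:
    """Hypothetical helper: name `count` parts, padding wide enough for the largest index."""
    width = max(2, len(str(count)))
    return [part_name(i, width) for i in range(1, count + 1)]


if __name__ == "__main__":
    padded = names_for(12)
    unpadded = [f"part-{i}.mov" for i in range(1, 13)]
    # With padding, alphabetical order is the same as shot order.
    print(sorted(padded) == padded)  # True
    # Without padding, part-10 and part-11 jump ahead of part-2.
    print(sorted(unpadded)[:3])      # ['part-1.mov', 'part-10.mov', 'part-11.mov']
```

If a project might grow past 99 parts, starting with three digits (part-001.mov) avoids renaming everything later.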
Every explainer video I make has cuts and transitions, because they strengthen the story. By using Illustrator (AI) for the graphics, Photoshop for image editing, After Effects for compositing and motion graphics, Premiere Pro for the final edit, and, in most projects, Audition for the final sound mix, I can complete a long-form explainer video that is easier to modify to meet client requests, tells a better story, looks more professional, and takes much less time to produce than doing the whole project in After Effects. I have had explainer videos with more than 100 layers in a single five- or six-second shot. If you tried to handle a couple of dozen shots like that in a half-hour project entirely in After Effects, you would quickly run into huge problems; it would be completely unmanageable.
Good luck with your project. Let us know if you have any other questions.