The long and short of it is likely that your 3D layer approach is entirely unnecessary, and that your layers clip out because of a depth sorting issue or a conflict between layer switches such as collapsed transformations. Why not simply scale the layers to create the illusion of a zoom? Anything beyond that we can't know, since you haven't offered any info about what's actually going on inside the castle comp.
If your castle and the image of the girl are in the same 3D space, there is nothing to do but move the camera. There is no reason at all to animate scale. Working with 3D layers and a camera is exactly like working with actors on a set. The set doesn't move; the actors and the camera do. You don't scale an actor because you can't. You put them where they need to be, block the action, then move the camera.
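To put rough numbers on why moving the camera does the job that scaling would otherwise fake, here is a small Python sketch (not an After Effects expression). It assumes a simplified pinhole camera in which a layer appears at 100% when its distance from the camera equals the camera's Zoom value. The function name, the Zoom of 1000 px, and the Z positions for the girl and the castle are all made up for illustration.

```python
def apparent_scale(zoom_px: float, distance_px: float) -> float:
    """Approximate on-screen size (in percent) of a 3D layer.

    Assumes a simple pinhole model: a layer whose distance from the
    camera equals the zoom value is seen at 100%.
    """
    if distance_px <= 0:
        raise ValueError("layer is at or behind the camera")
    return 100.0 * zoom_px / distance_px


# Hypothetical scene: the girl sits at z = 0, the castle 500 px behind her.
ZOOM = 1000.0
GIRL_Z = 0.0
CASTLE_Z = 500.0

# Push the camera toward the scene along Z and watch both layers grow.
for camera_z in (-1000.0, -750.0, -500.0):
    girl = apparent_scale(ZOOM, GIRL_Z - camera_z)
    castle = apparent_scale(ZOOM, CASTLE_Z - camera_z)
    print(f"camera z={camera_z:7.1f}  girl={girl:6.1f}%  castle={castle:6.1f}%")
```

Under these assumptions the nearer layer grows faster than the farther one as the camera moves in. That parallax comes for free from a camera move, while animating Scale on each layer would have to imitate it by hand.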