The track starts going wrong at 0:53.
You have not picked the surface of the roof accurately. Place the target on the roof at the end of the shot and make sure that it accurately matches the roof. I usually add a solid, apply the Grid effect to it, and then double-check both perspective and position. You can then add a text layer, convert it to 3D, hold down the Shift key while parenting it to the accurately placed solid, and it will snap into position.
That's an awfully long shot to be used in a video. The normal practice is to work on only the frames you need, plus maybe a few extra frames to act as handles so you can fine-tune the timing in the final edit. 90% of my comps are less than 7 seconds because 7 seconds is an eternity in a film. Nobody can pay attention to one shot for much more than that unless there is something really interesting going on.
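If it helps to see the trimming as arithmetic, here is a rough Python sketch of working out which frames to keep. Everything in it is an assumption for illustration: the function name `trimmed_range`, the 24 fps frame rate, the 12-frame handles, and the 53 to 58 second usable range are made-up values, not anything taken from your project.

```python
def trimmed_range(use_start_s, use_end_s, fps, handle_frames=12):
    """Return (first_frame, last_frame, frame_count) for the part of a
    shot that is actually needed, padded with handles on both ends.

    All parameters are hypothetical example inputs:
      use_start_s / use_end_s : usable part of the shot, in seconds
      fps                     : footage frame rate
      handle_frames           : extra frames kept on each side
    """
    # Convert seconds to frame numbers, then pad with handles.
    # The start is clamped so it never goes before frame 0.
    first = max(0, round(use_start_s * fps) - handle_frames)
    last = round(use_end_s * fps) + handle_frames
    return first, last, last - first + 1


# Example: a 5 second usable range at an assumed 24 fps.
first, last, count = trimmed_range(53, 58, fps=24)
print(first, last, count)
```

With those example numbers you would track 145 frames instead of the whole clip, which keeps the solve faster and easier to check.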
The shot also has some lens distortion. To make the track really accurate, I would try to remove it first.
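To show why distortion matters, here is a minimal Python sketch of a simple one-coefficient radial distortion model and how it can be inverted. This is only an illustration, not what After Effects does internally: the model, the function names `distort` and `undistort`, and the coefficient `k1 = -0.05` are all assumptions chosen for the example. Coordinates are normalized so that (0, 0) is the center of the image.

```python
def distort(x, y, k1):
    """Apply a one-coefficient radial distortion (assumed model) to an
    undistorted point. Points farther from the center move more."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2
    return x * scale, y * scale


def undistort(xd, yd, k1, iterations=20):
    """Invert distort() by fixed-point iteration: start from the
    distorted point and repeatedly divide by the scale computed from
    the current estimate. Suitable for mild distortion only."""
    xu, yu = xd, yd
    for _ in range(iterations):
        r2 = xu * xu + yu * yu
        scale = 1.0 + k1 * r2
        xu, yu = xd / scale, yd / scale
    return xu, yu


# A point near the corner of the frame, with mild barrel distortion.
k1 = -0.05  # hypothetical coefficient
xd, yd = distort(0.8, 0.45, k1)
xu, yu = undistort(xd, yd, k1)
print((xd, yd), (xu, yu))
```

The point to take away is that the shift grows with distance from the center, so a flat surface like the roof no longer looks perfectly flat to the tracker near the edges of the frame.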
Hope this helps.