I made a screen recording of your sample video and ran Track Camera. I found a frame where three trackers looked like they were on the back wall, deleted all of the camera trackers on your misplaced wall photos, and let the camera solve again. Then I set my origin and ground plane, created the reference solid and camera, added the grid, and verified a good, solid attachment to the wall. I created a shape layer to simulate photos hanging on the wall, made the layer 3D, Shift + parented it to the Reference layer, scaled the image, and moved it in X and Y on the same plane to fit the wall. I moved back to the first frame where the chair covers the new image, added a new solid to be used as a track matte using trackers on the back of the chair, set a few mask keyframes, set it as a track matte for the new photo array layer, and fine-tuned the mask with a little feathering. This was the workflow and result after just a few minutes. If I had the original footage, the track would be a lot better, but I still probably would have used Mocha AE instead of Camera Tracking; the final composite would render a lot faster that way.

The Reference Solid with a grid:

Setting up the Track Matte solid:

The result:

The result is not quite perfect but it is very close. A little more time and full-resolution footage would help a lot.
Camera Tracking will not work reliably unless you find a good origin and ground plane, verify that the camera solve is good, and then either base the rest of the layer placement on the Reference Solid or verify the tracking on every surface you need to attach a layer to. I never use Nulls because you can't see how they are tracking. Occasionally I will add Text, but most of the time it's all solids as guide layers before I ever start working on the composite.
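
If it helps to see why three trackers on the wall are enough to place a solid, and why a stray tracker on the photos or the chair throws things off, here is a rough sketch in plain Python. It is not After Effects scripting and it is not what the Camera Tracker does internally; the function names and the coordinates are made up for illustration. It builds a plane from three 3D points and measures how far a fourth point sits from that plane.

```python
import math

def plane_from_points(p0, p1, p2):
    """Return (unit_normal, d) for the plane normal . x = d through three 3D points."""
    u = [b - a for a, b in zip(p0, p1)]  # edge p0 -> p1
    v = [b - a for a, b in zip(p0, p2)]  # edge p0 -> p2
    # The cross product of the two edges is perpendicular to the plane.
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("the three points are collinear and do not define a plane")
    n = [c / length for c in n]
    d = sum(a * b for a, b in zip(n, p0))
    return n, d

def distance_to_plane(point, normal, d):
    """Signed distance from a point to the plane normal . x = d."""
    return sum(a * b for a, b in zip(normal, point)) - d

# Hypothetical tracker positions: three on the back wall, one on the chair.
wall_a, wall_b, wall_c = (0, 0, 500), (100, 0, 500), (0, 100, 500)
chair = (50, 50, 520)

normal, d = plane_from_points(wall_a, wall_b, wall_c)
print(distance_to_plane((10, 20, 500), normal, d))  # another wall point
print(distance_to_plane(chair, normal, d))          # the chair is off the wall plane
```

A point that really is on the wall comes out at a distance of zero, while the chair tracker does not. That is the same check you are doing by eye when you watch a gridded solid stick to the wall.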