Blend modes, by their very nature, are serious resource hogs. The computer has to play both streams of video, with whatever effects are in use applied "live", then compute the two streams against each other, according to the mode chosen, to come up with the displayed image.
It's rough with two streams of intraframe, and totally nasty with interframe.
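To put a rough picture on that "compute the two streams against each other" step, here is a minimal Python sketch. It assumes each frame is just a grid of channel values normalized to 0.0 to 1.0, and the function names are made up for illustration. Real editing apps do this on the GPU with their own math, so take it as the general idea only, not how any particular app does it.

```python
def blend_pixel(top, bottom, mode):
    """Blend two normalized channel values (0.0 to 1.0) with a common blend formula."""
    if mode == "multiply":
        return top * bottom
    if mode == "screen":
        return 1.0 - (1.0 - top) * (1.0 - bottom)
    if mode == "add":
        return min(1.0, top + bottom)  # clip so the result stays in range
    raise ValueError(f"unknown blend mode: {mode}")


def blend_frames(top_frame, bottom_frame, mode):
    """Blend two same-sized frames (lists of rows of values), value by value."""
    return [
        [blend_pixel(t, b, mode) for t, b in zip(top_row, bottom_row)]
        for top_row, bottom_row in zip(top_frame, bottom_frame)
    ]


# Two tiny 1x2 "frames", already decoded and with effects applied.
top = [[0.5, 1.0]]
bottom = [[0.5, 0.25]]
result = blend_frames(top, bottom, "multiply")
```

The point is that both frames have to be fully decoded and processed before this math can even start, and then it runs on every channel of every pixel, for every frame of playback.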
"Intraframe" ... like ProRes, any DNx or Cineform, means that every frame is complete in and of itself. No other frames need be computed to display the current frame. There may be some compression involved, but again, only on the one frame.
"Interframe" ... H.264 or H.265 ("HEVC") ... is very different. There is an "iframe" ... a complete frame, (often) heavily compressed ... only every 9 to 30 or more frames of the video clip.
In between iframes are p, b, and ... another that I can't remember at the moment ... "frames". But they aren't really frames. They're only data sets ... charts of pixel locations that a) have changed since the last iframe, b) will change before the next iframe, or c) ... both.
So in order to display most frames, the computer has to find the one or two needed iframes, the needed and relevant data sets, and compute what the "current" frame should look like.
In other words, to display one current frame, it may have to find, correlate, and decompress the data of up to 35 or so other frames.
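Here's a toy Python sketch of that lookup, under some loud assumptions: the GOP is written as a string of frame types in display order, starting with an iframe; each "P" leans only on the nearest earlier I or P; each "B" leans on the nearest I or P on both sides. Real H.264/5 encoders can use more reference frames and fancier structures, and the function name is made up, so this only shows how the chain builds up.

```python
def frames_needed(gop, index):
    """Return the sorted indices of frames that must be decoded to show gop[index].

    gop is a string such as "IBBPBBPBBP" in display order, starting with "I".
    Simplified model: P needs the previous I/P; B needs the previous and next I/P.
    """
    needed = set()

    def require(i):
        if i in needed:
            return
        needed.add(i)
        kind = gop[i]
        if kind == "I":
            return  # an iframe is complete on its own
        # Both P and B need the nearest earlier reference frame.
        prev_ref = max(j for j in range(i) if gop[j] in "IP")
        require(prev_ref)
        if kind == "B":
            # B also needs the nearest later reference frame, if there is one.
            later = [j for j in range(i + 1, len(gop)) if gop[j] in "IP"]
            if later:
                require(later[0])

    require(index)
    return sorted(needed)


gop = "IBBPBBPBBP"
chain = frames_needed(gop, 7)  # a "B" late in this short GOP
```

Even in this ten-frame toy GOP, showing frame 7 means decoding five frames. Stretch the GOP out and stack two streams for a blend mode, and the work piles up fast.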
The cameras have specialized chips designed to do the process, and every flipping camera has, it seems, a slightly different chip design. Many computers do not have the chip to essentially reverse engineer the media "by hardware" ... so the CPU and perhaps GPU have to handle this via software processes.
Some computers, mostly the Intel rigs with "QuickSync" in their CPUs, and some GPUs, have the bits to handle H.264/5 processing and do mostly ok with it. Many don't.
And ... the long-GOP H.264/5 stuff is nasty stuff. I work for/with and teach pro colorists, who all have machines that make mine (24 core Ryzen, 128GB of RAM, 2080 Ti, 8 internal SSDs including two NVMe drives) ... look positively anemic.
Most of them take any job that includes any long-GOP media, and immediately transcode those clips for use in grading, typically to ProRes or DNx, as part of their 'conform' process.
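If you want to try the transcode route yourself, one possible way is sketched below in Python. This is my assumption, not what those colorists use: it builds an ffmpeg command line using ffmpeg's prores_ks encoder (profile 3 is ProRes 422 HQ) with uncompressed PCM audio. The function name and the "_prores" output naming are made up for illustration, and you'd need ffmpeg installed to actually run the command.

```python
from pathlib import Path


def build_transcode_cmd(src):
    """Build an ffmpeg command that transcodes src to ProRes 422 HQ in a .mov file."""
    src_path = Path(src)
    # Hypothetical naming scheme: clip.mp4 -> clip_prores.mov, next to the source.
    dst_path = src_path.with_name(src_path.stem + "_prores.mov")
    return [
        "ffmpeg",
        "-i", str(src_path),
        "-c:v", "prores_ks",   # ffmpeg's ProRes encoder
        "-profile:v", "3",     # 3 = ProRes 422 HQ
        "-c:a", "pcm_s16le",   # uncompressed 16-bit audio
        str(dst_path),
    ]


cmd = build_transcode_cmd("A001.mp4")
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The transcoded files will be a lot bigger than the long-GOP originals, which is the trade: more disk space for far less decode work during playback.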
Neil