Plug-In crawling on After Effects 2024 (SDK)
Hi gang;
Let me preface by painting the big picture. I have a development machine on which I do all my programming. It runs a really old Intel chip (i7-4930K). I develop on AE 2020 and AE 2023. Unfortunately, the new AE requires AVX2 and my machine doesn't support it, so I cannot install AE 2024 and test it there. I like developing on older tech because it guarantees the slowest my plug-in would ever run.
I decided to test my plugin on my work machine, which is a beast. It's a Ryzen Threadripper 3979X running AE 2024... and it crawls. It's really, really slow. I then went to test it on yet another slow machine I have (i7-3930K) and on a really old AE CS6 and it FLIES! Super fast.
So my most powerful machine, with the newest AE version, is the slowest!
Clearly, there is something off. I tried messing with various AE 2024 settings but nothing. It crawls.
I do not have MFR implemented on my plugin but on my dev machine with AE 2023, it is still quite fast despite not supporting MFR. So I don't think that's the issue.
It seems to be specific to my Threadripper, or AE 2024. But I can't test AE 2024 on any other machine due to the AVX2 issue (thanks Adobe).
The only other thing I can imagine is release mode vs debug mode. I am pretty sure I am compiling in release mode, and I believe that if I weren't, I wouldn't be able to run it on any other machine since it would report missing dependencies, correct?
So does anyone have any suggestions or ideas about what could be happening?
To summarize, two really old machines running AE versions prior to 2024 are running my plugin super fast and well. My work machine, which is the fastest of all, with the latest AE, crawls when running it. All other plugins seem to run fine on it.
Richard
i honestly don't know why that would happen. in your shoes, i would put some profiling code in my plug-in that writes timing data to a file, benchmark on the machines where it runs fast, and then see where the slowdown occurs on the new machine. with a few iterations of this process you could pinpoint the problem area.
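A minimal sketch of the kind of profiling code suggested above, assuming plain standard C++ and nothing from the AE SDK. `ScopedTimer`, the label and the log path are hypothetical names invented for this illustration. The timer measures the lifetime of a scope and appends one line per measurement to a text file, so the logs from a fast machine and the slow machine can be compared side by side.

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <string>
#include <utility>

// Hypothetical helper (not part of the AE SDK): times the enclosing scope and
// appends "label<TAB>elapsed ms" to a log file when the scope ends.
// Assumption: it is used from one thread at a time; file writes are not
// synchronized, so a multithreaded render would need a mutex around them.
class ScopedTimer {
public:
    ScopedTimer(std::string label, std::string log_path)
        : label_(std::move(label)),
          log_path_(std::move(log_path)),
          start_(std::chrono::steady_clock::now()) {
        assert(!label_.empty());
    }

    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(end - start_).count();
        std::ofstream out(log_path_, std::ios::app);  // append, never truncate
        if (out) {
            out << label_ << '\t' << ms << " ms\n";
        }
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string label_;
    std::string log_path_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping a suspect section in braces with `ScopedTimer t("particle loop", "profile.log");` as the first statement would log how long that section took on each frame. Opening the file on every measurement adds its own cost, so this suits coarse sections rather than the inside of a tight loop.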
On Windows, CS6 feels much more responsive than the later Ae builds. One advantage of the new builds is that you can see the frame render time, though it's only a rough benchmark and will report insanely quick renders if the frame or part of the frame is cached.
To get a more accurate benchmark, render the plugin with noise applied beforehand and clear the cache before rendering. See how fast CS6 renders 5 minutes of frames vs Ae 2024. MFR is good at taking advantage of modern hardware so the later versions should be as fast or faster (when you build it as MFR native).
Hi guys;
Thanks for your replies. I tore the plug-in apart bit by bit last night and it comes down to transform_world. That is what kills the performance. I turned on AE render timings and it's about 700 ms per frame. If I comment out the transform_world call, it's about 20 ms per frame - as it should be.
I should mention I was also able to put AE 2024 on the slow laptop and it works perfectly fine there. So for some reason, this issue is specific to my Threadripper machine. I assume the 32 cores are somehow involved here.
I intend to eventually add MFR but the current plug-in without MFR works really well on the laptop with AE 2024 so it's not the lack of MFR that is hampering it. I also disabled MFR in AE and the performance issue is still there.
This is a particle system, so it is drawing a mere 500 particles per frame (making 500 transform_world calls) and barely functioning on the Threadripper, whereas all the other much, much slower machines work just fine!
I saw that my Threadripper has CPU virtualization disabled and I thought that might have something to do with AE's new AVX2 requirement, but then I checked the laptop and it too has virtualization disabled. The only difference is that, aside from being really old, the laptop has only 4 cores.
I'm at a loss especially because it seems to be specific only to the Threadripper - to that hardware.
Are there any other hardware considerations that could be the culprit here?
Richard
So I've reduced my plug-in to almost nothing - just the loop and the transform_world calls - and the render times are still around 600-700 ms per frame on the Threadripper, while only around 150 ms on the slow machines.
I did a test and changed the CPU affinity to only one of the 64 cores, so that it would only use core 0. The render times then drop to 150-200 ms. So I am suspecting more and more that the number of cores is a culprit here.
However, shouldn't transform_world be thread safe?
-Richard
Could the particles be drawn in the correct place without using transform world? For example in a 2d library, run the matrix on the coordinate of the particle and then draw the particle in the transformed coord. Doing a transform world per particle sounds like a recipe for high render times as the number of particles increases.
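As a rough illustration of running the matrix on the particle coordinate, here is a minimal sketch in plain C++. `Vec2`, `Affine2D` and `make_transform` are hypothetical names for this example and are not AE SDK types. It only shows the coordinate math; drawing the particle at the resulting position, and any motion blur, are left out.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 {
    double x;
    double y;
};

// Hypothetical 2D affine transform, stored as
//   | a  c  tx |
//   | b  d  ty |
// so that x' = a*x + c*y + tx and y' = b*x + d*y + ty.
struct Affine2D {
    double a, b, c, d, tx, ty;

    Vec2 apply(Vec2 p) const {
        return {a * p.x + c * p.y + tx, b * p.x + d * p.y + ty};
    }
};

// Builds "uniform scale, then rotate, then translate" for one particle.
Affine2D make_transform(Vec2 position, double rotation_radians, double scale) {
    assert(std::isfinite(rotation_radians) && std::isfinite(scale));
    const double cs = std::cos(rotation_radians) * scale;
    const double sn = std::sin(rotation_radians) * scale;
    return {cs, sn, -sn, cs, position.x, position.y};
}
```

For example, `make_transform({10.0, 20.0}, angle, 2.0).apply({1.0, 0.0})` gives the comp-space position of the particle-local point (1, 0). The cost per particle is a handful of multiplications, with no per-call overhead from the host.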
I think the problem with multithreading here is that calling multithreaded code inside a for loop is not performant, because the calls still have to run one after the other in sequence. Transform world being multithreaded will likely make things worse because Ae will have overhead trying to make each of these small transform worlds multithreaded. Rinse and repeat 500 times and it's a slow process. I had this case when I changed my PF_Iterate code to Iterate_Generic (which is threaded and much faster if the processing is heavy) but because I ran it in a big loop, the PF_Iterate was quicker due to lower overhead.
Hi James!
Thanks for your thoughtful reply - it's interesting to hear your example and the speed benefit it gave.
Yes, I've considered foregoing transform_world altogether but then the big problem is motion blur. The main reason for using it is to take care of that aspect. The thought of trying to manually code motion blur, for translation, rotation and scale, is... daunting, to say the least. And motion blur is a critical part of this.
But I will give it more thought. Ungh.
Regards,
Rich
if i'm not mistaken, AE's own "Transform" effect (effects->distort->transform) also internally uses transform_world. is it also sluggish on your new machine compared to the old one?
i once ran into the strangest bug that manifested as a memory leak. after a couple of tedious days of hunting it down i started commenting out parts of my plugin until finally the problem was found and it was... surprising. turns out having PF_OutFlag_KEEP_RESOURCE_OPEN while having an arb param causes a leak outside the plugin, somewhere in the bowels of AE.
what i'm saying here is that the problem may be deus ex... so if ae's "Transform" effect performs well, maybe try using transform_world in some sample project and see if the result is the same.
Hi Shachar;
Thanks for your suggestion.
I tried the Transform effect and it works fast and fine. However, it is only one transform_world call, and my plugin also performs very well with only a few particles. So it seems to be more about the number of transform_world calls than the size of each one.
I went ahead and transplanted a very short snippet of code, which essentially draws 500 20x20 transform_world rectangles at random positions across the screen, into the Shifter project.
As before, the timings are around 150ms on the slow machines and close to 700 ms on the super-fast 64 core Threadripper. So I guess it doesn't have much to do with the rest of my code - as good of a suggestion as that was.
I also did the same with the SDK_Noise project and got the same results.
Since all evidence points to the increased number of cores as the culprit, perhaps my next step will be to implement multithreading and see if / how that makes a difference. As per James' note, it might make it worse, or it might make it better.
But it is frustrating indeed that such a powerful machine produces the slowest results most likely due to the number of cores and I suppose the manner in which AE works. I still can't get over how lightning fast it was on a slow computer, running CS6!
After I do that, if there's no improvement, I'm not sure what else I can do. Foregoing transform_world will open a huge can of worms in trying to manually do motion blur. 😐
-Richard
ok... now that starts to remind me of an issue i've encountered way back when.
i was using transform_world to rasterize many instances of a small texture along a path to create a brush stroke. it ran fine on a macbook with windows installed, but on the same machine running osx it would get super sluggish. turns out on mac there was (and maybe still is) an overhead for calling transform_world. the function itself was working just as fast regardless of the transformed buffer size, but the accumulation of the per-call overhead across 100 calls was the performance killer...
i ended up writing my own implementation of 2d transform which solved the problem.
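For readers wondering what a hand-written 2D transform might involve, here is a minimal sketch under stated assumptions; it is not the implementation described above. It uses a single-channel 8-bit buffer, inverse mapping and nearest-neighbour sampling, and it overwrites the destination instead of compositing. `Image`, `Affine2D`, `invert` and `blit_transformed` are hypothetical names, and nothing here touches the AE SDK. Antialiasing, alpha and motion blur are all omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-channel image, row-major, zero-initialized.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;

    Image(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {
        assert(w > 0 && h > 0);
    }
    std::uint8_t& at(int x, int y) {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
    std::uint8_t at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Forward transform from source to destination coordinates:
//   x' = a*x + c*y + tx,  y' = b*x + d*y + ty
struct Affine2D {
    double a, b, c, d, tx, ty;
};

// Computes the inverse transform; returns false if the matrix is singular.
bool invert(const Affine2D& m, Affine2D& out) {
    const double det = m.a * m.d - m.b * m.c;
    if (std::fabs(det) < 1e-12) return false;
    const double inv = 1.0 / det;
    out.a = m.d * inv;
    out.b = -m.b * inv;
    out.c = -m.c * inv;
    out.d = m.a * inv;
    out.tx = -(out.a * m.tx + out.c * m.ty);
    out.ty = -(out.b * m.tx + out.d * m.ty);
    return true;
}

// For every destination pixel, map its center back into the source and copy
// the nearest source pixel if it lies inside the source bounds.
void blit_transformed(const Image& src, const Affine2D& forward, Image& dst) {
    Affine2D inv;
    if (!invert(forward, inv)) return;
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            const double dx = x + 0.5;
            const double dy = y + 0.5;
            const int sx = static_cast<int>(
                std::floor(inv.a * dx + inv.c * dy + inv.tx));
            const int sy = static_cast<int>(
                std::floor(inv.b * dx + inv.d * dy + inv.ty));
            if (sx < 0 || sy < 0 || sx >= src.width || sy >= src.height) continue;
            dst.at(x, y) = src.at(sx, sy);
        }
    }
}
```

This version scans the whole destination for every blit, which would be wasteful for 500 small particles. A practical variant would transform the four source corners first and loop only over their bounding box in the destination.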

