Hi gang;
I'm back at this and not getting anywhere.
I am not using the iteration suite so I need to implement my own basic multithreading. I have read in several messages that std::thread supposedly works fine.
I have a function like so:
void FilterImage (PF_ParamDef *size_param, int cdepth, PF_InData *in_data, PF_EffectWorld *input, PF_EffectWorld *output)
{
// Do stuff...
}
And after reviewing many threading examples, I am calling it like so:
std::thread t1(FilterImage,&size_param, cdepth, in_data, input_worldP, output_worldP);
t1.join();
I am using #include <thread> in the .h file.
I compile with full optimizations and /MTd mode and get no compile errors. Yet when I run it in AE, not all cores are used: CPU usage sits at 25%, which suggests that only one thread is doing the work. What am I doing wrong here? Any tips or suggestions?
Thanks,
-Rich
Just to clarify:
I understand if I add more instances, it uses more threads like so:
std::thread t1(FilterImage,&size_param, cdepth, in_data, input_worldP, output_worldP);
std::thread t2(FilterImage,&size_param, cdepth, in_data, input_worldP, output_worldP);
std::thread t3(FilterImage,&size_param, cdepth, in_data, input_worldP, output_worldP);
t1.join();
t2.join();
t3.join();
And yes, this uses more CPU cores.
But I am asking more from a workflow perspective. The above isn't dynamic. In other words, how can you split the work up in a loop, for instance? And how can I find out the number of cores on the system, when my research suggests this information isn't available? Or is there a different way I should be approaching multithreading for AE?
Thanks,
-Rich
Don't think in terms of number of cores, more about a maximum/fixed number of threads and let the distribution across CPUs or cores be handled by the system. The main thing to look at is your algorithm - how do you want to distribute that across threads, what can be handled individually, what resources need to be shared, etc.
In your example above you create three threads (t1, t2, t3) and pass the same data to all of them. This is not how a multithreading algorithm works: each thread would handle the full processing, resulting in a slower function than when not using multithreading.
So: take your algorithm and rip it apart into chunks that can be handled by separate threads.
Example:
Let's say your algorithm works on individual pixels of a common input layer. The most common approach is line rendering: each thread you create is assigned one or more lines of the input layer and processes them. You could start with something like 16 or 32 threads; divide the height of the input layer in pixels by the number of threads and you have the number of lines each thread is supposed to handle. Then create the threads and give each one the shared input pixel data pointer from the layer, plus its own start and end line. In the actual thread function, walk the pixel data from the start line to the end line, process each line from the first column to the last, and write each processed pixel to the output data.
This approach works with iterate_generic as well as std::thread and should be straightforward as a starting point.
The AE SDK has several examples in its sources that show this approach in more detail.
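To make the line-rendering idea above concrete, here is a minimal sketch with plain std::thread. It is not taken from the SDK samples: `Image`, `ProcessRows` and `ProcessImageInBands` are hypothetical names, the image is a simple single-channel buffer standing in for a PF_EffectWorld, and the "filter" is just an inversion used as a placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a layer: one 8-bit channel, stored row by row.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Processes rows [row_begin, row_end). Each thread writes only to its own
// rows of `out`, so no locking is needed.
void ProcessRows(const Image& in, Image& out, int row_begin, int row_end)
{
    for (int y = row_begin; y < row_end; ++y) {
        for (int x = 0; x < in.width; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * in.width + x;
            out.pixels[idx] = static_cast<std::uint8_t>(255 - in.pixels[idx]); // placeholder filter
        }
    }
}

// Splits the image into contiguous bands of rows, one band per thread.
void ProcessImageInBands(const Image& in, Image& out, int num_threads)
{
    assert(num_threads > 0);
    assert(in.width == out.width && in.height == out.height);

    // Round up so the last rows are not dropped when the height is not
    // evenly divisible by the thread count.
    const int rows_per_thread = (in.height + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        const int row_begin = t * rows_per_thread;
        const int row_end = std::min(in.height, row_begin + rows_per_thread);
        if (row_begin >= row_end) break; // more threads than rows
        workers.emplace_back(ProcessRows, std::cref(in), std::ref(out), row_begin, row_end);
    }
    // Wait for all bands only after every thread has been started.
    for (std::thread& w : workers) w.join();
}
```

The important difference from the t1/t2/t3 example is that every thread receives a different row range, and join() is called only after all threads have been launched.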
Thank you, Shachar and Toby.
That confirms some things. I come from a different programming API, and your explanation is in line with how I used to do multithreading there. Here is a pseudocode example of how I implemented it in the past:
myFunction
{
int threadNr=previous;
int numberProcs = countProcessors();
// Every thread calculates a different line
for (y = y_start+threadNr; y < y_end; y+=numberProcs) {
// Horizontal lines
for (int x = x_start; x < x_end; x++) {
psetp(x,y,RGB(255,128,0));
}
}
}
int numberProcs = countProcessors();
// Launch threads: e.g. for 1 processor launch no other thread, for 2 processors launch 1 thread, for 4 processors launch 3 threads
for (i=0; i<numberProcs-1; i++)
triggerThread(50,FME_CUSTOMEVENT,i); //The last parameter is the thread number
triggerEvent(50,FME_CUSTOMEVENT,numberProcs-1); //The last thread used for progress
// Wait for all threads to finished
waitForThread(0,0xffffffff,-1);
I understand the above logic fully as I've used it many times so I should have probably tried to adapt this to C++. Correct me if I'm wrong, but the equivalent of it would be something along the lines of:
void FilterImage (PF_ParamDef *size_param, int cdepth, PF_InData *in_data, PF_EffectWorld *input, PF_EffectWorld *output, int threadNr)
{
    static const int num_threads=4;
    // Every thread calculates a different line
    for (int y = threadNr; y < output->height; y+=num_threads)
    {
        // Horizontal lines
        for (int x = 0; x < output->width; x++)
        {
            // Do Stuff...
        }
    }
}
static const int num_threads=4;
std::thread t[num_threads];
for (int i=0;i<num_threads;i++)
    t[i] = std::thread(FilterImage,&size_param, cdepth, in_data, input_worldP, output_worldP, i);
for (int i=0;i<num_threads;i++)
    t[i].join();
The above doesn't compile for me, though: I get an error (I cannot pass the thread number i for some reason). Still, for now I want to understand the logic, and this seems closer to what you've explained, since we are taking advantage of the cores on a per-line basis. Is that correct?
Thanks as always, for your help.
Regards.
-Richard
Yes, apart from some minor issues (defining the static int twice for example), this is the correct way. The best way to set parameters for a specific thread function (like the start line i) is to put them all in one struct or object and pass that as the one single parameter when creating the thread, then it should also compile.
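As a rough illustration of the "one struct per thread" suggestion, the sketch below bundles everything a worker needs into a single object and passes it as the only argument. `RowJob`, `FilterRows` and `FilterInterleaved` are hypothetical names, and raw 8-bit buffers stand in for the PF_EffectWorld pointers; in a real plug-in the struct could just as well carry in_data, input and output. The halving is only a placeholder for "Do Stuff".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-thread job description: all parameters in one object.
struct RowJob {
    const std::uint8_t* src;  // shared, read-only input
    std::uint8_t* dst;        // shared output; each thread writes its own rows
    int width;
    int height;
    int thread_index;         // which thread this is (0-based)
    int thread_count;         // total number of threads
};

// Thread function: handles rows thread_index, thread_index + thread_count, ...
void FilterRows(RowJob job)
{
    for (int y = job.thread_index; y < job.height; y += job.thread_count) {
        for (int x = 0; x < job.width; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * job.width + x;
            job.dst[idx] = static_cast<std::uint8_t>(job.src[idx] / 2); // placeholder for "Do Stuff"
        }
    }
}

void FilterInterleaved(const std::uint8_t* src, std::uint8_t* dst,
                       int width, int height, int thread_count)
{
    assert(thread_count > 0);
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(thread_count));
    for (int i = 0; i < thread_count; ++i) {
        RowJob job{src, dst, width, height, i, thread_count};
        workers.emplace_back(FilterRows, job); // the job is copied into the new thread
    }
    for (std::thread& w : workers) w.join();
}
```

Because the thread count lives in the struct, it is defined in exactly one place, which also avoids the duplicated static int mentioned above.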
You can use std::thread::hardware_concurrency().
Why not use iterate_generic? It's a very simple implementation, and it
allows AE to make wide-scope decisions about the number of threads it runs.
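For the std::thread route, a small sketch of picking the thread count at run time could look like this. `ChooseThreadCount` and its parameters are hypothetical names; the one thing to watch is that std::thread::hardware_concurrency() is only a hint and may return 0 when the value cannot be determined, so a fallback is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: returns a thread count between 1 and max_threads.
unsigned ChooseThreadCount(unsigned max_threads, unsigned fallback = 4)
{
    assert(max_threads > 0 && fallback > 0);
    unsigned hw = std::thread::hardware_concurrency(); // 0 means "unknown"
    if (hw == 0) hw = fallback;                        // assumed default, tune as needed
    return std::min(hw, max_threads);
}
```

The result can then replace a hard-coded num_threads, for example `const unsigned num_threads = ChooseThreadCount(32);`.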
On Mon, May 13, 2019 at 9:41 PM richardr69178942 <forums_noreply@adobe.com>