Known Participant
January 9, 2014
Question

Why are Src and Dst buffers not pinned?


I just completed my first CUDA-accelerated video filter using the Premiere Pro SDK for CS6 and CC7. If suitable CUDA hardware is available, the filter uses it; if not, it falls back to a multithreaded software implementation. Both work extremely well. However, the CUDA implementation could be a lot faster if the source and destination memory buffers were "pinned" (page-locked) memory. Since they are not, I must copy the source memory to a pinned buffer, asynchronously copy that to CUDA device memory and back, and then copy the result to the destination memory. The overhead of these extra copies between the source/destination memory and the pinned memory is significant. Without the copies to pinned memory, CUDA on my laptop is fast enough to process 130 fps for 1920 by 1080 HD video. With the copies to pinned memory, I only get about 45 fps.
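The cost described above can be put in per-frame terms. The following is a minimal C++ sketch of that arithmetic, not SDK code. It assumes 8-bit BGRA frames (4 bytes per pixel; 32-bit float BGRA would be 16) and, as the post suggests, attributes the entire slowdown to the two extra host copies per frame. The function names are hypothetical.

```cpp
#include <cassert>

// Size of one uncompressed frame in bytes. bytes_per_pixel is an assumption:
// 4 for 8-bit BGRA, 16 for 32-bit float BGRA.
long long frame_bytes(int width, int height, int bytes_per_pixel) {
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return static_cast<long long>(width) * height * bytes_per_pixel;
}

// Wall-clock time available per frame, in milliseconds, at a given frame rate.
double frame_budget_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Host copy bandwidth (bytes per second) implied if the whole difference between
// two measured frame rates is spent on `copies_per_frame` extra copies of a frame.
double implied_copy_rate(long long bytes_per_frame, int copies_per_frame,
                         double fast_fps, double slow_fps) {
    assert(fast_fps > slow_fps);
    const double extra_s =
        (frame_budget_ms(slow_fps) - frame_budget_ms(fast_fps)) / 1000.0;
    return static_cast<double>(bytes_per_frame) * copies_per_frame / extra_s;
}
```

With the numbers from the post, 130 fps leaves about 7.69 ms per frame and 45 fps about 22.22 ms, so the staging copies cost roughly 14.5 ms per frame. For two copies of an 8,294,400-byte frame, that implies an effective host copy rate of only about 1.14 GB/s.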

If I do the exact same filter in DirectShow, the source and destination buffer pools are always pinned, and the filter runs much faster than it does in Premiere Pro. I noticed that the new GPU filter example uses an AE interface but does allow access to pinned memory. However, I have not mastered the AE interface, and I am reluctant to give up 13 years invested in learning the Premiere SDK.

Is there any good reason why the source and destination buffer pool in the Premiere Pro SDK is not pinned memory?

Gene


SteveHoeg
Adobe Employee
January 9, 2014

Hey Gene, pinned memory only applies to host memory. The CUDA memory that you are given through the GPU suite is device resident so pinning does not apply and you can perform GPU computation directly from it without transfer.

gagrinds (Author)
Known Participant
January 9, 2014

Steve,

That is my whole point. To copy from the PC host memory to the CUDA device memory asynchronously, the host memory must be pinned. Hence, the source and destination memory should be pinned. Otherwise, I must copy the source memory to pinned memory I have allocated on the PC, copy it asynchronously to the CUDA device memory, process it on the CUDA device, asynchronously copy it back to the PC pinned memory, and then copy it to the destination memory.
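The round trip described above can be sketched on the host side. This is a minimal C++ illustration under stated assumptions, not Premiere Pro SDK code: `round_trip_via_staging` is a hypothetical helper, a `std::vector` stands in for the pinned buffer (a real CUDA plugin would allocate it with `cudaMallocHost` or `cudaHostAlloc`), and the caller-supplied `device_step` callback stands in for the GPU work (`cudaMemcpyAsync` to the device, the kernel, `cudaMemcpyAsync` back, then a stream synchronize).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical helper showing where the two extra host copies occur when the
// host's source and destination buffers are pageable rather than pinned.
void round_trip_via_staging(
    const unsigned char* src, unsigned char* dst, std::size_t bytes,
    std::vector<unsigned char>& staging,  // stand-in for a pinned host buffer
    const std::function<void(unsigned char*, std::size_t)>& device_step) {
    assert(staging.size() >= bytes);
    // Step 1: extra host copy, pageable source -> pinned staging buffer.
    std::memcpy(staging.data(), src, bytes);
    // Steps 2-4: async upload, kernel, async download (simulated by the callback,
    // which processes the staging buffer in place).
    device_step(staging.data(), bytes);
    // Step 5: extra host copy, pinned staging buffer -> pageable destination.
    std::memcpy(dst, staging.data(), bytes);
}
```

Only steps 1 and 5 exist because the host buffers are not page-locked. If the source and destination were already pinned, the asynchronous transfers could read from and write to them directly, and both `std::memcpy` calls would disappear.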

If you copy synchronously, it is slow as Christmas! Therefore, you must copy the memory asynchronously, or you should not use CUDA and GPU acceleration.

My question still stands. Why are the source and destination buffers that Premiere Pro uses in PC host memory not pinned?

Gene

Gene A. Grindstaff

Executive Manager, SG&I


Intergraph Corporation

19 Interpro Road

Madison, AL 35758 USA
