Richtpt
Inspiring
March 21, 2022
Answered

OCR PDF with Power Automate and Adobe Services?


We get PDFs via email weekly and would like to use Microsoft's Power Automate along with Adobe Services to grab the text/data off the PDFs and put it into a SQL Server database.

 

I know how to create a flow that will watch an email folder and pull the attachment off the email, but from there I'm lost.  I've googled a bunch but am having trouble finding anything that explains how to use Power Automate with Adobe Services to pull text/data off the PDF.  I see there's an option to Create a Searchable PDF using OCR, but that looks like it wants to save the file, not actually pull text off the PDF.  I also see there's an option to Extract Tables from PDF, but it appears to save the data in .xlsx format.  I'm not sure whether this is possible and, if so, how to do it.

 

Any suggestions?

Thanks!


Ben V (Correct answer)
Adobe Employee
March 21, 2022

If you want to grab the text and put it into a SQL database, you would be better off using the Extract action instead, as that will give you the content of the PDF. OCR PDF only makes the PDF itself searchable; it does not hand the text back to your flow.

Richtpt (Author)
Inspiring
March 21, 2022

Gotcha.  Looking at the Extract options, it appears I want to use "Extract PDF Structure in a JSON File".  I'm a little confused by the output: it seems to be encrypted or binary or something that I need to convert to JSON text.  Is that right, or am I on the wrong path?  Thanks much!
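
On the "encrypted or binary" output: connector actions often hand file content back as base64 text, sometimes wrapped in an object with a "$content" property, and that may be what is showing up here. The Python sketch below only illustrates that decoding step outside of a flow. The function name decode_file_content and the sample value are hypothetical, and the exact wrapper shape is an assumption, not something confirmed in this thread. Inside a flow, the rough equivalent would probably be the json() and base64ToString() expression functions.

```python
import base64
import json


def decode_file_content(file_content):
    """Turn base64-encoded file content into a parsed JSON value.

    Assumption: the content is either a bare base64 string or an
    object shaped like {"$content-type": ..., "$content": <base64>}.
    """
    if isinstance(file_content, dict):
        encoded = file_content["$content"]
    else:
        encoded = file_content
    raw_bytes = base64.b64decode(encoded)
    # utf-8-sig decodes UTF-8 with or without a leading byte-order mark.
    return json.loads(raw_bytes.decode("utf-8-sig"))


# Hypothetical sample standing in for the action's output.
sample = {
    "$content-type": "application/json",
    "$content": base64.b64encode(b'{"elements": []}').decode("ascii"),
}
decoded = decode_file_content(sample)
print(decoded)
```

If the value turns out to be plain JSON text already, the base64 step is unnecessary and parsing it directly should be enough.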

 

 

Richtpt (Author)
Inspiring
March 21, 2022

I'm trying to use "Extract PDF Structure in a JSON File".  The output is @outputs('Extract_PDF_Structure_in_a_JSON_File')?['body/jsonFileContent'], and I can't figure out how to parse it.  If I simply write it to a file, it creates a file containing JSON text.  I could load that file back in, but I should be able to pull the data I want directly out of the extracted file contents.

Any suggestions on how to do that?  Thanks!
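
One possible way to think about the parsing step, once the content is available as JSON: the structure file is expected to hold a list of elements, each describing a piece of the page, and the text-bearing ones can be filtered out and flattened into rows for a database table. The Python sketch below assumes the top-level key is "elements" and that each element may carry "Text", "Path" and "Page" fields; those names should be checked against a real output file. The function name extract_text_rows and the sample data are made up for illustration.

```python
def extract_text_rows(structured):
    """Collect (page, path, text) tuples from a parsed structure file.

    Assumption: structured["elements"] is a list of dicts, and only
    some of them (paragraphs, headings, ...) have a "Text" field.
    """
    rows = []
    for element in structured.get("elements", []):
        text = element.get("Text")
        if text is None:
            # Figures and other non-text elements are skipped.
            continue
        rows.append((element.get("Page"), element.get("Path"), text.strip()))
    return rows


# Hypothetical sample shaped like the assumed structure file.
sample = {
    "elements": [
        {"Path": "//Document/H1", "Page": 0, "Text": "Invoice 1001 "},
        {"Path": "//Document/Figure", "Page": 0},
        {"Path": "//Document/P", "Page": 0, "Text": "Total due: 42.00 "},
    ]
}
rows = extract_text_rows(sample)
for row in rows:
    print(row)
```

Each tuple could then be passed to a parameterized INSERT against the SQL Server table. In a flow, the same idea would likely be a Parse JSON step followed by a loop over the elements array.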