OCR PDF with Power Automate and Adobe Services?

Explorer, Mar 21, 2022

We get PDFs via email weekly and would like to use Microsoft's Power Automate along with Adobe Services to grab the text/data off the PDFs and put it into a SQL Server database.

 

I know how to create a flow that will watch an email folder and pull the attachment off the email, but from there I'm lost.  I've googled a bunch but am having trouble finding anything that explains how to use Power Automate with Adobe Services to pull text/data off the PDF.  I see there's an option to Create a Searchable PDF using OCR, but that looks like it wants to save the file, not actually pull text off the PDF.  I also see there's an option to Extract Tables from PDF, but it appears to save the data in .xlsx format.  I'm not sure whether this is possible and, if so, how to do it.

 

Any suggestions?

Thanks!

TOPICS
PDF Services API , Power Automate

Views

215

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more

1 Correct answer

Adobe Employee, Mar 21, 2022

If you want to grab the text and put it into a SQL database, you would be better off using the Extract action instead, as that will give you the content of the PDF. OCR PDF will only make the PDF searchable.

Explorer, Mar 21, 2022

Gotcha.  Looking at the Extract options, it appears I want to use "Extract PDF Structure in a JSON File".  I'm a little confused by the output; it seems to be encrypted or binary or something that I need to convert to JSON text.  Is that right or am I on the wrong path?  Thanks much!

 

[Image: Richtpt_0-1647881083257.png]
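
One possible reading of that output: Power Automate often represents file content as an object with a base64 string under `$content`, which can look like binary or encrypted data. The sketch below, written in Python purely for illustration, assumes that wrapper shape and uses a hypothetical sample payload; it is not confirmed by the screenshot. Inside a flow, the comparable expression functions would be `base64ToString()` and `json()`.

```python
import base64
import json


def decode_file_content(file_content: dict) -> dict:
    """Decode a {"$content-type", "$content"} wrapper into a parsed JSON object.

    Assumption: "$content" holds base64-encoded UTF-8 JSON text.
    """
    raw_bytes = base64.b64decode(file_content["$content"])
    return json.loads(raw_bytes.decode("utf-8"))


# Hypothetical sample standing in for the action's file output.
sample_json = '{"elements": [{"Path": "//Document/H1", "Text": "Invoice 1001"}]}'
wrapped = {
    "$content-type": "application/json",
    "$content": base64.b64encode(sample_json.encode("utf-8")).decode("ascii"),
}

decoded = decode_file_content(wrapped)
print(decoded["elements"][0]["Text"])
```

If the output is already plain JSON text rather than this wrapper, the base64 step is unnecessary and only the JSON parsing applies.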

 

Explorer, Mar 21, 2022

I'm trying to use "Extract PDF Structure in a JSON File".  The output is @outputs('Extract_PDF_Structure_in_a_JSON_File')?['body/jsonFileContent'], and I can't figure out how to parse it.  If I simply output it to a file, it creates a file with JSON text.  I could load that file, but I should be able to pull the data I want directly out of the Extracted File Contents.

Any suggestions how to do that?  Thanks!
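
As a rough illustration of what "pulling the data out" involves once the contents are available as JSON text, here is a minimal Python sketch. It assumes the JSON has a top-level `elements` array whose items may carry `Path` and `Text` fields (the fields mentioned later in this thread); the sample text and the `text_rows` helper are hypothetical.

```python
import json


def text_rows(json_text: str) -> list[tuple[str, str]]:
    """Return (Path, Text) pairs for every top-level element that has text."""
    document = json.loads(json_text)
    rows = []
    for element in document.get("elements", []):
        # Some elements (figures, containers) may have no Text at all.
        if "Text" in element:
            rows.append((element.get("Path", ""), element["Text"]))
    return rows


# Hypothetical sample of the extracted JSON text.
sample_text = json.dumps({
    "elements": [
        {"Path": "//Document/H1", "Text": "Weekly Report"},
        {"Path": "//Document/Figure"},
        {"Path": "//Document/P", "Text": "Total: 42"},
    ]
})

rows = text_rows(sample_text)
for path, text in rows:
    print(path, "->", text)
```

Each pair is then in a shape that could be passed as parameters to a database insert step.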

Adobe Employee, Apr 07, 2022

I recently wrote an article on parsing the PDF Extract content here:

https://medium.com/adobetech/split-pdfs-based-on-content-with-adobe-pdf-extract-service-with-microso...

 

Hope this helps walk through how I did it using the Parse JSON action.

Explorer, Apr 08, 2022

I haven't read this completely, but it looks great!  I don't know why Google wasn't finding this for me; it would have been very helpful.

 

I do have my flow finally working and have one question.  In one of the PDFs I have data in the Kids array.  Does your flow handle this?  I had to parse the JSON to get the Path, Text, and Kids array, then parse the Kids array.  So I have one Apply to each to loop through the JSON, then, when the Kids array is not null, another Apply to each to loop through it.
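
The nested-loop approach described above can also be expressed as a recursive walk. This is a minimal Python sketch, not the flow itself: it assumes each element may have `Path`, `Text`, and an optional (possibly null) `Kids` array of elements with the same shape, and the sample data is hypothetical.

```python
from typing import Iterator


def walk(elements: list[dict]) -> Iterator[tuple[str, str]]:
    """Yield (Path, Text) for each element, descending into Kids when present."""
    for element in elements:
        if "Text" in element:
            yield element.get("Path", ""), element["Text"]
        # Kids may be missing or null; treat both as "no children".
        kids = element.get("Kids") or []
        yield from walk(kids)


# Hypothetical sample with one element that carries its text inside Kids.
elements = [
    {"Path": "//Document/P", "Text": "Order summary"},
    {
        "Path": "//Document/Table",
        "Kids": [
            {"Path": "//Document/Table/TR/TD/P", "Text": "Widget"},
            {"Path": "//Document/Table/TR/TD[2]/P", "Text": "4"},
        ],
    },
    {"Path": "//Document/P[2]", "Text": "Thanks", "Kids": None},
]

flattened = list(walk(elements))
for path, text in flattened:
    print(path, "->", text)
```

Recursion handles any nesting depth, whereas two nested Apply to each loops cover exactly one level of Kids.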

 

I see your reply on my other post that I could use the Extract PDF Structure in a JSON Object action, and I will have to play with that; it looks like it could save me some steps.  Thanks!

New Here, Apr 08, 2022

The scenario I was writing about didn't have Kids. However, if you use the schemas (which you can download here), you can drop them right into your Parse JSON action and access the Kids.
