We get PDFs via email weekly and would like to use Microsoft's Power Automate along with Adobe Services to pull the text/data out of the PDFs and put it into a SQL Server database.
I know how to create a flow that watches an email folder and pulls the attachment off the email, but from there I'm lost. I've googled a lot but am having trouble finding anything that explains how to use Power Automate with Adobe Services to pull text/data off the PDF. I see there's an option to Create a Searchable PDF using OCR, but that looks like it saves the file rather than actually pulling text off the PDF. I also see there's an option to Extract Tables from PDF, but it appears to save the data in .xlsx format. I'm not sure whether this is possible and, if so, how to do it.
If you want to grab the text and put it into a SQL database, you would be better off using the Extract action instead, as that will give you the content of the PDF. OCR PDF will only make the text in the PDF searchable.
Gotcha. Looking at the Extract options, it appears I want to use "Extract PDF Structure in a JSON File". I'm a little confused by the output; it seems to be encrypted or binary or something that I need to convert to JSON text. Is that right, or am I on the wrong path? Thanks much!
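If the action hands the file back as base64 text (file content often shows up that way in Power Automate, for example in a `$content` property; treat this as an assumption about this particular action), then the "binary" look is just encoding, and decoding it yields plain JSON. A minimal Python sketch, using a hypothetical helper name `decode_extract_output`:

```python
import base64
import json


def decode_extract_output(content_b64: str) -> dict:
    """Decode base64-encoded file content and parse it as JSON.

    Assumption: the action output is the JSON file encoded as a base64 string.
    """
    raw_bytes = base64.b64decode(content_b64)
    # "utf-8-sig" also strips a leading byte-order mark if one is present
    text = raw_bytes.decode("utf-8-sig")
    return json.loads(text)


if __name__ == "__main__":
    sample = base64.b64encode(
        b'{"elements": [{"Path": "//Document/P", "Text": "Hello"}]}'
    ).decode("ascii")
    doc = decode_extract_output(sample)
    print(doc["elements"][0]["Text"])  # prints Hello
```

Inside the flow itself, the rough equivalent would be an expression along the lines of json(base64ToString(...)), with the action's file content as the argument.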
I'm trying to use "Extract PDF Structure in a JSON File". The output is
I recently wrote an article on parsing the PDF Extract content here:
Hopefully it helps; it walks through how I did it using the Parse JSON action.
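For readers without the article at hand, the Parse JSON plus Apply to each pattern can be sketched in Python. This assumes the extracted JSON has a top-level `elements` array whose items carry `Path` and, for text blocks, `Text`; the function name `text_rows` is hypothetical. Like a single Apply to each, it only looks at top-level elements and does not descend into `Kids`.

```python
from typing import Any


def text_rows(doc: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (Path, Text) pairs for top-level elements that carry text.

    Mirrors one Apply to each over the parsed "elements" array;
    nested "Kids" are NOT visited here.
    """
    rows = []
    for element in doc.get("elements", []):
        text = element.get("Text")
        if text:  # skip elements such as figures that have no Text field
            rows.append((element.get("Path", ""), text.strip()))
    return rows
```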
I haven't read this completely, but it looks great! I don't know why Google wasn't finding this for me; it would have been very helpful.
I do finally have my flow working and have one question. In one of the PDFs I have data in the Kids array. Does your flow handle this? I had to parse the JSON to get the Path, Text, and Kids array, then parse the Kids array. So I have one Apply to each to loop through the JSON, and then, when the Kids array is not null, another Apply to each to loop through it.
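One nested Apply to each covers one level of Kids; if Kids can contain their own Kids, each extra level would need another loop. When the JSON is processed outside the flow, a recursive walk handles any depth. A sketch under the same assumptions as above (elements with `Path`, optional `Text`, and an optional or null `Kids` list of element objects); `walk_elements` is a hypothetical name:

```python
from typing import Any, Iterator


def walk_elements(elements: list[dict[str, Any]]) -> Iterator[tuple[str, str]]:
    """Yield (Path, Text) for every element, descending into "Kids" at any depth.

    Parents are yielded before their kids (pre-order), so document order is kept.
    """
    for element in elements:
        text = element.get("Text")
        if text:
            yield element.get("Path", ""), text.strip()
        kids = element.get("Kids")
        if kids:  # "Kids" may be missing or null; only recurse when it has items
            yield from walk_elements(kids)
```

Usage: list(walk_elements(doc["elements"])) gives a flat list of rows, equivalent to the outer and inner Apply to each loops combined.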
I saw your reply on my other post saying I could use Extract PDF Structure in a JSON Object instead. I'll have to play with that; it looks like it could save me some steps. Thanks!
The scenario I was writing about didn't have Kids. However, if you use the schemas (which you can download here), you can drop them right into your Parse JSON action and access the Kids.
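The last step from the original question, loading the extracted (Path, Text) rows into a database, is sketched below. Note the swap: it uses Python's built-in sqlite3 purely as a self-contained stand-in for SQL Server (in the flow you would use the SQL Server connector's insert-row action instead), and the table `pdf_text` and its columns are hypothetical.

```python
import sqlite3


def load_rows(conn: sqlite3.Connection, source_file: str,
              rows: list[tuple[str, str]]) -> int:
    """Insert (Path, Text) rows into a hypothetical pdf_text table; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_text ("
        " source_file TEXT NOT NULL, path TEXT NOT NULL, text TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany(
            "INSERT INTO pdf_text (source_file, path, text) VALUES (?, ?, ?)",
            [(source_file, path, text) for path, text in rows],
        )
    return len(rows)
```

Parameterized inserts (the ? placeholders) keep PDF text containing quotes from breaking the SQL; SQL Server drivers support the same idea.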