Gregory5E99
Participant
March 12, 2021
Question

Power Automate PDF with Embedded Raster Image of Text is not converting using OCR


I'm using the new PDF to Excel feature, and my PDF files contain a mix of raster-based images and text.

The raster-based image is a table with visible borders and numbers in the cells.

It comes through in the Excel file as just an image; no OCR is being performed on it.

I don't see any specific settings I can use for how it is converting the PDF file.

Is there a way to force OCR on all raster-based images within the PDF?

 

Thanks!

 

This topic has been closed for replies.

1 reply

Joel Geraci
Community Expert
March 12, 2021

You would use the OCR service first and then export to Excel.

Gregory5E99
Participant
March 15, 2021

Hello @Joel Geraci ,

I am using the new Power Automate connector https://helpx.adobe.com/document-cloud/help/pdf-connector-for-microsoft-power-automate.html to perform the OCR and convert to Excel, but I don't see many options, such as one to force OCR on all images.

Is there another way to automate this with Power Automate?

Joel Geraci
Community Expert
March 17, 2021

Thanks for the suggestion to use OCR first.

In my testing, PDF to Excel appears to produce the same result as PDF to OCR to Excel.

Both processes perform OCR, and both have trouble running OCR on all images within the PDF.


I'm fairly certain that the OCR service only works on image-only PDFs. I don't think we have a solution to convert a mixture of text and images to just text.
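
To make the distinction in that last reply concrete, here is a minimal Python sketch of how a file could be sorted into "image-only" versus "mixed" before deciding whether OCR is likely to help. It does not call the Adobe connector or any PDF library. `PageInfo`, `classify_pdf`, and `ocr_expected_to_help` are hypothetical names made up for illustration, and the rule that OCR only acts on image-only PDFs is the assumption stated in the reply above, not confirmed behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageInfo:
    """Hypothetical per-page summary; not part of any Adobe or Power Automate API."""
    has_text_layer: bool    # page already contains selectable text
    has_raster_image: bool  # page contains at least one embedded bitmap


def classify_pdf(pages: List[PageInfo]) -> str:
    """Return 'mixed', 'image-only', 'text-only', or 'empty' for the whole file."""
    any_text = any(p.has_text_layer for p in pages)
    any_image = any(p.has_raster_image for p in pages)
    if any_text and any_image:
        return "mixed"
    if any_image:
        return "image-only"
    if any_text:
        return "text-only"
    return "empty"


def ocr_expected_to_help(pages: List[PageInfo]) -> bool:
    """Assumption taken from the reply above: OCR only acts on image-only PDFs."""
    return classify_pdf(pages) == "image-only"


# A scanned document versus the kind of file described in the question.
scanned = [PageInfo(has_text_layer=False, has_raster_image=True)]
mixed = [PageInfo(has_text_layer=True, has_raster_image=True)]
print(classify_pdf(scanned), ocr_expected_to_help(scanned))
print(classify_pdf(mixed), ocr_expected_to_help(mixed))
```

Under that assumption, the file from the original question falls into the "mixed" case, which would explain why running OCR first made no difference to the Excel output.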