I'm using the new PDF to Excel feature, and my PDF files contain a mix of raster images and text.
The raster image is a table, with borders and numbers in the table cells.
In the Excel file it comes over as just an image, and no OCR is performed on it.
I don't see any settings that control how the PDF file is converted.
Is there a way to force OCR on all raster-based images within the PDF?
Thanks!
You would use the OCR service first and then export to Excel.
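The "OCR first, then export" order can be sketched as a two-step pipeline. This is a minimal sketch under stated assumptions: `ocr_step` and `excel_step` are hypothetical stand-ins for whatever OCR and PDF-to-Excel actions your tool or connector exposes, not real Adobe API calls. The only point it shows is that the Excel export receives the OCR'd PDF, not the original.

```python
from pathlib import Path
from typing import Callable

# Hypothetical step signature: takes an input file path and returns the output file path.
Step = Callable[[Path], Path]


def ocr_then_excel(pdf_path: Path, ocr_step: Step, excel_step: Step) -> Path:
    """Run OCR on the PDF first, then export the OCR'd PDF to Excel.

    Both steps are hypothetical placeholders supplied by the caller.
    """
    searchable_pdf = ocr_step(pdf_path)  # step 1: add a text layer via OCR
    return excel_step(searchable_pdf)    # step 2: convert the OCR'd PDF to .xlsx
```

For example, passing stub steps that rename `report.pdf` to `report_ocr.pdf` and then change the suffix to `.xlsx` yields `report_ocr.xlsx`, confirming the export ran on the OCR output.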
Hello @Joel_Geraci ,
I am using the new Power Automate connector https://helpx.adobe.com/document-cloud/help/pdf-connector-for-microsoft-power-automate.html to perform the OCR and convert to Excel, but I don't see options such as forcing OCR on all images.
Is there another way to automate this with Power Automate?
OCR is a separate service, but if you have a mixture of text and images, I don't think it's going to work.
Thanks for the suggestion to use OCR first.
In my testing, PDF to Excel produces the same result as running OCR first and then PDF to Excel.
Both processes perform OCR, and both fail to apply it to all of the images within the PDF.
I'm fairly certain that the OCR service only works on image-only PDFs. I don't think we have a solution that converts a mixture of text and images to just text.
Thanks Joel,
I am running mixed PDFs, and some of them work, but it's hit and miss.
I have a meeting with an Adobe Developer Relations Specialist on Friday.
I'll let you know if anything changes.