Power Automate PDF with Embedded Raster Image of Text is not converting using OCR

New Here, Mar 11, 2021

I'm using the new PDF to Excel feature, and my PDF files contain a mix of raster-based images and text.

The raster-based image is a table with borders and numbers in the table cells.

It comes through in the Excel file as just an image; no OCR is being performed on it.

I don't see any specific settings I can use for how it is converting the PDF file.

Is there a way to force OCR on all raster-based images within the PDF?

Thanks!

TOPICS
Power Automate

Adobe Community Professional, Mar 12, 2021

You would use the OCR service first and then export to Excel.
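
As a rough illustration of that ordering (not the connector's actual API), the flow is simply two steps chained together. The sketch below uses hypothetical callables `ocr_step` and `export_step` as stand-ins for whatever OCR and export actions are available; in Power Automate these would be connector actions in a flow rather than Python functions.

```python
from typing import Callable

# Hypothetical stand-in type: a step takes file bytes and returns file bytes.
FileStep = Callable[[bytes], bytes]


def ocr_then_export(pdf_bytes: bytes, ocr_step: FileStep, export_step: FileStep) -> bytes:
    """Run OCR first so the export step receives the OCR'd PDF, not the original."""
    searchable_pdf = ocr_step(pdf_bytes)
    return export_step(searchable_pdf)


if __name__ == "__main__":
    # Dummy steps that only tag the payload, to make the call order visible.
    fake_ocr = lambda data: data + b"|ocr"
    fake_export = lambda data: data + b"|xlsx"
    print(ocr_then_export(b"pdf", fake_ocr, fake_export))
```

The point of the sketch is only the order of operations: the export step should be fed the output of the OCR step, not the original file.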

New Here, Mar 15, 2021

Hello @Joel_Geraci,

I am using the new Power Automate connector (https://helpx.adobe.com/document-cloud/help/pdf-connector-for-microsoft-power-automate.html) to perform the OCR and convert to Excel, but I don't see many options, such as one to force OCR on all images.

Is there another way to automate this with Power Automate?

Adobe Community Professional, Mar 15, 2021

OCR is a separate service, but if you have a mixture of text and images, I don't think it's going to work.

New Here, Mar 16, 2021

Thanks for the suggestion to use OCR first.

In my testing, PDF to Excel appears to produce the same result as PDF to OCR to Excel.

Both processes perform OCR, and both have trouble running OCR on all images within the PDF.

Adobe Community Professional, Mar 17, 2021

I'm fairly certain that the OCR service only works on image-only PDFs. I don't think we have a solution to convert a mixture of text and images to just text.
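
To make the image-only versus mixed distinction concrete, here is a minimal sketch that sorts pages into those categories and flags the mixed ones. It assumes the behaviour described in this reply (OCR applied only to image-only content), which is not confirmed. `PageSummary`, `classify_page`, and `pages_at_risk` are hypothetical names, and the per-page counts would have to come from whatever PDF inspection tool you have; nothing here calls an Adobe API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageSummary:
    """Hypothetical per-page summary produced by some PDF inspection step."""
    number: int
    text_chars: int   # count of real (selectable) text characters on the page
    image_count: int  # count of raster images placed on the page


def classify_page(page: PageSummary) -> str:
    """Label a page as 'empty', 'text-only', 'image-only', or 'mixed'."""
    has_text = page.text_chars > 0
    has_images = page.image_count > 0
    if has_text and has_images:
        return "mixed"
    if has_images:
        return "image-only"
    if has_text:
        return "text-only"
    return "empty"


def pages_at_risk(pages: List[PageSummary]) -> List[int]:
    """Return page numbers of mixed pages, where (by the assumption above)
    raster images may be left as images instead of being OCR'd."""
    return [p.number for p in pages if classify_page(p) == "mixed"]


if __name__ == "__main__":
    sample = [
        PageSummary(number=1, text_chars=0, image_count=1),
        PageSummary(number=2, text_chars=350, image_count=1),
        PageSummary(number=3, text_chars=900, image_count=0),
    ]
    print(pages_at_risk(sample))
```

A check like this would not fix the conversion, but it could explain why results are inconsistent from file to file.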

New Here, Mar 17, 2021

Thanks Joel,

I am running mixed PDFs and some work, but it's hit or miss.

I have a meeting with an Adobe Developer Relations Specialist on Friday.

I'll let you know if anything changes.
