
Scanned PDFs

New Here, Jun 16, 2021

Hi,

I am trying to extract the text from a scanned PDF, but the output is not as expected. Can someone please help me figure out why? The PDF is attached.

Community Expert, Jun 16, 2021

What do you get? What do you expect?

New Here, Jun 16, 2021

I am getting the attached output. There is no output for pages 6 to 23. We need clean output for all the pages in JSON format.
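One way to confirm which pages came back empty is to tally text elements per page in the extraction JSON. The sketch below assumes the JSON has a top-level `elements` list whose entries carry a zero-based `Page` index and, for text, a `Text` field. That layout and the helper name `pages_without_text` are assumptions for illustration, so the keys may need adjusting to match the actual output.

```python
def pages_without_text(extracted, page_count):
    """Return 1-based page numbers that have no non-empty text element.

    Assumes `extracted` looks like {"elements": [{"Page": 0, "Text": "..."}, ...]}
    with zero-based page indices (an assumption about the JSON layout).
    """
    pages_with_text = set()
    for element in extracted.get("elements", []):
        text = element.get("Text", "")
        if isinstance(text, str) and text.strip() and "Page" in element:
            pages_with_text.add(element["Page"] + 1)
    return [p for p in range(1, page_count + 1) if p not in pages_with_text]


# Small made-up sample: text on pages 1 and 2, only a figure on page 3.
sample = {
    "elements": [
        {"Page": 0, "Path": "//Document/H1", "Text": "Title"},
        {"Page": 1, "Path": "//Document/P", "Text": "Body text"},
        {"Page": 2, "Path": "//Document/Figure"},
    ]
}
print(pages_without_text(sample, page_count=3))
```

Pages reported by a check like this are the ones that would need OCR before text extraction can return anything.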

Community Expert, Jun 16, 2021

There is no text on these pages, only images.

New Here, Jun 16, 2021

There must be some workaround to get the text out of this kind of PDF. I would highly appreciate it if you could suggest how I can get the text.

Community Expert, Jun 16, 2021

You can try OCR in Adobe Acrobat.

New Here, Jun 16, 2021

https://aws.amazon.com/marketplace/pp/prodview-g2ikxe6zxsi64

The Adobe PDF Services API also works the same way, if I am not wrong. The JSON output I shared with you earlier was from the Adobe PDF Services API.

OCR in Adobe Acrobat would turn out to be a manual process. How do I integrate it with my Python script? I am really sorry for bothering you, but I really need a solution for this.
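As one possible way to run OCR from a Python script without a manual step, the sketch below shells out to the open-source `ocrmypdf` command-line tool. This is a swapped-in alternative, not Adobe Acrobat or the Adobe PDF Services API, and it assumes `ocrmypdf` is installed and on the PATH. The file names and the helper names `build_ocr_command` and `run_ocr` are hypothetical.

```python
import subprocess


def build_ocr_command(input_pdf, output_pdf, skip_text=True):
    """Build the ocrmypdf command line as a list of arguments."""
    command = ["ocrmypdf"]
    if skip_text:
        # Leave pages that already have a text layer untouched.
        command.append("--skip-text")
    command += [str(input_pdf), str(output_pdf)]
    return command


def run_ocr(input_pdf, output_pdf, skip_text=True):
    """Run OCR; raises CalledProcessError if ocrmypdf exits with an error."""
    subprocess.run(build_ocr_command(input_pdf, output_pdf, skip_text), check=True)


# Show the command that would run for hypothetical file names.
print(build_ocr_command("scanned.pdf", "scanned_ocr.pdf"))
```

Calling `run_ocr("scanned.pdf", "scanned_ocr.pdf")` would then write a PDF with a text layer, which a text extraction step should be able to read.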

 

Community Expert, Jun 16, 2021

Adobe Employee, Jun 16, 2021

Try splitting the PDF into two: one containing only the scanned pages (e.g. pages 1-5 of that document) and a second PDF containing the non-scanned pages (e.g. pages 6-24).
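A split like this can be scripted. The following is a minimal sketch that assumes the third-party `pypdf` package is installed; the helper names `split_indices` and `split_pdf` and any file paths passed to them are hypothetical, and the split point of page 5 is taken from the example page ranges in this reply.

```python
def split_indices(page_count, split_after):
    """Return two lists of zero-based page indices: pages 1..split_after, then the rest."""
    if not 0 < split_after < page_count:
        raise ValueError("split_after must fall strictly inside the document")
    return list(range(split_after)), list(range(split_after, page_count))


def split_pdf(src_path, first_path, second_path, split_after):
    """Write pages 1..split_after to first_path and the remaining pages to second_path."""
    # Imported here so the index helper above works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    parts = split_indices(len(reader.pages), split_after)
    for out_path, indices in zip((first_path, second_path), parts):
        writer = PdfWriter()
        for index in indices:
            writer.add_page(reader.pages[index])
        with open(out_path, "wb") as handle:
            writer.write(handle)


# For a 24-page document split after page 5, as in the example page ranges:
first, second = split_indices(24, 5)
print(first[0] + 1, first[-1] + 1, second[0] + 1, second[-1] + 1)
```

Each resulting file can then be sent through the processing step that suits it, for example OCR for the scanned part and plain text extraction for the other.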
