
How to OCR document with table and export to text (as proper left/right text).

New Here,
Jul 02, 2017

I have a document with many pages that have columnar tables like the following.

    #1  2.1  This is some text. It can go onto the next line

             like this.

    #2  1.3  More text.

    #3  3.2  And some more text that goes on to the next

             line also.

    #4  2.3  And some more text.

When I OCR the document, it seems to OCR the columns as separate blocks on some pages, while on other pages it captures all the text as one big block. In this example, let's say it captured the text in 4 blocks, as shown in the following image.

Screen Region 2017-07-03 at 08.24.08.png

So when I export (or copy/paste), Acrobat outputs the text in block order, and I get text like the following.

    #1

    #2

    #3

    2.1

    1.3

    3.2

    This is some text. It can go onto the next line

    like this.

    More text.

    And some more text that goes on to the next

    line also.

    #4  2.3  And some more text.

If I export to Word the layout looks OK, but that is because Acrobat has created the Word doc with sections and columns: in this case, a three-column section up to the end of line #3, then a one-column section for line #4. So exporting from Word to text gives the same result.

How can I tell Acrobat to OCR or export the text in simple left-to-right, top-to-bottom order, so I get text like the original document (that is, like my first example)? Thanks!
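
To make the desired ordering concrete: it amounts to grouping text by vertical position first and by horizontal position second. The sketch below is illustration only and is not an Acrobat feature or API. It assumes some OCR tool can report each text fragment with (x, y) coordinates; the coordinates, the function name `rows_in_reading_order`, and the tolerance `y_tol` are all hypothetical.

```python
from typing import List, Tuple

# Each fragment is (x, y, text): x grows to the right, y grows downward.
# These coordinates are made up; a real OCR tool would have to supply them.
Fragment = Tuple[float, float, str]


def rows_in_reading_order(fragments: List[Fragment], y_tol: float = 3.0) -> List[str]:
    """Group fragments into rows by vertical position, then order each row left to right."""
    rows: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: (f[1], f[0])):
        # Same row if the fragment is within y_tol of the row's first fragment.
        if rows and abs(frag[1] - rows[-1][0][1]) <= y_tol:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return ["  ".join(f[2] for f in sorted(row, key=lambda f: f[0])) for row in rows]


# Fragments listed in "block order" (column by column), as in the export above.
fragments = [
    (0, 0, "#1"), (0, 20, "#2"),
    (40, 0, "2.1"), (40, 20, "1.3"),
    (80, 0, "This is some text. It can go onto the next line"),
    (80, 10, "like this."),
    (80, 20, "More text."),
]

for line in rows_in_reading_order(fragments):
    print(line)
```

With the sample data, each row comes out with its number, value, and text together again. Continuation lines such as "like this." land on their own row without the original indentation, so this only approximates the layout in my first example.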

System Info: 

    macOS 10.12.5 (16F73) 

    Architecture: x86_64 

    Build: 17.9.20044.222436 

    AGM: 4.30.69 

    CoolType: 5.14.5 

    JP2K: 1.2.2.38123 

  [1]: https://i.stack.imgur.com/JFphy.png

Added more details and picture on how Acrobat is organizing text into blocks. Explained what happens with Word export.

TOPICS
Scan documents and OCR
Adobe Employee,
Jul 26, 2017

Hi Reesd,

Sorry for the delay in response.

As I understand the issue from your description, when you export a PDF from Acrobat, it doesn't retain the formatting. Is that correct?

Could you let us know the version of Acrobat you are using to OCR the PDF? See: Identify the product and its version for Acrobat and Reader DC

Have you tried applying OCR to, or exporting, another similar PDF?

Would it be possible for you to share the PDF file with us? You can share the link to the file via private message. To send a private message, hover your mouse over my username and click Message.

Thank You,

Shivam

Adobe Employee,
Aug 08, 2017

Hi reesd,

We are sorry for the issue you are facing. Along with the information requested by Adorobat, could you please share the files on which you face this issue?

Steps to share the file using Adobe Send:

Share the file using https://cloud.acrobat.com/send

a. Open this link

b. Click on "Select files to Send"

c. Click the "Select file from my computer" link and select the file

d. Click on "Create link"

e. Share this link

Thanks

Rishabh
