
OCR: Page elements get rotated instead of leaving them level

Explorer, Jul 08, 2024


I would like to OCR the attached PDF page, preferably into editable text and images. However, whatever I try, it rotates the elements on the page by something like 20°. If I select Searchable Image (Exact), then the recognized text gets rotated. The rotation doesn’t even make sense.

How do I tell Acrobat not to rotate the elements on the page?

TOPICS
PDF , Scan documents and OCR

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more

Community Expert, Jul 08, 2024


My guess is that it's focusing on these lines (in red) and not on the vertical line.

2024-07-08_14-38-49.png

Let me know if you're using the new or old User Interface and I'll tell you how to turn the auto-rotate off.

Explorer, Jul 08, 2024


Thanks for the quick and detailed reply, Gary!

I am using the new interface, but can switch back to the old interface in case that helps. I faintly remember encountering the same issue years ago.

Community Expert, Jul 08, 2024 (marked as correct answer)


OK, I also had the Old UI going, so I used that.

First, go into Scan & OCR

2024-07-08_14-53-57.png

Next, along the top bar, select "Enhance" and then Scanned Document

2024-07-08_14-54-07.png

Next, click on the Gear icon for Settings

 

2024-07-08_14-54-36.png

And lastly, at the bottom of the window (seen above), click Edit next to Text Recognition Options.

Then go to Deskew and turn that off.

2024-07-08_14-54-54.png

As an aside on a related issue: I get a business-related journal that I scan for storage. On my flatbed scanner, I scan one side, flip 180°, scan the next page, wash, rinse, repeat for all ≈ 60+ pages. With Deskew turned on, Acrobat's OCR recognizes that the text is 180° off and rotates the page back to 0°, and I'm good to go. So it definitely has some advantages, and when you're done with this, you might want to turn it back on.

 

Let me know if this solves your issue.

Explorer, Jul 09, 2024


Thank you very much, Gary! Unfortunately, this does not seem to work if I want to get editable text and images. The problem with Searchable Image as output is that the resulting file is very large.

 

There are many pages that have black text on a white background, plus a grayscale image. With editable text and images, the grayscale image is stored separately from the text, so each can be compressed using a different algorithm. This brings the PDF size down to a fraction of the original size while maintaining quality. With searchable image as output, the PDF size remains high.

 

I think I should start looking at other software. This issue has been in Acrobat for ages.

Explorer, Jul 11, 2024


I have now switched to doing OCR from the command line on a Linux machine using software called OCRmyPDF. That is much faster, compression can be finely tuned, and I don’t get the issue with skewed pages.
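For anyone who wants to try the same route, here is a minimal sketch of how such a run could be scripted from Python. It is not the exact command I used: the file names `scan.pdf` and `scan-ocr.pdf` and the helper `build_ocrmypdf_command` are made up for illustration. It relies only on OCRmyPDF's `-l`, `--optimize`, `--deskew` and `--rotate-pages` options; as far as I know, deskewing and page rotation stay off unless you ask for them, and `--optimize` takes a level from 0 to 3.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_command(src, dst, language="eng", optimize=1,
                           deskew=False, rotate_pages=False):
    """Return the argument list for one OCRmyPDF run (hypothetical helper)."""
    if optimize not in (0, 1, 2, 3):
        raise ValueError("optimize must be 0, 1, 2 or 3")
    cmd = ["ocrmypdf", "-l", language, "--optimize", str(optimize)]
    if deskew:
        # Straighten slightly crooked scans; left off here on purpose.
        cmd.append("--deskew")
    if rotate_pages:
        # Fix pages scanned sideways or upside down (e.g. flipped 180°).
        cmd.append("--rotate-pages")
    cmd += [str(src), str(dst)]
    return cmd


if __name__ == "__main__":
    # File names are placeholders, not real paths from this thread.
    command = build_ocrmypdf_command(Path("scan.pdf"), Path("scan-ocr.pdf"))
    print(shlex.join(command))
    # Only run when the tool and the input file are actually present.
    if shutil.which("ocrmypdf") and Path("scan.pdf").exists():
        subprocess.run(command, check=True)
```

Passing `rotate_pages=True` would cover the flipped-page case Gary describes without also turning on deskew.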

 

I still marked your answer as accepted. It is very detailed and probably the best that can be done with Acrobat at the moment. It’s a pity, though, that this bug has not been fixed in years. It looks like Acrobat doesn’t get much development other than user interface overhauls.

Community Expert, Jul 11, 2024


@feklee, thank you for that. What marking answers correct mostly does is help those with similar issues find answers without having to wait for replies to come in (if they are helpful and if they come! :>))

 

Your issue is a bit unique, so while disappointed it didn't work, I was only hopeful that it might. Why Acrobat's OCR engine would follow those lines is beyond me. And you are correct; Acrobat has not done much with their OCR for some time. Adobe pays for the OCR engine (for the life of me, I cannot remember which company they use). The one thing I do wish is that they'd use some of this AI stuff going around to help the OCR process. There is so much that could be done with that, but it's not being done at all — or at least not yet. Maybe OCRmyPDF might do that and embarrass Adobe to do something as well!

 

Good luck!
