
Problem with OCR on a PDF

Community Beginner, Aug 31, 2018

Acrobat just stops working and crashes on page 818 of a document that is over 1,000 pages.  I extracted the page and reproduced the issue.  There seems to be a problem with something on the page when it tries to OCR it.

TOPICS
Scan documents and OCR

Views

1.2K

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Aug 31, 2018

Extracting the page and trying it on its own was a good idea. At a minimum, it means there is some sort of corruption on that one page. Otherwise I would have suggested running the document as two 400-page documents and adding them together again.
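For anyone who does need the split-and-recombine approach, here is a minimal sketch of the bookkeeping in Python. It only computes page ranges and finds the first range that fails; it does not call Acrobat. The names `split_page_ranges`, `first_failing_chunk`, and `ocr_succeeds` are hypothetical, and the 1,000-page total is an assumed round number for illustration.

```python
from typing import Callable, List, Optional, Tuple

PageRange = Tuple[int, int]  # 1-based, inclusive (first_page, last_page)


def split_page_ranges(total_pages: int, chunk_size: int) -> List[PageRange]:
    """Return consecutive page ranges that together cover the whole document."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    ranges = []
    first = 1
    while first <= total_pages:
        last = min(first + chunk_size - 1, total_pages)
        ranges.append((first, last))
        first = last + 1
    return ranges


def first_failing_chunk(
    ranges: List[PageRange],
    ocr_succeeds: Callable[[PageRange], bool],
) -> Optional[PageRange]:
    """Return the first range for which OCR fails, or None if all succeed.

    ocr_succeeds is a hypothetical callback: in practice it stands for
    "extract these pages, run OCR on them, and report whether it finished".
    """
    for page_range in ranges:
        if not ocr_succeeds(page_range):
            return page_range
    return None


if __name__ == "__main__":
    ranges = split_page_ranges(1000, 400)
    # Simulate a document where any chunk containing page 818 fails.
    bad = first_failing_chunk(ranges, lambda r: not (r[0] <= 818 <= r[1]))
    print(ranges, bad)
```

With these assumed numbers, the ranges are 1-400, 401-800, and 801-1000, and the simulated failure lands in the last one. Repeating the split on just that range narrows things down to a single page.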

Extra step avoided!

My next question would be: where did this document come from? Did you create it? Can that one page be recreated?

What format is the current page? An image saved as a PDF? Can you open that page in something like Photoshop and re-save it (e.g., from a Photoshop PDF into, say, a TIFF image)?

If you do not have Photoshop, can you post it here and I can give it a try?

Community Beginner, Aug 31, 2018

It was a document scanned to PDF without OCR, and it is currently still in that format. It is a legal document with names, etc., so I can't attach it here, but I will attach a picture of a portion of the top of the page that I think may be the issue.

[Attached image: Capture.JPG]

Community Expert, Aug 31, 2018

Probably not. Something like that would just be assumed to be an image of some kind and treated as such.

Can you get a new copy of that page?

Can you print that page and rescan it? (If so, use as high a resolution as you can get out of your scanner without having to invent pixels. If you do not know what I mean by that, please ask.) Then process this new page?

Community Beginner, Aug 31, 2018

I re-scanned the page and tried the OCR again with the same results.  Something on the page is causing the OCR process to crash.

Community Expert, Aug 31, 2018

Wow, what an interesting issue.

OK, now the following will take some time but will help to narrow it down.

Take the page, place a sheet of paper over the top half, and see if it fails. If it passes, then try covering the bottom half.

Many years ago on the Mac platform, one could run into Desk Accessory conflicts. This same trial-and-error process was used to locate which two were in conflict with each other.

If you can mask the offending section and manually insert that section back in after the OCR-ing, you'll have it, albeit with a bit of pain.
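The cover-half-and-retry process is a bisection search. A rough sketch in Python, purely to show the logic: `ocr_fails` is a hypothetical callback that stands for "mask everything outside this strip of the page, rescan, run OCR, and report whether it crashed". Nothing here talks to Acrobat, and the `min_height` cutoff is an arbitrary assumption.

```python
from typing import Callable, Optional, Tuple

# (top, bottom) as fractions of the page height; 0.0 is the top edge.
Region = Tuple[float, float]


def narrow_failing_region(
    ocr_fails: Callable[[Region], bool],
    region: Region = (0.0, 1.0),
    min_height: float = 0.125,
) -> Optional[Region]:
    """Halve the visible strip until the failing part is small.

    Returns None if the starting region does not fail at all. If neither
    half fails on its own, the search stops and returns the current
    region, since the failure then seems to need content from both halves.
    """
    if not ocr_fails(region):
        return None
    top, bottom = region
    while bottom - top > min_height:
        middle = (top + bottom) / 2
        if ocr_fails((top, middle)):
            bottom = middle  # the problem is in the upper half
        elif ocr_fails((middle, bottom)):
            top = middle  # the problem is in the lower half
        else:
            break  # only the combined region fails
    return (top, bottom)


if __name__ == "__main__":
    # Simulated page with a single defect 30% of the way down.
    print(narrow_failing_region(lambda r: r[0] <= 0.3 < r[1]))
```

In the simulated case the search ends on the strip from 0.25 to 0.375 of the page height, which is the part you would mask and re-insert by hand. The early exit covers the case where each half passes on its own but the full page still fails.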

This all falls into the category of getting the job done and dealing with the "why" later.

But it does sound like you're correct that there is SOMETHING on that page that Acrobat doesn't like.

Let us know how it works out.

Community Beginner, Aug 31, 2018

We covered the top half and then the bottom half and optimized each version, which processed but did not make them searchable. Then we OCR'ed the two halves, and that made them searchable. So why will it not work with the document as a whole? Do you work for Adobe? If so, there should be a way that I can share these documents with you confidentially, correct?

Community Expert, Aug 31, 2018

Number 1: I do not work for Adobe. I just happen to have a reasonable amount of scanning and OCR experience and like to share. [For example: https://forums.adobe.com/community/creativepipeline/blog/2018/01/22/scanning-clean-search-able-pdfs]

Number 2: You can send it to me and I can get that page to an appropriate person. Plus, I can test this myself and see what I find. Through the link above, click on my photo and that could/should get a message to my email, and we can take it from there. I'm not sure I can prove that I'm a safe person to send this document to, other than that I'm a retired scientific glassblower, and while I have been an expert witness for things related to scientific glassblowing, I stay away from the courts. [If this case has anything to do with scientific glassblowing, I'd be very surprised!]

I do wish I could explain why you experienced what you did, but that's sort of like telling your mechanic that there's a noise that goes away whenever you take the car to the shop and leave it with him.

Glad we were able to work around the problem. Solved, no, but at least you are back in the running.
