A "Why Not" question re: PDF file +OCR +Export to MS Word.

Sep 17, 2020

Using current Acrobat 9 Pro (Windows) & current Word for Office 365 (Windows): I have repeatedly been unsuccessful in trying to do this, so if it CAN be done, I hope someone will tell me how. If it CANNOT be done, however, I would greatly appreciate it if someone could explain to me why Adobe does not facilitate what I describe below, because -- with all of the necessary information already present within the PDF file -- it seems silly (if not obstructive) for Adobe NOT to facilitate this. Here's the situation:

I frequently receive PDF files from colleagues, all or part of the text of which must be incorporated into other documents being constructed in MS Word. To try to do this, I open the PDF file in Acrobat 9 Pro, then I "Recognize Text", then I "Correct Recognized Text" (to find & fix any OCR errors), then I "Save" the PDF file, then I Export that PDF file from Acrobat 9 Pro to MS Word. The resulting Word file NEVER includes any of the OCR corrections I made, regardless of whether the "Recognize Text" setting I used was "Searchable Image", "Searchable Image (Exact)", or "Editable Text and Images" (those are the only 3 choices I have). With all 3 settings, the OCR corrections never show up in the text of the generated Word document.

I imagine part of the reason is that the OCR corrections end up being stored in a different place within the PDF file than the originally OCR'd text. BUT what I don't understand is -- if Adobe allows users to correct OCR text errors -- then WHY does Adobe not also allow those corrections to override the OCR errors in an exported file? Or (at least) why not export both the error AND the correction, to make the problematic location easier to find in the exported file? As both the "bad" text AND the "good" text are in the PDF file and accessible to Adobe's programmers, why would they waste the user's time and effort spent correcting the OCR output if those corrections can't really be "used"?

This all just seems so wrong to me. That's why I'd really like to understand it.

Sep 17, 2020

Hi Dpauldalton,

Interesting, you are correct. I'm using Acrobat Pro DC and I'm seeing the same thing. I will study this more to verify.

Meanwhile, the reason I've probably never seen this before is that I do not like Acrobat's interface for correcting OCR mistakes. When I plan on exporting to a Word file, I always opt for doing the correction in Word itself, which has significant benefits over the correction dynamics in Acrobat. For example, if I'm running a spell check and see that every "him" was turned into "hum," I can do a Change All to fix all of them at one time.

Would that work within your workflow?

Meanwhile, what I just stated does NOT mean that this bug shouldn't be fixed. It should.
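
For anyone who ends up with the exported text outside Word, the Change All idea above can be roughly sketched in Python. This is only an illustration, not anything Acrobat or Word does internally: it assumes the exported text is available as a plain string, and the function name change_all is made up for this example. It matches whole words only and is case-sensitive, so "humming" and "Hum" are left alone.

```python
import re


def change_all(text, wrong, right):
    """Replace every whole-word occurrence of `wrong` with `right`.

    Returns a tuple of (corrected_text, number_of_replacements).
    Hypothetical helper for illustration; matching is case-sensitive.
    """
    # \b word boundaries keep "hum" from matching inside "humming".
    pattern = re.compile(r"\b" + re.escape(wrong) + r"\b")
    return pattern.subn(right, text)


if __name__ == "__main__":
    sample = "I told hum that humming annoyed hum."
    fixed, count = change_all(sample, "hum", "him")
    print(fixed)   # I told him that humming annoyed him.
    print(count)   # 2
```

As with Change All in Word, this is a blunt tool: it would also "fix" any place where the word "hum" was actually intended, so it is worth reviewing the result.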
