For years I have seen the same OCR problems with PDFs of U.S. patent documents, particularly those of a certain vintage (say, mid-2000s) or older. Typically these are PDFs downloaded from Google Patents, though others are emailed to me by other people, so it's unclear where those originated. (The US Patent & Trademark Office does not store patents as PDFs; they inexplicably still use TIFF.) The main problem is that "fi" ligatures show up as unrecognized ("?") when you copy text to the clipboard. Other OCR problems include a lower-case w routinely showing up as an upper-case W.
The second problem I would like to report is Acrobat's inability to handle text selection in the two-column layout of patent documents. Various things fool it into selecting text from the other column, including (but not limited to) hyphens.
It would be great if Adobe could finally fix this; it's been a problem for many years.
Thanks,
- bill.
What version of Acrobat are you running (e.g., 11.0.33, 2016.123.92323)? Please give the exact version number, not just "latest".
The version is 2020.006.20042. But this is irrelevant as I have had this problem for years, through many updates.
Can you share a sample file with us? You can attach it to the original message using the tiny paperclip icon at the bottom when you edit it, or upload it to a file-sharing website (like Dropbox, Google Drive, Adobe Cloud, etc.), generate a share link and then post it here.
Re: text selection on the two-column layout
Have you tried holding down the "Alt" key while left-clicking and dragging a box around a column? Note that the cursor must be the text selector (I-beam) when you start drawing the box around the text; i.e., your cursor must already be over selectable text when you first click, or nothing gets selected.
I know that doesn't completely solve the issue (you can only copy from a single column at a time; i.e., you can't select the bottom of one column and then continue to the top of the next), but it is something.
Unfortunately, this does not solve the problem. I've had this problem for YEARS with multicolumn PDFs, most commonly with U.S. patents and patent publications. I'm going to try to attach a screenshot showing how I began the highlighting with the word "The" and ended it with "cartridge", all in column 1, yet the selection captures parts of column 2 in a very strange way because of how Adobe identifies the flow of the text.