Acrobat problems with patent documents

March 23, 2020

For years I have seen the same OCR problems with PDFs of U.S. patent documents, particularly of a certain vintage (say mid-2000s) or older. Typically these are PDFs downloaded from Google Patents, though others come to me by email from other people so it's unclear where they originated. (The US Patent & Trademark Office does not store patents in PDF; they inexplicably still use TIFF.) The main problem is that "fi" ligatures show up as unrecognized ("?") when you copy text to the clipboard. Other OCR problems include lower-case w routinely showing up as upper-case W.
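For text that has already been copied out, a rough cleanup pass can at least find the damage. The sketch below is only an illustration, not anything Acrobat does: it assumes that a lost ligature shows up as a "?" immediately followed by a letter (as in "speci?c"), and the function names are made up for this example. Real Unicode ligature characters such as U+FB01 can be expanded with NFKC normalization, but a ligature that has already become "?" cannot be recovered automatically, so those words are only flagged for manual review.

```python
import re
import unicodedata

# Heuristic (an assumption): a "?" directly followed by a letter is probably
# a lost ligature, e.g. "speci?c" or "?rst". A real question mark is normally
# followed by whitespace or the end of the text.
SUSPECT = re.compile(r"[A-Za-z]*\?[A-Za-z]+")


def normalize_ligatures(text):
    """Expand real ligature characters (e.g. U+FB01) into plain letters.

    NFKC also rewrites other compatibility characters, such as superscript
    digits, so apply it only where that is acceptable.
    """
    return unicodedata.normalize("NFKC", text)


def flag_suspect_words(text):
    """Return words that look like they contain a lost ligature."""
    return SUSPECT.findall(text)


sample = "The ?rst speci?c con\ufb01guration. Why?"
cleaned = normalize_ligatures(sample)
suspects = flag_suspect_words(cleaned)
print(cleaned)
print(suspects)
```

The flagged words still have to be corrected by hand, since "?" could stand for "fi", "fl", or another ligature.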

The other problem I would like to report is Acrobat's inability to support text selection on the two-column layout of patent documents. Various things fool it into selecting text from the other column, including (but not limited to) hyphens.

It would be great if Adobe could finally fix this; it's been a problem for many years.

Thanks,

- bill.

Scan documents and OCR



Aaron34825598ggei

Participant

Re: text selection on the two-column layout

Have you tried holding down the Alt key while left-clicking and dragging a box over a column? Note that the cursor must be the text-selection I-beam when you start drawing the box; that is, it must be over selectable text when you first click, or it doesn't select anything.

I know that doesn't completely solve the issue (you can only copy from a single column at a time; i.e., you can't select the bottom of one column and then continue to the top of the next), but it is something.
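To illustrate why a box selection sidesteps the column mix-up, here is a minimal sketch of the idea. It is not how Acrobat is implemented; the `Word` class, the coordinates, and `select_in_box` are all hypothetical, and it assumes each word comes with a bounding box and that y grows downward. Only words whose centre falls inside the drawn rectangle are kept, so text in the neighbouring column is never considered.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x0: float  # left edge
    y0: float  # top edge (y grows downward in this sketch)
    x1: float  # right edge
    y1: float  # bottom edge


def select_in_box(words, left, top, right, bottom):
    """Return the text of words whose centre lies inside the box,
    ordered top to bottom, then left to right."""
    picked = []
    for w in words:
        cx = (w.x0 + w.x1) / 2
        cy = (w.y0 + w.y1) / 2
        if left <= cx <= right and top <= cy <= bottom:
            picked.append(w)
    # Words on the same line share (roughly) the same top coordinate.
    picked.sort(key=lambda w: (round(w.y0), w.x0))
    return " ".join(w.text for w in picked)


# Hypothetical two-column page with two lines per column.
words = [
    Word("The", 50, 100, 80, 112), Word("cartridge", 85, 100, 150, 112),
    Word("A", 320, 100, 330, 112), Word("second", 335, 100, 380, 112),
    Word("is", 50, 120, 62, 132), Word("removable", 67, 120, 140, 132),
    Word("column", 320, 120, 370, 132), Word("here", 375, 120, 400, 132),
]

left_column = select_in_box(words, 40, 90, 300, 140)
right_column = select_in_box(words, 310, 90, 410, 140)
print(left_column)
print(right_column)
```

A flow-based selection, by contrast, depends on whatever reading order the viewer inferred for the page, which is where a two-column layout can go wrong.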


john_3972

Participant

Unfortunately this does not solve the problem. I've had this problem for YEARS with multicolumn PDFs, most commonly with U.S. patents and patent publications. I'm going to try to attach a screenshot showing how I began the highlighting with the word "The" and ended with "cartridge", all in column 1, but it captures parts of column 2 in a very strange way due to how Adobe identifies the flow of the text.

try67

Community Expert

Can you share a sample file with us? You can attach it to the original message using the tiny paperclip icon at the bottom when you edit it, or upload it to a file-sharing website (like Dropbox, Google Drive, Adobe Cloud, etc.), generate a share link and then post it here.


Test Screen Name

Legend

What version of Acrobat do you run today (e.g. 11.0.33, 2016.123.92323)? Not "latest" please.