Participating Frequently
March 23, 2020
Question

Acrobat problems with patent documents

For years I have seen the same OCR problems with PDFs of U.S. patent documents, particularly those of a certain vintage (say mid-2000s) or older. Typically these are PDFs downloaded from Google Patents, though others reach me by email from other people, so it's unclear where they originated. (The US Patent & Trademark Office does not store patents in PDF; they inexplicably still use TIFF.) The main problem is that "fi" ligatures come out as an unrecognized character ("?") when you copy text to the clipboard. Other OCR problems include lower-case w routinely showing up as upper-case W.

The other problem I would like to report is that Acrobat's text selection cannot cope with the two-column layout of patent documents. Various things fool it into selecting text from the other column, including (but not limited to) hyphens.

It would be great if Adobe could finally fix this; it's been a problem for many years.

Thanks,

- bill.

3 replies

Participant
January 16, 2024

Re: text selection on the two-column layout

Have you tried holding down the Alt key while left-clicking and dragging a box around a column? Note that the cursor must be the text selector (I-beam) when you start drawing the box; that is, it must be over selectable text when you first click, or nothing gets selected.

I know that doesn't completely solve the issue (you can only copy from a single column at a time; i.e., you cannot select the bottom of one column and continue to the top of the next), but it is something.
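
If you would rather script around the selection tool, the same "one column at a time" idea can be sketched in Python. This is only a rough sketch, not anything Acrobat does internally. It assumes you already have word boxes as `(x0, y0, x1, y1, text)` tuples from some extraction tool, with the origin at the top-left of the page. The function name `column_text`, the box format, and the `line_tolerance` value are all made up for illustration, and it assumes the two columns split at the middle of the page.

```python
from typing import List, Tuple

# Hypothetical word box: (x0, y0, x1, y1, text), origin at the top-left of the page.
WordBox = Tuple[float, float, float, float, str]


def column_text(words: List[WordBox], page_width: float, column: int,
                line_tolerance: float = 3.0) -> str:
    """Return the text of the left (column=0) or right (column=1) column.

    Assumes a two-column page split at page_width / 2. A word belongs to
    the column that contains its horizontal center.
    """
    mid = page_width / 2.0
    picked = [w for w in words
              if ((w[0] + w[2]) / 2.0 < mid) == (column == 0)]

    # Rough top-to-bottom order first; words are re-sorted by x within a line.
    picked.sort(key=lambda w: (w[1], w[0]))

    lines: List[List[WordBox]] = []
    current: List[WordBox] = []
    current_y = None
    for w in picked:
        # Start a new line when the top edge moves by more than the tolerance.
        if current_y is None or abs(w[1] - current_y) > line_tolerance:
            if current:
                lines.append(current)
            current = [w]
            current_y = w[1]
        else:
            current.append(w)
    if current:
        lines.append(current)

    return "\n".join(
        " ".join(w[4] for w in sorted(line, key=lambda w: w[0]))
        for line in lines
    )
```

Calling `column_text(words, page_width, 0)` and then `column_text(words, page_width, 1)` and joining the two results gives the reading order that the drag selection cannot. It will still get full-width headings and figures wrong, since those do not fit the two-column assumption.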

Participant
October 1, 2024

Unfortunately this does not solve the problem. I've had this problem for YEARS with multicolumn PDFs, most commonly U.S. patents and patent publications. I'm going to try to attach a screenshot showing how I began the highlighting at the word "The" and ended at "cartridge", all in column 1, yet the selection captures parts of column 2 in a very strange way because of how Adobe identifies the flow of the text.

try67
Community Expert
March 31, 2020

Can you share a sample file with us? You can attach it to the original message using the tiny paperclip icon at the bottom when you edit it, or upload it to a file-sharing website (like Dropbox, Google Drive, Adobe Cloud, etc.), generate a share link and then post it here.

Legend
March 23, 2020

What version of Acrobat do you run today (e.g. 11.0.33, 2016.123.92323)? Not "latest" please.

Participating Frequently
March 31, 2020

The version is 2020.006.20042. But this is irrelevant as I have had this problem for years, through many updates.