Acrobat problems with patent documents
For years I have seen the same OCR problems with PDFs of U.S. patent documents, particularly those of a certain vintage (say, mid-2000s or older). Typically these are PDFs downloaded from Google Patents, though others come to me by email, so it's unclear where they originated. (The U.S. Patent & Trademark Office does not store patents in PDF; they inexplicably still use TIFF.) The main problem is that "fi" ligatures show up as an unrecognized character ("?") when you copy text to the clipboard. Other OCR problems include lower-case w routinely showing up as upper-case W.
The other problem I would like to report is Acrobat's inability to select text correctly in the two-column layout of patent documents. Various things fool it into selecting text from the other column, including (but not limited to) hyphens.
It would be great if Adobe could finally fix these issues; they have been a problem for many years.
Thanks,
- bill.
