Copy-pasting scrambles digits in some PDFs with Hebrew text
- October 30, 2019
In some PDF files, when I copy and paste (or extract text by any other means I know of), the digits in the pasted text are replaced with other digits, with no discernible logic. For example, "1995" becomes "1001" (try the attached file). The number of digits is always preserved, though.
The problem reproduces consistently on different computers and operating systems, with both the newest and older versions of Acrobat Reader. The digit replacement "rules" are consistent within each file, but differ from one affected file to another.
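Since the substitution seems to be a fixed, length-preserving digit-to-digit mapping per file, one rough way to auto-detect it is to read a few numbers off the rendered page, pair each with what actually got pasted, and check the pairs. Below is a minimal Python sketch under that assumption; the function names are hypothetical, it only handles ASCII digits, and it says nothing about *why* the PDF does this. Note that the example above already maps both 1 and 5 to "1", so the substitution is not necessarily reversible.

```python
from typing import Dict, Iterable, List, Tuple

ASCII_DIGITS = set("0123456789")


def infer_digit_map(samples: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build a map visible digit -> pasted digit from (visible, pasted) pairs.

    Raises ValueError if a pair changes length or if one visible digit is
    seen pasting as two different digits (i.e. the rule is not consistent).
    """
    mapping: Dict[str, str] = {}
    for visible, pasted in samples:
        if len(visible) != len(pasted):
            raise ValueError(f"length changed: {visible!r} -> {pasted!r}")
        for v, p in zip(visible, pasted):
            if v not in ASCII_DIGITS or p not in ASCII_DIGITS:
                continue  # only digit positions are compared
            if mapping.setdefault(v, p) != p:
                raise ValueError(f"{v!r} pasted as both {mapping[v]!r} and {p!r}")
    return mapping


def looks_scrambled(mapping: Dict[str, str]) -> bool:
    """True if any observed digit pastes as a different digit."""
    return any(v != p for v, p in mapping.items())


def ambiguous_outputs(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Pasted digits that more than one visible digit turns into."""
    sources: Dict[str, List[str]] = {}
    for v, p in sorted(mapping.items()):
        sources.setdefault(p, []).append(v)
    return {p: vs for p, vs in sources.items() if len(vs) > 1}


def unscramble(pasted: str, mapping: Dict[str, str]) -> str:
    """Undo the mapping where possible; '?' marks unknown or ambiguous digits."""
    collisions = ambiguous_outputs(mapping)
    inverse = {p: v for v, p in mapping.items()}
    out = []
    for ch in pasted:
        if ch not in ASCII_DIGITS:
            out.append(ch)  # non-digits are left untouched
        elif ch in collisions or ch not in inverse:
            out.append("?")  # cannot tell which digit was on the page
        else:
            out.append(inverse[ch])
    return "".join(out)
```

With the pair from above, `infer_digit_map([("1995", "1001")])` gives `{'1': '1', '9': '0', '5': '1'}`, `looks_scrambled` returns `True`, and `unscramble("1001", ...)` can only recover `"?99?"`, because both 1 and 5 paste as 1.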
NOTE 1: The text itself pastes perfectly, so this isn't an issue of a missing font or anything like that; we're talking about normal ASCII digits. This actually makes the problem worse, because the changes are easy to miss and numbers are often the most important part of the data.
NOTE 2: I've only encountered this problem in files with Hebrew text, though that doesn't mean it can't occur with other languages.
Any ideas why this is happening, and how to solve it, or at least auto-detect it?