When extracting text from page 59 (zero-based counting) of the following PDF, the word "2008" is extracted as unknown characters:
"Text": "WESTERN UNION Annual Report "
PDF file
When viewing the PDF in Acrobat Reader, it looks fine.
I tried with PDFBox, and got the following error for these characters:
No Unicode mapping for twoalt (2) in font HGLLLJ+BulmerMT-ItalicAlt
Any help with this would be highly appreciated!
I haven't dug too far into the file, but generally, if the font uses a custom encoding and there is no ToUnicode map in the font resource, we can't extract the text accurately.
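To illustrate why this fails, here is a rough Python sketch of the lookup order a text extractor typically follows: ToUnicode map first, then the glyph name from the font's encoding. It is not PDFBox's or Acrobat's actual code. `GLYPH_LIST` is a tiny stand-in for the Adobe Glyph List, and the `encoding` table is hypothetical apart from code 2 mapping to `twoalt`, which comes from the PDFBox message above. The `guess_alt` option is only a heuristic assumption (strip an "alt" suffix and retry), not standard behaviour.

```python
# Tiny stand-in for the Adobe Glyph List; a real extractor ships the full table.
GLYPH_LIST = {"zero": "0", "two": "2", "eight": "8"}


def resolve(code, to_unicode, encoding, guess_alt=False):
    """Return the Unicode text for a character code, or None if unmapped."""
    # 1. A ToUnicode map, when present, takes priority.
    if to_unicode and code in to_unicode:
        return to_unicode[code]
    # 2. Otherwise fall back to the glyph name from the font's encoding.
    name = encoding.get(code)
    if name is None:
        return None
    if name in GLYPH_LIST:
        return GLYPH_LIST[name]
    # 3. Optional heuristic (an assumption): "twoalt" -> "two".
    if guess_alt and name.endswith("alt") and name[:-3] in GLYPH_LIST:
        return GLYPH_LIST[name[:-3]]
    return None


# Hypothetical custom encoding; only 2 -> "twoalt" is taken from the error message.
encoding = {0: "zeroalt", 2: "twoalt", 8: "eightalt"}
codes = [2, 0, 0, 8]

# No ToUnicode map and non-standard glyph names: nothing resolves.
strict = [resolve(c, None, encoding) for c in codes]

# With the suffix heuristic enabled, the digits can be guessed.
guessed = "".join(resolve(c, None, encoding, guess_alt=True) for c in codes)
```

Without a ToUnicode map, names such as `twoalt` are not standard glyph names, so the strict lookup returns nothing for them. A name-based guess like this can recover digits in simple cases, but it remains a guess about what the font designer meant.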