Hi there,
I've noticed that the PDF Extract API sometimes returns garbled strings. Comparing against the original PDF, this seems to occur only where the text contains an apostrophe.
It looks like the words around the apostrophe are being replaced with non-printable characters.
I can't just filter out these characters because I'll still be missing the words that were replaced.
Is there a way to solve this issue?
Note: My PDF is not an image or scan; it was saved from Microsoft Word. I can copy and paste the proper text directly from the PDF using Preview for Mac.
e.g. from file 1 - "...prior written approval of the Principal\ue202\x9c1\ue014\x8e\x99\x9b\x8e\x9c\x8e\x97\x9d\x8a\x9d\x92\x9f\x8eï (b) Upon request..."
[original text from file 1] - "...prior written approval of the Principal's Representative (b) Upon request..."
e.g. from file 2 - "...in respect of *ROG)LHOGVULJKWV under clause..."
[original text from file 2] - "...in respect of The Company's rights under clause..."
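As a rough illustration of how strings like the file 1 example could at least be flagged for review, here is a minimal sketch in Python. It assumes the extracted text has already been loaded as a Python string; the function names `suspect_chars` and `looks_garbled` are hypothetical and not part of the PDF Extract API. It only detects the problem and does not recover the missing words.

```python
import unicodedata

def suspect_chars(text):
    """Return (index, char) pairs for characters unlikely to appear in real prose."""
    hits = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        # Co = private use (e.g. \ue202), Cc = control (e.g. \x9c), Cn = unassigned.
        # Ordinary whitespace controls are allowed through.
        if cat in ("Co", "Cc", "Cn") and ch not in "\n\r\t":
            hits.append((i, ch))
    return hits

def looks_garbled(text):
    """True if the string contains at least one suspect character."""
    return bool(suspect_chars(text))
```

This check would not catch the file 2 example, where the garbled run consists entirely of printable ASCII characters; flagging that kind of output would need a different heuristic, such as a dictionary lookup.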
Cheers,
Brad