December 28, 2022
While extracting table info from a PDF, some text is converted to gibberish
Hi there,
While experimenting with table info extraction from PDF files, we get gibberish text in the response. For example, in the attached PDF file, the text "Filter bypass installed" is returned as ")LOWHUE\\SDVVLQVWDOOHG".
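Comparing the two strings, each returned character sits exactly 29 code points below the expected one (")" is 0x29, "F" is 0x46; "L" is 0x4C, "i" is 0x69), and the spaces are missing. That pattern is consistent with raw glyph IDs being emitted instead of Unicode text when a font has no usable character-to-Unicode mapping, though that is only a guess for this particular file. The sketch below undoes the offset to confirm the pattern; `undo_offset` is a hypothetical helper, and it assumes the offset is the same for every glyph in the affected font.

```python
# Hypothetical helper: undoes the constant offset seen in the example above.
# Assumption: every glyph in the affected font is shifted by the same 29 code points.
OFFSET = 29


def undo_offset(garbled: str, offset: int = OFFSET) -> str:
    """Shift every character of `garbled` up by `offset` code points."""
    return "".join(chr(ord(ch) + offset) for ch in garbled)


# In the JSON file the backslash is escaped as "\\"; once parsed it is a single "\".
garbled = ")LOWHUE\\SDVVLQVWDOOHG"  # Python literal, also a single backslash
print(undo_offset(garbled))  # Filterbypassinstalled
```

The spaces do not come back because they are absent from the returned text, so this only confirms the pattern. It is not a substitute for correct extraction.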
In the structured data returned by the Adobe API, we found two differences between the font of the correctly extracted text and the font of the gibberish text (correct vs. gibberish):
1. encoding: "Custom" vs encoding: "Identity-H"
2. font_type: "Type1" vs font_type: "CIDFontType0"
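To list every element affected by this, a short script can filter structuredData.json on the two font properties above. This sketch assumes the file has a top-level `elements` list whose text entries carry a `Text` string and a `Font` object with `encoding` and `font_type` keys. Check that against your own file. `suspect_elements` and `suspect_elements_in_file` are hypothetical helper names.

```python
import json
from collections.abc import Iterator
from typing import Any

# Font properties observed on the gibberish text in this post.
SUSPECT_FONT = ("Identity-H", "CIDFontType0")


def suspect_elements(data: dict[str, Any]) -> Iterator[tuple[str, str, str]]:
    """Yield (text, encoding, font_type) for text elements whose font matches SUSPECT_FONT.

    Assumes a top-level "elements" list in which text elements carry "Text"
    plus a "Font" object with "encoding" and "font_type" keys.
    """
    for element in data.get("elements", []):
        text = element.get("Text")
        if text is None:  # skip elements without text, e.g. figures or table containers
            continue
        font = element.get("Font") or {}
        encoding = font.get("encoding", "")
        font_type = font.get("font_type", "")
        if (encoding, font_type) == SUSPECT_FONT:
            yield text, encoding, font_type


def suspect_elements_in_file(path: str) -> list[tuple[str, str, str]]:
    """Load a structuredData.json file from disk and collect the suspect elements."""
    with open(path, encoding="utf-8") as f:
        return list(suspect_elements(json.load(f)))
```

For example, `for text, enc, ftype in suspect_elements_in_file("structuredData.json"): print(enc, ftype, repr(text))` prints each affected text run with its font properties, which makes it easier to see whether all gibberish comes from the same font.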
We're not sure how to solve this issue. Any suggestions?
I've attached the PDF and the structuredData.json file.
