Get non-Unicode characters in a PDF

Hi Team,

Is it possible to get all the non-Unicode characters from PDF text? If so, which method should I use?

Thanks,

Maruthu

Community Expert

Export the PDF as Plain Text. I believe this will convert all characters to UTF-8.

Detecting non-unicode characters is a different thing. You can't do that with JavaScript because JS converts everything into Unicode.
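
If by "non-Unicode" you mean characters that the export could not map to real Unicode text, one rough proxy is to scan the exported text file outside Acrobat for the replacement character (U+FFFD) and for private-use or unassigned code points. Below is a minimal Python sketch under that assumption. The names `find_suspect_chars` and `scan_file` are hypothetical, and the script only sees what the export produced, not the original font encoding.

```python
import unicodedata


def find_suspect_chars(text):
    """Return (offset, char, reason) tuples for characters that hint at a
    missing Unicode mapping. This is a heuristic, not a PDF-level check."""
    suspects = []
    for offset, ch in enumerate(text):
        category = unicodedata.category(ch)
        if ch == "\ufffd":
            # U+FFFD is what decoders emit when a character could not be mapped.
            suspects.append((offset, ch, "replacement character"))
        elif category == "Co":
            # Private-use code points have no standard meaning.
            suspects.append((offset, ch, "private use"))
        elif category == "Cn":
            # Code points not assigned in this Python's Unicode database.
            suspects.append((offset, ch, "unassigned"))
    return suspects


def scan_file(path):
    """Read an exported plain-text file as UTF-8 and scan it.
    Invalid byte sequences are turned into U+FFFD so they get reported too."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_suspect_chars(handle.read())
```

To use it, call `scan_file` with the path of the text file you exported and print each tuple it returns. An empty list only means the exported text looks clean; it does not prove that every font in the PDF has a proper Unicode mapping.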

Have you looked at the Preflight "Browse Internal Structure" tool? This shows you details of the fonts.

Here is a better tool that allows you to select text and then it shows the properties.

You can do this programmatically with the C++ Plug-in SDK. Is this an option?

Thom Parker - Software Developer at PDFScripting

Use the Acrobat JavaScript Reference early and often

Legend

Please define “non-Unicode character” precisely, as it is not a PDF concept.
