December 7, 2017
Question

Get non-Unicode characters in a PDF

  • 2 replies
  • 1268 views

Hi Team,

Is it possible to get all the non-Unicode characters from the text of a PDF? If so, which method should I use?

Thanks,

Maruthu


2 replies

Thom Parker
Community Expert
December 7, 2017

Export the PDF as Plain Text. I believe this will convert all characters to UTF-8.

Detecting non-Unicode characters is a different thing. You can't do that with JavaScript because JS converts everything into Unicode.
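
If a rough heuristic is enough, one option is to scan the exported plain text for characters that often indicate a glyph without a usable Unicode mapping. The sketch below is in Python and rests on an assumption that is not from this thread: that "non-Unicode" means the replacement character (U+FFFD), Private Use Area code points, or unassigned code points and noncharacters. The names `find_suspect_characters` and `scan_file` are hypothetical, and the file is assumed to be the UTF-8 plain-text export.

```python
import unicodedata


def find_suspect_characters(text):
    """Return (index, char, reason) tuples for characters that may lack a real Unicode mapping.

    Assumption: "suspect" means U+FFFD, Private Use Area (category Co),
    or unassigned/noncharacter code points (category Cn).
    """
    suspects = []
    for index, char in enumerate(text):
        category = unicodedata.category(char)
        if char == "\ufffd":
            reason = "replacement character"
        elif category == "Co":
            reason = "private use"
        elif category == "Cn":
            reason = "unassigned or noncharacter"
        else:
            continue
        suspects.append((index, char, reason))
    return suspects


def scan_file(path):
    """Scan a UTF-8 text file; undecodable bytes are turned into U+FFFD and flagged."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_suspect_characters(handle.read())
```

For example, `scan_file("exported.txt")` (a hypothetical file name) returns a list of `(index, character, reason)` tuples. This only flags suspicious output in the exported text; it can't tell you how the text was actually encoded by the fonts inside the PDF.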

Have you looked at the Preflight "Browse Internal Structure" tool? This shows you details of the fonts.

Here is a better tool that lets you select text and then shows its properties:

Windjack Solutions, Inc. - PDF CanOpener

You can do this programmatically with the C++ Plug-in SDK. Is this an option?

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
Legend
December 7, 2017

Please define “non-Unicode character” precisely, as it is not a PDF concept.