Problems exporting pdf to word with Indigenous language characters

Report · Jan 30, 2021

Hello, I am working on our Indigenous language and trying to convert my PDF to Word. The linguistic symbols are not coming across. Is there a specific language I should be selecting on export? If I can get the document into Word, I can quickly edit it, save it as a CSV, and import it into SQL Server.
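For the CSV step mentioned here, a minimal Python sketch of writing rows as UTF-8 so the linguistic symbols survive the trip. The function names and the sample strings are hypothetical (the second row is a placeholder string, not a real word), and it assumes the destination column in SQL Server is a Unicode type such as NVARCHAR.

```python
import csv
import io


def rows_to_csv_bytes(rows):
    """Serialize rows to CSV bytes encoded as UTF-8 with a byte order mark."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # "utf-8-sig" prepends a BOM, which helps some tools detect the encoding.
    return buffer.getvalue().encode("utf-8-sig")


def csv_bytes_to_rows(data):
    """Decode CSV bytes written by rows_to_csv_bytes back into lists of strings."""
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline="")))


# Placeholder data for illustration only.
sample_rows = [["word", "gloss"], ["ʔəɬ", "placeholder"]]
encoded = rows_to_csv_bytes(sample_rows)
decoded = csv_bytes_to_rows(encoded)
```

If the decoded rows match what went in, the symbols were not lost at the CSV stage, which narrows any remaining problem to the PDF export or the database import.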

Report · Jan 30, 2021

Unfortunately, the encoding of the fonts in this file is bad. You'll notice you can't even correctly copy and paste text in English from it to another document. This means it can't be exported to another format, like Word.
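As a rough way to check for this kind of problem, you could paste text copied from the PDF into a small script and look for characters that rarely belong in running text. This is only a sketch under my own assumptions: the function name is made up, and the choice of "suspicious" categories (private use, unassigned, control characters, and the U+FFFD replacement character) is a heuristic, not something Acrobat reports.

```python
import unicodedata

# Categories that rarely belong in ordinary running text:
# Co = private use, Cn = unassigned, Cc = control characters.
SUSPECT_CATEGORIES = {"Co", "Cn", "Cc"}
ALLOWED_CONTROLS = {"\n", "\r", "\t"}


def suspicious_chars(text):
    """Return (index, code point) pairs for characters that hint at a bad text layer."""
    hits = []
    for i, ch in enumerate(text):
        if ch in ALLOWED_CONTROLS:
            continue
        if ch == "\ufffd" or unicodedata.category(ch) in SUSPECT_CATEGORIES:
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits
```

An empty result does not prove the text layer is good, since a badly encoded font can also map glyphs to the wrong ordinary letters, but a long list of hits is a strong sign that export will fail.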

Report · Jan 30, 2021

Ahh, you hit on something that prompted me to try a different method of preparation. The English is coming through now. I will attach the new files to the original post.

Report · Jan 30, 2021

You're creating the file? Is it being scanned and then OCRed? If so, make sure the fonts used support Unicode for best results.
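To see which characters a font would need to cover, one option is to list every non-ASCII character in a correctly typed sample of the text along with its Unicode name. The function name below is hypothetical and the sample is up to you. This is a sketch, not a font check in itself.

```python
import unicodedata
from collections import Counter


def non_ascii_inventory(text):
    """List each non-ASCII character as (code point, Unicode name, count)."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"), n)
        for ch, n in sorted(counts.items())
    ]
```

You can then compare the resulting list against the character coverage of the font you plan to use.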

Report · Jan 30, 2021

I have a scanned picture of the page, and I create a PDF from it. I then OCR the PDF using English. Which language should I be selecting during the OCR process?

Report · Jan 31, 2021

To get effective OCR, you need to choose the actual language of the text. Accurate OCR is a complex process that relies on, among other things, the language's structure, punctuation and accent set, spell checking, and other language-specific techniques. Many languages, including major world languages like Arabic, have no support in Acrobat, so if you are working with a less well-known language, the chances of turning a scan into accurate text are small.

Adobe Community
