Hello, I am working on our Indigenous language and trying to convert my PDF to Word. The linguistic symbols are not coming across. Is there a specific language I should be selecting on the export? If I can get the document into Word, I can quickly edit it, save it as a CSV, and import it into SQL Server.
Unfortunately, the encoding of the fonts in this file is bad. You'll notice you can't even correctly copy and paste text in English from it to another document. This means it can't be exported to another format, like Word.
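A rough way to check for this kind of damage, assuming you can copy some text out of the PDF and paste it into a Python string: the sketch below flags characters that often show up when a font's encoding does not map to real Unicode (private use code points, the U+FFFD replacement character, unassigned code points, stray control characters). The function name and the sample strings are made up for illustration, and a clean result does not prove the encoding is good, since a bad font can also map glyphs to the wrong ordinary letters.

```python
import unicodedata


def suspicious_characters(text):
    """Return (character, reason) pairs that hint at a broken font encoding."""
    findings = []
    for ch in text:
        category = unicodedata.category(ch)
        if ch == "\ufffd":
            # U+FFFD is what many tools emit when a glyph cannot be decoded.
            findings.append((ch, "replacement character"))
        elif category == "Co":
            # Private use code points have no standard meaning outside the font.
            findings.append((ch, "private use area"))
        elif category == "Cn":
            findings.append((ch, "unassigned code point"))
        elif category == "Cc" and ch not in "\n\r\t":
            findings.append((ch, "control character"))
    return findings


# Hypothetical samples: the first stands in for correctly encoded linguistic
# symbols, the second for text pasted from a badly encoded PDF.
sample_good = "x\u030c\u02b7 \u0259 \u026c"
sample_bad = "\uf061\uf062 \ufffd"

for label, sample in (("good", sample_good), ("bad", sample_bad)):
    hits = suspicious_characters(sample)
    print(label, [(hex(ord(ch)), reason) for ch, reason in hits])
```

If the pasted text produces many hits, the problem is in the PDF's fonts rather than in the export settings.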
Ahh, you hit on something that prompted me to try a different method of preparation. The English is coming through now. I will attach the new files to the original post.
You're creating the file? Is it being scanned and then OCRed? If so, make sure the fonts used support Unicode for best results.
I have a scanned picture of the page, and I create a PDF from it. I then OCR the PDF using English. Which language should I be selecting during the OCR process?
To get effective OCR, you need to choose the actual language of the text. This is because accurate OCR is a complex process that relies on, among other things, the language's structure, its punctuation and accent set, spell checking, and other language-specific techniques. There are many languages, including major world languages like Arabic, that have no support in Acrobat, so if you are working with a lesser-known language, the chances of getting a scan into accurate text are small.