dnguyen71742856
Participant
January 22, 2018
Question

How to export foreign-language PDFs to HTML


Hi, I have a large project converting PDFs in foreign languages to HTML.

I am trying to convert a Hindi PDF document to HTML, but it isn't working: the export produces garbled characters.

Here's an example of one of the documents:

https://www.bart.gov/sites/default/files/docs/Title_VI_Poster_Legal_Size_12-29-09_HINDI.pdf



Legend
January 23, 2018

Then you will have to retype it. The PDF contains no usable text.

Legend
January 23, 2018

Before concluding that the HTML export is at fault, there's an important check: try to select some text, copy it, and paste it (into Word, for example). PDF files are often badly made and contain text that cannot be extracted; such files cannot be exported correctly either.
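The copy-and-paste test can also be applied to the pasted text with a short script. The following is a minimal Python sketch that assumes the pasted text is already available as a string. It measures what fraction of the letters fall in the Unicode Devanagari block (U+0900 to U+097F), which is the block Hindi uses. The function names and the 0.5 threshold are hypothetical choices for illustration. When a PDF uses a legacy, non-Unicode Hindi font, pasted text typically comes out as Latin letters and symbols, so the ratio stays near zero and the text is not usable as Hindi.

```python
import unicodedata

# Unicode Devanagari block, used for Hindi: U+0900 to U+097F
DEVANAGARI_START = 0x0900
DEVANAGARI_END = 0x097F


def devanagari_ratio(text: str) -> float:
    """Return the fraction of letters/marks in text that are Devanagari."""
    # Only count letters (L*) and combining marks (M*); ignore spaces,
    # digits and punctuation, which are shared across scripts.
    relevant = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not relevant:
        return 0.0
    hits = sum(1 for ch in relevant if DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END)
    return hits / len(relevant)


def looks_like_hindi(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical heuristic: treat text as real Hindi if most letters are Devanagari."""
    return devanagari_ratio(text) >= threshold


if __name__ == "__main__":
    print(looks_like_hindi("नमस्ते"))  # genuine Unicode Hindi
    print(looks_like_hindi("ueLrs"))   # Latin letters, as legacy-font text often appears
```

A result of `False` on text pasted from the PDF suggests the problem lies in the PDF's text layer, not in the HTML export settings.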

dnguyen71742856
Participant
January 23, 2018

I've already tried copying the Hindi characters from the document linked above and pasting them into Word, as well as converting the PDF to a Word document. Neither works.

Inspiring
January 23, 2018

You will also need the exact character set for the foreign language installed on your computer.
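On the HTML side, the page itself must also declare and use a Unicode encoding so that browsers pick a font that can render Devanagari. The following is a minimal Python sketch that assumes you already have genuine Unicode Hindi text, which it cannot recover from a PDF whose text layer is unusable. It writes that text into a bare HTML page saved as UTF-8 with a matching `<meta charset="utf-8">` and `lang="hi"`. The function name `write_hindi_html` and the output file name are hypothetical. A font with Devanagari support still has to be available on the viewing machine.

```python
import html
from pathlib import Path


def write_hindi_html(text: str, path: str) -> None:
    """Hypothetical helper: write text into a minimal UTF-8 HTML page marked as Hindi."""
    # Escape &, <, > so the text is treated as content, not markup.
    body = html.escape(text)
    page = (
        "<!DOCTYPE html>\n"
        '<html lang="hi">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        "  <title>Hindi text</title>\n"
        "</head>\n"
        "<body>\n"
        f"  <p>{body}</p>\n"
        "</body>\n"
        "</html>\n"
    )
    # Encode the file as UTF-8 so the bytes match the meta charset declaration.
    Path(path).write_text(page, encoding="utf-8")


if __name__ == "__main__":
    write_hindi_html("नमस्ते", "hindi_example.html")
```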

dnguyen71742856
Participant
January 23, 2018

They no longer have the source file, so all I have is the PDF. How can I get the full character set for the language? Could you please send instructions on the best approach?

Thanks!

Legend
January 23, 2018

Are you using the latest Acrobat?

dnguyen71742856
Participant
January 23, 2018

I am using Adobe Acrobat Pro DC 2018.009.20044