I have a problem. I have a PDF file that I need to convert to HTML. I am from Serbia, and my PDF contains special characters like ć, č, đ, š, ž. When I convert it to an HTML or Word file, I get characters like {, ^, ~, }, ` instead. Does anyone know a fix for this? Thanks
Hi there
I hope you are doing well, and I'm sorry to keep you waiting.
It seems like the issue you're experiencing is related to encoding. When converting a PDF with special characters (such as ć, č, đ, š, ž), the software might not properly handle the character encoding, resulting in incorrect symbols.
Here are some steps you can try:
- Open the PDF in Acrobat.
- Go to File > Export To > HTML Web Page or Microsoft Word. In the export settings, make sure the language is set to Serbian or a compatible language that includes special characters.
- Export the file and check the output.
You can also check font embedding in the PDF:
- Open the PDF in Acrobat.
- Go to File > Properties > Fonts and check if the fonts used in the document are embedded. If they’re not, the characters might not map correctly during conversion. Try embedding the fonts and exporting again.
Also, check your system locale and language:
- Ensure your computer’s system locale is set to support Serbian characters:
  - Windows: Go to Control Panel > Region > Administrative > Change system locale, and select Serbian.
  - Mac: Check System Preferences > Language & Region.
If the PDF contains scanned images or text that isn't selectable, use an OCR tool that supports Serbian characters. Adobe Acrobat's OCR feature supports multiple languages and can help recognize special characters during conversion.
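One more possibility, offered as a guess rather than a diagnosis: the substitutes you listed ({, ^, ~, }, `) match the code positions that the old Yugoslav 7-bit encoding (YUSCII, JUS I.B1.002) used for š, Č, č, ć, and ž. If the PDF was created with a legacy font of that kind, the file really does store those ASCII symbols, and no export setting will change them. In that case you could map them back after conversion. Below is a minimal Python sketch under that assumption; the function name `yuscii_to_unicode` is my own, not part of Acrobat or any library.

```python
# Assumption: the PDF text was typed with a legacy YUSCII (JUS I.B1.002)
# Latin font, so the file stores ASCII symbols in place of Serbian letters.
# This table maps those symbols back to the intended Unicode letters.
YUSCII_TO_UNICODE = str.maketrans({
    "@": "Ž", "[": "Š", "\\": "Đ", "]": "Ć", "^": "Č",
    "`": "ž", "{": "š", "|": "đ", "}": "ć", "~": "č",
})


def yuscii_to_unicode(text: str) -> str:
    """Replace YUSCII stand-in symbols with Serbian Latin letters."""
    return text.translate(YUSCII_TO_UNICODE)


if __name__ == "__main__":
    # prints: čaša, kuća, život
    print(yuscii_to_unicode("~a{a, ku}a, `ivot"))
```

Run this on plain text only (for example, text copied out of the PDF or exported as a .txt file). Raw HTML uses braces and similar symbols in its own markup and styles, and a genuine @ in an email address would also be changed, so the result needs a quick review.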
Let us know how it goes.