Participant
February 21, 2019
Answered

Is it possible to extract non-English (Tamil) text from a PDF?

  • February 21, 2019
  • 3 replies
  • 2991 views

Hi,

Is there a library I can use to extract non-English (Tamil) text from a PDF file?

Any direction is greatly appreciated.

Thanks,

Divakar. V

This topic has been closed for replies.
Correct answer by try67

If the text is part of an image, you'll need OCR software that supports Tamil. No Adobe software does, though.

If the text is not part of an image, you can simply select all of it, copy it, and paste it into another application.

It should come out correctly, provided the font encoding used in the PDF is correct.
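For anyone who wants to script the copy-and-paste route, here is a minimal Python sketch. It is only an illustration, not something the answer above prescribes: it assumes the third-party pypdf package for reading the text layer, and the helper names (`tamil_ratio`, `extract_text_layer`) are made up for this example. The check relies on the fact that Unicode Tamil lives in the block U+0B80 to U+0BFF, so if the extracted text contains almost no characters from that block, the PDF probably uses a non-Unicode font encoding (or has no text layer at all) and the output will not be usable as-is.

```python
import unicodedata

# Unicode block reserved for the Tamil script.
TAMIL_START, TAMIL_END = 0x0B80, 0x0BFF


def tamil_ratio(text: str) -> float:
    """Return the share of letters and combining marks in `text` that are Tamil."""
    # Only letters (L*) and marks (M*) are counted, so digits, spaces and
    # punctuation do not dilute the ratio.
    relevant = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not relevant:
        return 0.0
    tamil = sum(1 for ch in relevant if TAMIL_START <= ord(ch) <= TAMIL_END)
    return tamil / len(relevant)


def extract_text_layer(pdf_path: str) -> str:
    """Read the text layer of every page using pypdf (assumed installed)."""
    # Imported lazily so tamil_ratio() can be used without pypdf present.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() may yield an empty result for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

A possible use would be `text = extract_text_layer("input.pdf")` followed by `tamil_ratio(text)`, where `input.pdf` is a placeholder file name. A ratio near zero suggests falling back to OCR; the cut-off to use is a judgment call and depends on how much English the document mixes in.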

3 replies

Participant
January 30, 2023

I found another, easier way. Use Google Lens: it automatically converts the Tamil text in an image into Tamil text that you can select and copy.
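If there are many images and doing them one by one is impractical, the same image-to-text step can be scripted. The sketch below swaps in a different tool from the one mentioned above: the open-source Tesseract OCR engine, called through the third-party pytesseract and Pillow packages, with Tesseract's Tamil language data (`tam`) assumed to be installed. The function names and the list of image suffixes are assumptions made for this example. Pages of a scanned PDF would first have to be rendered to image files, which is not shown here.

```python
from pathlib import Path
from typing import Callable, Dict

# File types treated as images (an assumption for this sketch).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def ocr_image(image_path: str, lang: str = "tam") -> str:
    """Run Tesseract on one image; 'tam' is Tesseract's code for Tamil."""
    # Third-party imports stay local so the module loads without them.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def ocr_folder(
    folder: str,
    lang: str = "tam",
    ocr: Callable[[str, str], str] = ocr_image,
) -> Dict[str, str]:
    """OCR every image in `folder`, returning {file name: recognized text}."""
    results: Dict[str, str] = {}
    # Sorted so the output order is stable across runs.
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = ocr(str(path), lang)
    return results
```

The `ocr` parameter exists only so a different engine (or a stub for testing) can be plugged in without changing the loop. Recognition quality depends heavily on scan resolution, so results should be spot-checked.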

Participant
January 30, 2023


I found this useful software that extracts Tamil text from images.

[Link removed by moderator]

try67
Community Expert
February 21, 2019

Marked as the correct answer (text shown above).