How to OCR Tibetan in Adobe Acrobat Professional
I searched online and found an answer on the website of The Tibetan and Himalayan Library (THL):
This is the process for running OCR on a PDF so that it is searchable, using Acrobat Professional:
- For most PDFs, you want to run Optimize after you scan them. First rename the file; then pull down the Document menu and select Optimize.
- Then, to run OCR: open the PDF file you want to run OCR on.
- Pull down the File menu, choose "Save as," and add "-ocr.pdf" to the file name.
- Pull down the Document menu, point to "OCR Text Recognition," then choose "Recognize Text Using OCR…" and click "Start."
- The OCR process will start. It will take some time, depending on the number of pages in the PDF.
- When it finishes, save the file. Be sure to check by doing a search on "the" or another word in the file and make sure it returns results.
To OCR roman text with diacritic characters, investigate using Abbyy's FineReader (http://www.abbyy.com/). No THL staff have used it, and we have no experience with it. For more information, see Zach Rowinski's assessment.
Read more: http://www.thlib.org/tools/wiki/How%20to%20OCR%20a%20PDF.html
However, I couldn't even find the Document menu in my Acrobat for Mac. Which version of Acrobat has a Document menu and can recognize Tibetan, as described in the link above?
Qiang Lu
