Extracting empty text or funny characters from Scanned PDF using Apache Tika Tesseract OCR in Ubuntu 16.04
New Here
/t5/acrobat-discussions/extracting-empty-text-or-funny-characters-from-scanned-pdf-using-apache-tika-tesseract-ocr-in-ubuntu/td-p/10142856
Oct 16, 2018
Hi,
When I run my Apache Tika Tesseract OCR program on Windows, I can extract the text from multiple scanned PDFs in a given directory. But when I run the same program on Ubuntu 16.04, I get funny characters for a couple of documents, and sometimes the extracted text is empty. Can you please let me know what the reason could be and what I should use to extract the text properly?
Thanks and Regards,
Karim
TOPICS
Edit and convert PDFs
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting.
Learn more
Community Expert
/t5/acrobat-discussions/extracting-empty-text-or-funny-characters-from-scanned-pdf-using-apache-tika-tesseract-ocr-in-ubuntu/m-p/10142857#M120981
Oct 18, 2018
Why did you post this question in the forum for Adobe Acrobat?

