
Extracting empty text or funny characters from Scanned PDF using Apache Tika Tesseract OCR in Ubuntu 16.04

New Here,
Oct 16, 2018

Hi,

When I run my Apache Tika Tesseract OCR program on Windows, I can extract the text from multiple scanned PDFs in a given directory. But when I run the same program on Ubuntu 16.04, I get funny characters for a couple of documents, and sometimes the extracted text is empty. Can you please let me know what the reason could be and what I should use to extract the text properly?

Thanks and Regards,

Karim

Community Expert,
Oct 18, 2018

Why are you posting this question in the Adobe Acrobat forum?
