simsiris
Participant
May 10, 2021
Answered

Converting a PDF image file to .txt doesn't output any text


Hi Community, 

It seems that exporting a PDF to TXT doesn't trigger OCR. When I convert the attached document to .txt, no text is recognized.

However, converting to DOCX or HTML does output the recognized text. The script to reproduce the issue is attached.


Is this expected? Any suggestion on how to trigger OCR when converting to .txt would be highly appreciated.


Thanks & regards,

Simon

This topic has been closed for replies.

2 replies

Bernd Alheit
Community Expert
Correct answer
May 11, 2021

When you export as text, Acrobat doesn't perform OCR on the document.

Exporting as Word or HTML has this option.
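Because the HTML export does run OCR, one possible workaround (an untested assumption, not confirmed in this thread) is to export to HTML first and strip the markup afterwards. Below is a minimal sketch using only Python's standard `html.parser` module; `TextExtractor` and `html_to_text` are hypothetical names, and the extraction is deliberately naive: one line per text node, with `<script>`/`<style>` content skipped.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text nodes, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.parts: list[str] = []
        self._skip_depth = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the text nodes of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "\n".join(parser.parts)


if __name__ == "__main__":
    sample = "<html><body><p>Invoice 42</p><p>Total &amp; tax</p></body></html>"
    print(html_to_text(sample))
```

To apply it to a real export, read the HTML file produced by Acrobat (assumed here to be UTF-8), pass its contents to `html_to_text`, and write the result to a .txt file. Inline tags such as `<span>` split text into separate lines, so the output suits searching or indexing rather than faithful layout.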

Bernd Alheit
Community Expert
May 10, 2021

Why do you use "com.adobe.acrobat.accesstext"?

simsiris (Author)
Participant
May 10, 2021

Hmm, I was just experimenting with it. I guess "com.adobe.acrobat.plain-text" is the most appropriate one. I'm actually not sure what "accesstext" means...

However, "com.adobe.acrobat.plain-text" still doesn't trigger OCR and outputs a 6-byte .txt file containing the UTF-16 BOM plus two space characters.


Any further ideas?
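To detect this "empty" result automatically, for example to decide whether to fall back to a different export format, a small check can decode the file according to its byte-order mark and test whether anything other than whitespace remains. This is a minimal Python sketch. The function names are hypothetical, and the demo assumes a little-endian UTF-16 BOM, since the post does not say which byte order was written. The function itself accepts either byte order.

```python
import codecs
from pathlib import Path


def exported_text(data: bytes) -> str:
    """Decode an exported .txt payload, honouring a leading byte-order mark."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return data.decode("utf-16")
    if data.startswith(codecs.BOM_UTF8):
        return data.decode("utf-8-sig")
    # No BOM: assume UTF-8 and replace undecodable bytes instead of failing.
    return data.decode("utf-8", errors="replace")


def has_recognized_text(data: bytes) -> bool:
    """True if the export contains anything besides whitespace."""
    return exported_text(data).strip() != ""


def check_export(path: str) -> bool:
    """Convenience wrapper for a .txt file on disk (path is up to the caller)."""
    return has_recognized_text(Path(path).read_bytes())


if __name__ == "__main__":
    # Rebuild the 6-byte file from the post: UTF-16LE BOM followed by two spaces.
    empty_export = codecs.BOM_UTF16_LE + "  ".encode("utf-16-le")
    print(len(empty_export), has_recognized_text(empty_export))
```

Running the demo prints `6 False`. That matches the file size reported above: 2 bytes of BOM plus two UTF-16 spaces of 2 bytes each.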