"Save as txt" is not working in searchable PDF documents

New Here,
Feb 14, 2021

Hello,

I wonder if someone could shed some light on a very odd issue with a couple of PDF documents I have.

These PDFs have been OCRed. However, when I choose ‘Save as Text’, I get an empty txt file (0 KB). The ‘Find’ feature works in these documents, which confirms that the OCR text is present.

Adobe Employee,
Feb 16, 2021

Hi Amro Bilal,

Thank you for reporting the issue. Could you please provide the following details:

  1. Acrobat version - Are you on Reader or DC? Please share a screenshot of the Acrobat version screen
    (go to Help -> About Acrobat ... and take a screenshot)
  2. Operating system (Mac/Win)
  3. Input PDF files
  4. Steps to reproduce the issue

Regards,

Akanksha Garg

Software Engineer

Adobe Acrobat Team

 

 

New Here,
Feb 17, 2021

Hi Akanksha,

Thank you for getting back to me.

Please find at the end of this message a system report generated from
within Adobe as requested.

The issue:

I used a professional scanning service to scan some printed books, OCR
them, and convert them to accessible PDF.

In addition, at my request, the scanning shop removed the page numbers
and headings from the final PDF documents.

When I open any of these final PDF documents in Adobe Acrobat Reader and
choose ‘Save as Text’, I get an empty txt file.

I also use a screen reader (assistive technology) to access my computer.
My screen reader, JAWS, cannot read these PDF documents when they are
opened in Acrobat Reader. Oddly, however, if I open the same PDFs in a
Chromium-based browser, my screen reader can read them. This suggests
that the text is there somewhere, but Acrobat Reader is not able to
extract it, which would explain why ‘Save as Text’ produces an empty
txt file.
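
For anyone wanting to check independently of Acrobat whether a file like
this carries a text layer and accessibility tags, a rough byte-level scan
is one option. The following is only a sketch using the Python standard
library: the function names are made up for illustration, it only looks
inside Flate-compressed streams, and it is a heuristic rather than a real
PDF parser, so it cannot explain why ‘Save as Text’ comes out empty.

```python
import re
import zlib

# Matches the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Text-showing operators: "(string) Tj", "<hex> Tj" or "[array] TJ".
TEXT_OP_RE = re.compile(rb"[)>]\s*Tj|\]\s*TJ")


def inflated_streams(data: bytes):
    """Yield the contents of every stream that inflates as Flate (zlib) data."""
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG page image)


def inspect_pdf(data: bytes) -> dict:
    """Heuristic summary of a PDF's raw bytes (hypothetical helper)."""
    streams = list(inflated_streams(data))
    return {
        # How many text-showing operators appear in the inflated streams.
        "text_operators": sum(len(TEXT_OP_RE.findall(s)) for s in streams),
        # Whether a structure tree (used by tagged PDFs) is referenced,
        # either in the plain bytes or inside an inflated object stream.
        "tagged": any(b"/StructTreeRoot" in chunk for chunk in [data, *streams]),
    }
```

Calling `inspect_pdf(open("book.pdf", "rb").read())` (the file name is a
placeholder) returns a small dictionary. A non-zero `text_operators` count
would be consistent with ‘Find’ working, while `tagged` being false would
only hint that the file lacks a structure tree; neither result says
anything definite about Acrobat's behaviour.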

I contacted the scanning shop to ask which method they used to remove
the page numbers and headings from the final PDF documents, and this is
what they said:

‘We don't use adobe at all for word conversions. And we don't use the DC
version of adobe for anything, as it's not our favourite version of adobe.

Wasn't the issue only affecting the files that had been unpacked to
jpegs and then repacked after the headings had been removed? The OCR and
the word conversions are done in the same way after repacking. This
suggests to me that there's something in the text reader that doesn't
like PDFs created from image files- as opposed to ones scanned directly
to PDF or fresh PDFs created anew.’

I would be happy to send you one of these PDF documents for investigation.

Please get back to me if you need further clarification.

Finally, here is the system report as requested:

‘Account Detail:

User Rights: Admin

User Account Control: Limited

Process Integrity: Low

Profile Type: None

Acrobat Detail:

Sandboxing: On

Protected View: Off

Captive Reader: No

Multi-Reader on Desktop Support: Off

Available Physical Memory: 4194303 KB

Available Virtual Memory: 3878884 KB

BIOS Version: LENOVO - 1240

Default Browser:

mapi32.dll

Version: 1.0.2536.0 (WinBuild.160101.0800)

Creation Date: 2020/09/09

Creation Time: 10:21:27 AM

Display Detail:

Screen Width: 1920

Screen Height: 1080

Number of Monitors: 1

Number of Mouse Buttons: 5

Has Mouse Wheel: Yes

Has Pen Windows: No

Double Byte Character Set: No

Has Input Method Editor: Yes

Inside Screen Reader: Yes

Graphics Card:

Version: 0.0.0.0

Check: Not Supported

Installed Acrobat:

Installed Acrobat: C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe

Version: 21.1.20138.422477

Creation Date: 2021/02/15

Creation Time: 05:22:02 AM

Locale: English (United Kingdom)

Monitor:

Name: Freedom Scientific Accessibility Display Driver

Resolution: 1920 x 1080 x 0

Bits per pixel: 32

Monitor:

Name: NVIDIA GeForce GTX 1050 Ti with Max-Q Design

Resolution: 1920 x 1080 x 60

Bits per pixel: 32

OS Manufacturer: Microsoft Corporation

OS Name: Microsoft Windows 10

OS Version: 10.0.18363

Page File Space: 4194303 KB

Processor: Intel64 Family 6 Model 158 Stepping 10GenuineIntel~2208Mhz

Session Detail:

Boot Type: Normal

Is Shutting Down: No

Network: Available

Inside Citrix: No

Inside VMWare: No

Remote Session: No

Remote Control: No

Using JAWS: Yes

Using ZoomText: No

Using Windows-Eyes: No

Using NVDA: No

Time Zone: GMT Standard Time

Total Physical Memory: 4194303 KB

Total Virtual Memory: 4194176 KB

Windows Detail:

Tablet PC: No

Starter Edition: No

Media Center Edition: No

Slow Machine: No

Installed plug-ins:

C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\plug_ins\Accessibility.api

Version: 21.1.20138.422477

Creation Date: 2021/02/15

Creation Time: 05:22:02 AM’

Thank you for any help you may be able to provide.

Amro

Adobe Employee,
Feb 17, 2021

Hi Amro Bilal,

 

Thank you for providing the information, but we are unable to reproduce the issue on our end. If possible, please share the file with us. If you can't share the file publicly, you may upload it to your cloud drive and share the link via private message. To send a private message, click on my profile and choose "Send a message".

Also, could you please tell us which Save as Text option you used: Tools > Export PDF > More options > Text (Accessible), or Text (Plain)?

 

Thanks and Regards,

Akanksha

Software Engineer II
Adobe Acrobat Team

New Here,
Feb 18, 2021

Hi Akanksha,

Thank you for your reply.

I will now send you a link to download the file in a private message.

The option I used was File > Save as Text…, not the Export option
(Tools > Export PDF).

Kind regards,

Amro
