
Adobe PDF T1 font printing bug

New Here,
Mar 30, 2022

There is an issue printing web pages from Chrome and Edge using the Adobe PDF printer (which does not appear to have been updated since the Obama administration - it still looks exactly as it has for the past ten years, and still offers only PDF/A-1b as a PDF/A option): the resulting PDFs contain only the T1 font, are not searchable, and the text cannot be selected. (It selects in tall blue lines - see screenshot.) Copying and pasting the text yields only gibberish.

Screenshot: 2022-03-30_16-52-07.png

There is no fix for PDFs created this way; they cannot be re-OCRred (even after removing hidden text), nor can they be edited with the Edit tool.

Increasingly I have seen this in PDFs created by the courts, and the only way we can work with them is to print and re-scan them. But now my own Word Processing department has been creating them, unwittingly using the defective Adobe PDF printer.

I have scoured the Internet as well as searched this forum, and, astonishingly, can find absolutely no mention of this issue, which I have been running across with ever-increasing frequency in my position as a Word Processor.

Any feedback or further information or insight about this issue would be appreciated.

TOPICS
Create PDFs , Edit and convert PDFs , General troubleshooting , Print and prepress
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert,
Mar 30, 2022

For best results, you shouldn't be using a web browser with built-in PDF viewing capability; trying to employ the print action in an attempt to convert an HTML page to a PDF file is also incorrect, since all you're doing is printing an image file.

When this is done, that image file is just embedded as an image layer in the produced PDF document; there is no easy way to turn that image layer into searchable text with OCR tools without running into problems.

If you're already able to execute a print action from the web browser and select the "Adobe PDF printer" from the context menu, then that means you do have Adobe Acrobat installed on that machine.

Why not use the built-in feature "Create PDF from Web Page..."?

All you have to do is copy the URL and paste it into the dialog box, then click the "Create" button. Adobe Acrobat will handle the rest.

Additionally, if you have the Adobe Acrobat extension installed and enabled in your Chrome-based web browser(s), this option is also available there, but you must have a paid subscription to Adobe Acrobat Pro DC.

New Here,
Apr 01, 2022

Hi, thanks for your fast reply.

I'm sorry, but I can't figure out what "trying to employ the print action in an attempt to convert an HTML page to a PDF file is also incorrect, since all you're doing is printing an image file" means. I've printed Web pages to PDF using the Print function for decades, and never gotten image files, but proper, textual, searchable and copyable PDFs. Even now, this still works in Firefox and the now-deprecated IE. Only since the latest overhaul to DC has this flaw emerged. And it's not creating an "embedded . . . image layer", it's creating what appears to be a properly-formed PDF made out of text, except it only contains the T1 font (about which I have not been able to find out anything on the Web), and what letters can actually be copied out of it despite the wacky selecting do not match what's visible on the screen. (This is confirmed by inspecting the PDF with the Content tool: it's filled with text, but all nonsense.)

Screenshot: 2022-04-01_12-30-33.png

Now that I've identified it, I can easily work around this issue by simply using the Save as PDF option when printing from Chrome, but it sounds like what you are saying is basically we shouldn't expect the standard-issue Adobe PDF printer that installs along with the program to actually work correctly on a simple Web page, silly us, what were we thinking.
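
For anyone who wants to script that workaround rather than click through the print dialog, a minimal Python sketch follows. It assumes Chrome's `--headless` and `--print-to-pdf` command-line flags are available in the installed version and that the browser executable is reachable as `chrome` (the actual name or path varies by system); the function names are hypothetical. It is also an assumption that headless output matches the dialog's Save as PDF option closely enough to avoid the T1 problem, so check the result before relying on it.

```python
import subprocess


def build_print_command(url, output_pdf, browser="chrome"):
    """Build a headless-Chrome command line that writes `url` to `output_pdf`.

    `browser` is an assumption: on many systems the executable is named
    differently or must be given as a full path.
    """
    return [browser, "--headless", f"--print-to-pdf={output_pdf}", url]


def save_page_as_pdf(url, output_pdf, browser="chrome"):
    """Run the browser and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_print_command(url, output_pdf, browser), check=True)
```

Calling `save_page_as_pdf("https://example.com", "page.pdf")` would then produce the file without going through the Adobe PDF printer driver at all.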

Community Expert,
Apr 03, 2022

You're welcome, and I see what you're saying.

I've always been under the impression that printing through a virtual printer driver, such as the Adobe PDF Converter, has always produced an entirely different result in a digitally-produced PDF than what is seen on screen.

Since what we interpret as text on a computer screen is obviously not the same as the produced PDF (much less the same as the final print on a sheet of paper that comes out of an actual physical printing device), I've also been under the impression all these years that scan and OCR is a necessary extra step in order to add the layer of "real text" that is missing from the rasterized (or vectorized) images contained in the PDF structure that results from using a virtual printer driver as a printer.

If you suspect this is a bug, you may use this link to submit your findings:

New Here,
Apr 04, 2022

No scan or OCR involved. The Adobe PDF printer catches the (ironically enough, Adobe) PostScript file generated by whatever program is printing and, instead of assembling that into a raster image on paper as a laser printer would, assembles it into a PDF. (First rasterising the PostScript to an image and then running OCR on it would represent a great deal of unnecessary processing.) If the original document is composed of text (such as a Word doc or a webpage), that PostScript file is also filled with ASCII text along with information about font, size, position on the page, graphic objects, pictures, etc., and the Adobe PDF printer uses that information to assemble a facsimile of the original document, composed of text (and any graphic shapes, pictures, etc.), with all the fonts in use either embedded or referenced and listed in the Fonts tab of Document Properties. One normally ought to be able to select and copy that text right out of the finished PDF, or even save it as a Word doc, because it's just ASCII rendered in one or another font. In this instance, however, the only font in use is the bizarre T1 (in which there is no discernible relationship between the ASCII character in the PDF and the glyph used to display it on-screen).
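
As an illustration of how a PDF can display correctly yet copy out as nonsense, here is a toy Python model. It is a sketch under stated assumptions, not a diagnosis of this bug: it assumes a subset font whose character codes are assigned arbitrarily and that ships without a code-to-Unicode lookup (the role a ToUnicode map plays in a PDF font). Whether the T1 font in these files works this way is not confirmed anywhere in this thread, and the helper names are hypothetical.

```python
# Toy model only: real PDF fonts and content streams are far more involved.

FIRST_CODE = 0x21  # assumption: the subset font numbers its glyphs from '!' upward


def build_subset_font(text):
    """Assign an arbitrary code to each distinct character, in order of first use."""
    code_for_char = {}
    for ch in text:
        if ch not in code_for_char:
            code_for_char[ch] = FIRST_CODE + len(code_for_char)
    return code_for_char


def encode(text, code_for_char):
    """What gets stored on the page: a list of codes, each selecting a glyph."""
    return [code_for_char[ch] for ch in text]


def extract_text(codes, to_unicode=None):
    """Mimic copy/paste: use the lookup table if one exists, else read codes as ASCII."""
    if to_unicode is not None:
        return "".join(to_unicode[code] for code in codes)
    return "".join(chr(code) for code in codes)


font = build_subset_font("Hello")
codes = encode("Hello", font)

# Without a lookup table the viewer can only guess, and the guess is gibberish.
print(extract_text(codes))

# With a code-to-character table, the original text comes back.
to_unicode = {code: ch for ch, code in font.items()}
print(extract_text(codes, to_unicode))
```

In this model the glyphs drawn on screen are identical in both cases; only the ability to map codes back to characters differs, which is consistent with a file that looks fine but cannot be searched or copied.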

 

I used your link and submitted a bug report. Thanks!

New Here,
Oct 18, 2024

I just ran into this issue. My assistant sent me 5 PDF documents with this T1 font, and none of the text is editable or convertible to Word as text. Now I understand how these files were generated, but is there any way to convert them to an editable PDF?

Adobe Employee,
Feb 17, 2025

Hi @Pallavi_Nuka,

Hope you are doing well. Sorry for the trouble and the delayed response.

I tried creating a PDF the same way as in the initial post, but was able to run OCR on the document to fix the editing issues.

Would you mind letting us know if the issue persists with the latest version (2024.005.20xxx), so that we can investigate further?


-Souvik
