I need to convert a few thousand "PDF 2.0" files to an earlier version. How best to do this?

Report · Oct 30, 2021

I have a few thousand files in "PDF 2.0" format, and the hard drive indexing software I use can't index them. Is there a way to batch process these to an earlier PDF version? They are files generated with an MS Word add-in, so they have no fancy features included (text and images only, with searchable text).

Can I do this with one of Adobe's programs? Ideally I want to preserve the filenames, file date/times and the searchable text in the batch processing.

Report · Oct 30, 2021

Acrobat Pro can save as older versions, using File > Save As Other > Optimized PDF. Not sure if it can yet handle PDF 2.0 though.

Report · Oct 30, 2021

What program is generating PDF 2.0? That version is still in the voting stage at ISO, I think. Will Acrobat open the file? If so, SaveAs Optimized and set the version to 1.7 or something.

The very first bytes of the file probably say %PDF-2.0; change that to %PDF-1.7. You might need to use a Linux command that can preserve the binary bytes in the file.
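To see which files are actually labeled 2.0 before touching anything, the header can be read directly. This is only a minimal sketch in Python: the function name `pdf_header_version` is made up for illustration, and it assumes the `%PDF-x.y` marker sits at byte 0 of the file. It also only looks at the header; a PDF's document catalog can carry its own /Version entry, which this does not inspect.

```python
import re

# Matches the version marker at the very start of a PDF file, e.g. b"%PDF-2.0".
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(path):
    """Return the version from the %PDF-x.y header as a string, or None.

    Assumption: the marker is at byte 0. Files with junk before the
    header are reported as None by this sketch.
    """
    with open(path, "rb") as f:  # binary mode, so no bytes are translated
        head = f.read(16)
    match = HEADER_RE.match(head)
    return match.group(1).decode("ascii") if match else None


# Example use (the path is a placeholder):
#   pdf_header_version(r"C:\docs\example.pdf")
```

Run over a folder, this gives a list of the files that need attention without modifying any of them.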

Report · Oct 30, 2021

The program generating the PDF 2.0 files is PDF-XChange Standard Printer (https://www.tracker-software.com/product/pdf-xchange-standard), which includes an add-in for MS Word. It is capable of generating PDFs in any PDF format, but was set to "Auto" (probably the default; I don't think I ever changed it). It seems the Auto setting chose PDF 2.0, at least much of the time.

> "The very first bytes of the file probably say %PDF-2.0..."

Exactly. Are you suggesting that the "PDF-2.0" could be changed with a text editor or hex editor (as a one-off experiment rather than batch process) to "PDF-1.7" and that this would still be a valid (non-corrupted) PDF file? I'm on Windows, not Linux, but maybe there is a way to use GhostScript (which I know little about) in a batch file for this?

Report · Oct 30, 2021

Just so. Change the version to 1.7. If the file is not signed, this should not render anything invalid. GhostScript might work, or a hex editor in Visual Studio. Perhaps Notepad++ (though I haven't looked to see if it preserves binary).

Followup: Notepad++ seems to preserve binary, and you can install a hex editor from Plugins Admin.

Report · Oct 30, 2021

Thanks for that. How can I tell if a PDF is signed, in case there are some files I did not generate myself, among the PDF-2.0 ones that I have?

Report · Oct 30, 2021

If you change the version number, any signatures will be invalidated. However, the rest of the PDF will be unchanged. How likely is it that one of the files with version 2.0 is digitally signed? If that's a real possibility, maybe you need to get a new version of your indexing program.
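As for spotting signed files ahead of time, one rough approach is to scan the raw bytes for a `/ByteRange` key, which signature dictionaries in signed PDFs normally contain. The sketch below is a heuristic under that assumption, not a real signature check: the name `looks_signed` is made up, a hit only means the file deserves a closer look in Acrobat's Signatures panel, and a miss is not a guarantee.

```python
def looks_signed(path, chunk_size=1 << 20):
    """Heuristic: return True if the raw file bytes contain b"/ByteRange".

    Assumption: a signed PDF stores a /ByteRange entry uncompressed in
    its signature dictionary. False positives are possible if the same
    bytes happen to appear elsewhere in the file.
    """
    marker = b"/ByteRange"
    tail = b""  # end of the data seen so far, to catch a marker split across chunks
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if marker in window:
                return True
            tail = window[-(len(marker) - 1):]
```

Files flagged by this could simply be set aside and left at their original version.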

Report · Oct 30, 2021

That's exactly why I'm here: there isn't yet a version of the indexing program that can handle PDF-2.0 files. So I'm stuck.

Report · Oct 30, 2021

Sorry for the problem. First, you should contact tracker-software and tell them their Print converter is broken. There is no such thing as PDF 2.0 yet, and even if there were, they shouldn't over-label a version if the features of that version aren't present. Next, tell the folks who make the indexing program that a file labeled 2.0 is just the same as a file labeled 1.7, and to fix their program. Even when 2.0 is released, it will be largely backwards compatible with earlier versions -- it just has a few more features. Finally, you can get a scripting plugin for Notepad++ which may allow you to convert your files in batch mode (replace '%PDF-2.0' with '%PDF-1.7').
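If a Notepad++ plugin turns out to be awkward, the same batch replacement could be sketched in Python instead. The function names below (`relabel_pdf`, `relabel_folder`) are made up for illustration. The sketch only rewrites the first eight bytes, and only when they are exactly `%PDF-2.0`; because the replacement is the same length, nothing else in the file moves. It writes in place, so filenames stay as they are, and it restores the original access and modification times afterwards. It does not check for signatures or for a /Version entry in the document catalog, so it is worth trying on a copy of the folder first.

```python
import os
from pathlib import Path

OLD_HEADER = b"%PDF-2.0"
NEW_HEADER = b"%PDF-1.7"  # same length as OLD_HEADER, so the file size is unchanged


def relabel_pdf(path):
    """Overwrite a leading %PDF-2.0 with %PDF-1.7 in place.

    Returns True if the file was changed, False if it did not start
    with %PDF-2.0. Access and modification times are restored afterwards.
    """
    st = os.stat(path)  # remember timestamps before writing
    with open(path, "r+b") as f:  # binary read/write, no truncation
        if f.read(len(OLD_HEADER)) != OLD_HEADER:
            return False
        f.seek(0)
        f.write(NEW_HEADER)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
    return True


def relabel_folder(root):
    """Relabel every *.pdf under root, recursively; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*.pdf")):
        if relabel_pdf(path):
            changed.append(path)
    return changed


# Example use (the folder is a placeholder):
#   for p in relabel_folder(r"C:\docs\pdf-copies"):
#       print("relabeled", p)
```

Whether the indexing software then accepts the files is something only a test run will show.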

Good luck.

Report · Oct 30, 2021

Thanks, margueritek!

Report · Oct 31, 2021

PDF 2.0 is now a thing (ISO 32000-2). https://www.iso.org/standard/75839.html shows it was released in December 2020. The concept of overlabelling applies to PDF 1.x but it isn't clear whether going to PDF 2.0 is a jump with no option to downlabel. Anyway, it's a problem and will be for years - what should come first, producers or consumers? Because without producers there can be no incentive for the consumers to be updated...

Report · Oct 31, 2021

There was ISO 32000-2:2017, and now there is 32000-2:2020. Should we wait for 32000-2:2023? Changes are still being made to the standard. I don't have a copy of the latest version, since ISO charges $200 for a copy. Acrobat will open a file marked as PDF-2.0, as will the Microsoft Edge built-in viewer; but I don't know of any producers that take advantage of any of the new features. There are certainly improvements, especially in the area of PDF/A and related archivable formats, but most of them are backward compatible. Let's pull up a chair and see what happens.

Report · Oct 31, 2021

To beat a dead horse (and then I'll stop). The PDF 32000-2 spec says:

"A PDF processor shall attempt to read any PDF file, even if the file’s version is more recent than that for which the PDF processor was created."

So that puts the onus on the indexing program to come into line.