Below are my own notes that I made in Swedish while investigating a problem where InDesign documents, as well as exported PDFs and IDML versions of the same document, were 15 MB too large.
I have had it automatically translated into English to help as many people as possible:
Sometimes you encounter InDesign documents that, upon export, result in enormous PDF files, without any obvious explanation. A page with just a few lines of text can result in a 15 MB PDF file, even with the [Smallest file size] option selected.
The explanation may lie in the document's metadata. Perhaps you received the document from an advertising agency that used IgniteTech's Xinet, and the document, for some reason, went through their software.
This was the cause in my case. It was components (programs) from XINET that had written a large number of JPEG image files in the form of base64-encoded sections in the XMP data; even with export and countless attempts to resave the file in different formats, the XMP data persisted in the new files.
As I understand it, InDesign lacks the functionality to exclude irrelevant XMP data when exporting to PDF; not even the [Smallest file size] preset removes these third-party added XMP sections.
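To illustrate what such embedded previews look like inside the XMP text, here is a minimal Python sketch (not part of my attached script). It relies on the fact that a JPEG file starts with the bytes FF D8 FF, which Base64-encode to "/9j/". The function name, the minimum length and the element name in the sample are my own assumptions, and the search is only a heuristic.

```python
import base64
import re

# JPEG files start with the bytes FF D8 FF, which Base64-encode to "/9j/".
JPEG_B64 = re.compile(r"/9j/[A-Za-z0-9+/]{64,}={0,2}")


def find_embedded_jpegs(xmp_text, min_chars=1000):
    """Return the approximate decoded size (bytes) of each Base64 JPEG run.

    Heuristic only: any long Base64 run that starts with "/9j/" is counted.
    """
    # Some XMP writers break Base64 data into lines using whitespace or the
    # entity "&#xA;"; remove both so each image becomes one continuous run.
    compact = re.sub(r"&#xA;|\s+", "", xmp_text)
    sizes = []
    for match in JPEG_B64.finditer(compact):
        run = match.group(0)
        if len(run) >= min_chars:
            sizes.append(len(run) * 3 // 4)  # 4 Base64 chars encode 3 bytes
    return sizes


# Synthetic example; the element name below is hypothetical.
fake_jpeg = b"\xff\xd8\xff\xe0" + bytes(3000)
sample = (
    "<wn_previews:Preview>"
    + base64.b64encode(fake_jpeg).decode("ascii")
    + "</wn_previews:Preview>"
)
sizes = find_embedded_jpegs(sample)
print(len(sizes), "embedded JPEG(s), about", sum(sizes), "bytes in total")
```

Run against the text of a real XMP file, a large total would suggest that image previews are what is inflating the metadata.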
If what you have is a PDF, and you lack the full version of Acrobat, you can examine the file with a Python script. The script is relatively easy to run from Windows PowerShell, which is available on Windows 10 and later. It only needs a couple of preparatory commands to install the PDF library. I have Python 3.11 (64-bit) installed on Windows 10.
[For the Python script code, download the attached Python file, rename it to bloadtedpdf.py]
Start "Windows PowerShell" from the start menu (or search for it on your computer). In the blue console window that appears, copy and paste the following text and press Enter to upgrade pip:
python.exe -m pip install --upgrade pip
Then call pip to retrieve the PDF component PyMuPDF (copy and paste the following and press Enter):
pip install PyMuPDF
python.exe .\bloadtedpdf.py "C:\Documents\Export\1page.pdf" --extract
[Example of a call from PowerShell in Windows]
The script can extract the metadata to files, and it prints the largest sections as text to the console.
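For readers who only want a quick check without installing anything, the following is a separate, minimal sketch using only the Python standard library. It is not the attached script and does not use PyMuPDF. It assumes that the XMP packet is stored uncompressed in the PDF, which is common but not guaranteed, and the function name is my own.

```python
import re

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
XPACKET = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=.*?\?>", re.DOTALL)


def xmp_packet_sizes(pdf_bytes):
    """Return (offset, length) for every uncompressed XMP packet found."""
    return [(m.start(), m.end() - m.start()) for m in XPACKET.finditer(pdf_bytes)]


# Synthetic stand-in for a PDF; with a real file you would read its bytes,
# for example with pathlib.Path("1page.pdf").read_bytes().
fake_pdf = (
    b"%PDF-1.7\n"
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    b"<x:xmpmeta xmlns:x='adobe:ns:meta/'></x:xmpmeta>"
    b'<?xpacket end="w"?>'
    b"\n%%EOF"
)
for offset, length in xmp_packet_sizes(fake_pdf):
    share = 100.0 * length / len(fake_pdf)
    print(f"XMP packet at byte {offset}: {length} bytes ({share:.0f}% of file)")
```

If one packet accounts for most of the file size, the metadata is the likely cause of the bloat.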
To perform a basic check for large metadata in InDesign:
Open "File Info" in InDesign's File menu (Ctrl+Alt+Shift+I). Click on "Raw data" in the list that appears. You will probably see: "Cannot display raw metadata, contents too large".
At the bottom of InDesign's "File Info" box there is a drop-down menu with options to export and import what is called a Template, which in practice is all the XMP data. When I exported an XMP template from my document, the resulting file was almost as large as the entire InDesign document.
(A corresponding file was extracted by the Python script as mentioned above.) Note that the XMP data does not contain the document's texts and images, but only information about the document.
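Once you have the exported XMP file, a short Python sketch like the one below can show which top-level properties take up the space. This is an illustration under assumptions, not my attached script: the function name is my own, the sample data is synthetic, and the element name "Previews" is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def largest_xmp_properties(xmp_bytes, top=5):
    """Return (tag, size_in_bytes) pairs for the largest top-level properties."""
    root = ET.fromstring(xmp_bytes)
    rdf_tag = "{%s}RDF" % RDF_NS
    # The root is normally x:xmpmeta with rdf:RDF inside it.
    rdf = root if root.tag == rdf_tag else root.find(".//" + rdf_tag)
    if rdf is None:
        raise ValueError("no rdf:RDF element found")
    sizes = {}
    for description in rdf.findall("{%s}Description" % RDF_NS):
        for prop in description:
            # Serialized length approximates the space the property occupies.
            sizes[prop.tag] = sizes.get(prop.tag, 0) + len(ET.tostring(prop))
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:top]


# Synthetic example; with a real file, pass the bytes of the exported .xmp.
sample = (
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about=""'
    b' xmlns:dc="http://purl.org/dc/elements/1.1/"'
    b' xmlns:wn_previews="http://ns.xinet.com/webnative/previews/1.0/">'
    b"<dc:format>application/x-indesign</dc:format>"
    b"<wn_previews:Previews>" + b"A" * 5000 + b"</wn_previews:Previews>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>"
)
for tag, size in largest_xmp_properties(sample):
    print(size, tag)
```

The tags are printed with their full namespace URI in braces, which makes it easy to see which vendor a bulky property belongs to.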
As I already mentioned, my file turned out to contain bulky data from a third-party manufacturer.
To remove unwanted metadata from PDF files using Acrobat (full version, formerly called "Pro"):
In Acrobat you can find the XMP data under File / Properties (Ctrl+D). Then click on the "Additional Metadata..." button in the box that appears. In the full version of Acrobat, you can manually remove the unwanted sections in the same interface (the Delete button).
Bloated PDF above. The previews section of the Xinet namespaces holds 15 MB of images, which makes no sense in a preview document that is to be sent to a customer or similar.
The same document, after deleting the third party namespaces from within Acrobat "Pro".
To remove the unwanted metadata from InDesign (which is also possible, but not as easily), you need to export and then import. Start by exporting the XMP as described earlier.
Then open the exported XMP file in, for example, Notepad++. (It is a file in XML format.)
Look for the names of the potentially problematic sections at the beginning of the file:
<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmlns:xmpTPg="http://ns.adobe.com/xap/1.0/t/pg/" xmlns:xmpGImg="http://ns.adobe.com/xap/1.0/g/img/" xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/" xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#" xmlns:stRef="http://ns.adobe.com/xap/1.0/sType/ResourceRef#" xmlns:stMfs="http://ns.adobe.com/xap/1.0/sType/ManifestItem#" xmlns:idPriv="http://ns.adobe.com/xmp/InDesign/private" xmlns:wn_private="http://ns.xinet.com/webnative/private/1.0/" xmlns:xmpG="http://ns.adobe.com/xap/1.0/g/" xmlns:stFnt="http://ns.adobe.com/xap/1.0/sType/Font#" xmlns:wn_text="http://ns.xinet.com/webnative/text-extraction/1.0/" xmlns:fpoinfo="http://ns.xinet.com/fpoinfo/1.0/" xmlns:wn_image_info="http://ns.xinet.com/webnative/imageinfo/1.0/" xmlns:wn_thumbnails="http://ns.xinet.com/webnative/thumbnails/1.0/" xmlns:wn_previews="http://ns.xinet.com/webnative/previews/1.0/" xmlns:ExtensisFontSense="http://www.extensis.com/meta/FontSense/">
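A long declaration line like the one above is hard to scan by eye. As a rough aid, this Python sketch lists the declared namespaces and flags those whose URI does not start with one of a few well-known bases. The list of "common" bases is my own assumption and should be adjusted; a flagged namespace is only a candidate to inspect, not proof of bloat.

```python
import re

# Assumed list of URI beginnings that are considered ordinary; adjust freely.
COMMON_URI_STARTS = (
    "http://ns.adobe.com/",
    "http://purl.org/dc/",
    "http://www.w3.org/",
    "adobe:ns:meta/",
)
XMLNS = re.compile(r'xmlns:([\w.-]+)="([^"]*)"')


def third_party_namespaces(xmp_text):
    """Return {prefix: uri} for declared namespaces outside the common bases."""
    found = {}
    for prefix, uri in XMLNS.findall(xmp_text):
        if not uri.startswith(COMMON_URI_STARTS):
            found[prefix] = uri
    return found


# A shortened version of the declaration line from my file.
sample = (
    '<rdf:Description rdf:about=""'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:xmp="http://ns.adobe.com/xap/1.0/"'
    ' xmlns:idPriv="http://ns.adobe.com/xmp/InDesign/private"'
    ' xmlns:wn_previews="http://ns.xinet.com/webnative/previews/1.0/"'
    ' xmlns:ExtensisFontSense="http://www.extensis.com/meta/FontSense/">'
)
for prefix, uri in third_party_namespaces(sample).items():
    print(prefix, "->", uri)
```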
In the file above, xinet's "previews" appeared to be the main culprit. When I searched further in the file and found a large section called "wn_previews", I collapsed the node in question in Notepad++, and selected from the beginning of the following line (NOTE!) to the beginning of the found line. When I pressed Delete, approximately 15 MB of Base64-encoded JPEG files, which were in this section, disappeared!
This data stream had been carried along for no reason, even though the file had gone through conversions to later InDesign versions and had been exported as packages for sending to another party. InDesign lacks simple ways to identify and remove such enormous sections from its metadata.
I repeat: It's even stranger that these enormous data streams are included in the PDF exports that are supposed to be the most streamlined.
In my case, I removed all sections that had to do with xinet, as well as ExtensisFontSense. Do the equivalent, based on what you discover in your document.
Save the XMP file (preferably under a different name, in case you need to restore the metadata from the previously exported file for some reason). Ensure that the file has shrunk significantly, i.e., that you have removed the bulky parts.
Import the new (edited) XMP file into the document from the same box where the export took place.
Choose to have the import clear and replace all properties with those from the new XMP file.
NOTE: Save the InDesign document under a new name (Save As). For the PDF export, it makes no difference; it works and produces small files already, but the InDesign document itself does not shrink unless it is saved under a new name.
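The manual editing step in Notepad++ described above could also be scripted. The following Python sketch removes every element whose tag uses one of the given namespace prefixes from the XMP text and leaves everything else untouched. It is an untested-on-real-data illustration with an invented function name and a synthetic sample (the element names "Previews" and "slug" are hypothetical). It assumes the unwanted data is stored as elements, not as attributes, and it leaves the xmlns declarations in place.

```python
import re


def remove_prefixed_elements(xmp_text, prefixes):
    """Delete all elements whose tag starts with one of the given prefixes."""
    cleaned = xmp_text
    for prefix in prefixes:
        pattern = re.compile(
            # Leading indentation, then <prefix:Name ...> up to its closing
            # tag (or a self-closing element), then the trailing line break.
            r"[ \t]*<(" + re.escape(prefix) + r":[\w.-]+)(?:\s[^>]*?)?"
            r"(?:/>|>.*?</\1\s*>)(?:\r?\n)?",
            re.DOTALL,
        )
        cleaned = pattern.sub("", cleaned)
    return cleaned


# Synthetic sample standing in for an exported XMP file.
sample = (
    '<rdf:Description rdf:about="">\n'
    "   <dc:format>application/x-indesign</dc:format>\n"
    "   <wn_previews:Previews>\n"
    "      <rdf:Seq><rdf:li>/9j/AAAA</rdf:li></rdf:Seq>\n"
    "   </wn_previews:Previews>\n"
    "   <ExtensisFontSense:slug/>\n"
    "</rdf:Description>\n"
)
cleaned = remove_prefixed_elements(sample, ["wn_previews", "ExtensisFontSense"])
print(cleaned)
```

With a real file you would read the exported XMP as text, pass in the prefixes you found in your own document, and write the result to a new file name so that the original export is kept as a backup.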
Feature requests:
1) Let there be a way to delete all non-default metadata namespaces from an InDesign document.
2) Let there be a way to exclude metadata namespaces when exporting PDF files from InDesign.
Andreas Jansson
Hi @Andreas Jansson , The metadata problem has been around since at least 2017. This thread has a JS script that clears the ID doc and its placed images:
From 2017
https://community.adobe.com/t5/indesign-discussions/file-size-is-too-big/td-p/9370587/page/3
I read parts of that thread before posting my text, and I still cannot see how it relates to my problem.
To my limited understanding, the xinet and ExtensisFontSense namespaces and their accompanying sections in the InDesign document metadata that I wrote about are not related to the links in the document. And the scripts you refer to seem to relate only to the links, or at least they start by gathering the links (images).
The Base64-encoded JPEG preview "blobs" from the company XINET were my problem. Some part of XINET's WebNative Suite seems to save these kinds of streams inside the XMP metadata of the actual InDesign documents.
Andreas
The Base64-encoded JPEG preview "blobs" from the company XINET were my problem. Some part of XINET's WebNative Suite seems to save these kinds of streams inside the XMP metadata of the actual InDesign documents.
By @Andreas Jansson
Just for some general info:
Xinet WebNative provided an OPI environment (and possibly some other solutions), as you may know anyway. The place where I worked in the early 2000s used it (up until the early teens, I think). I looked up Xinet briefly... It looks like they still provide some DAM services under another company's umbrella, but it doesn't look like WebNative is around anymore (as far as I can see). Regardless, they may still use this brand name in their metadata that apparently stores previews from their DAM platform.