Filling out a form with metadata from another PDF

Report · Apr 30, 2024

Hi!

I work in product documentation and for every publication that we produce, we need to send a Print Specification PDF to the printer. The print specification needs to have the publication's title, page numbers, publication number and edition (found in the footer and last page), project, date, and a screenshot of the cover of the publication.

I already get the date automatically, as well as the filename, which is the article number plus the edition. I also managed to add a button that inserts an image: it prompts me to select a file, I select the publication PDF, and only the cover page gets inserted, which works great.

However, I would like to be able to select the publication PDF file and extract all the other fields, hopefully at the same time. The Project needs to be inserted manually.

Is there a way to do this with Acrobat Pro?

Report · May 02, 2024

Hi,
That is possible by running an Action Wizard action from the publication file, which will fill in the specification file.

@+

Report · May 21, 2024

Yes, you can automate the extraction of metadata such as publication title, page numbers, publication number, and edition from a PDF using Acrobat Pro, along with a script to streamline the process. Here’s a step-by-step guide to achieve this:

Step 1: Prepare Your Environment

Open Acrobat Pro: Launch Adobe Acrobat Pro on your computer.
Access JavaScript Console: Go to Tools > JavaScript > JavaScript Console.

Step 2: JavaScript for Metadata Extraction

You can use JavaScript in Acrobat Pro to extract metadata. Below is a sample script that extracts the title, number of pages, and other specific text if they follow a pattern or are in certain locations within the PDF.

// Open the console in Acrobat Pro and paste the following script:

// Rebuild the text of one page from its words.
// Passing false keeps punctuation and spacing, so labels like "Edition:" survive.
function getPageText(doc, pageNum) {
    var words = [];
    var count = doc.getPageNumWords(pageNum);
    for (var w = 0; w < count; w++) {
        words.push(doc.getPageNthWord(pageNum, w, false));
    }
    return words.join("");
}

// Function to get document metadata
function getDocumentMetadata() {
    var docTitle = this.documentFileName; // Use filename as title if no metadata title
    var numPages = this.numPages; // Total number of pages
    var publicationNumber = "";
    var edition = "";

    // Try to get the title from the metadata if available
    if (this.info.Title) {
        docTitle = this.info.Title;
    }

    // Loop through the pages to find specific text for publication number and edition
    for (var i = 0; i < numPages; i++) {
        // getPageNthWord returns a single word, so the whole page has to be assembled first
        var pageText = getPageText(this, i);
        if (pageText) {
            var match = pageText.match(/Publication Number:\s*(\S+)/);
            if (match) {
                publicationNumber = match[1];
            }

            match = pageText.match(/Edition:\s*(\S+)/);
            if (match) {
                edition = match[1];
            }

            // Break if both fields are found
            if (publicationNumber && edition) {
                break;
            }
        }
    }

    console.println("Title: " + docTitle);
    console.println("Total Pages: " + numPages);
    console.println("Publication Number: " + publicationNumber);
    console.println("Edition: " + edition);
}

// Run the function
getDocumentMetadata();

Step 3: Run the Script

Open the JavaScript Console: Press Ctrl + J to open the JavaScript Console in Acrobat Pro.
Paste the Script: Copy and paste the script above into the console.
Execute the Script: Select the pasted code, then press Ctrl + Enter (or Enter on the numeric keypad) to execute it.

Step 4: Review the Output

The console will display the extracted information:

Title: The title of the document.
Total Pages: The total number of pages in the document.
Publication Number: Extracted from the document if a specific pattern is found.
Edition: Extracted from the document if a specific pattern is found.

Step 5: Automate Image Insertion

Since you already have a button to insert an image, you can combine this functionality with the metadata extraction script. Unfortunately, Adobe Acrobat Pro does not allow full automation of all tasks with JavaScript alone due to security restrictions, but you can streamline the process as much as possible.
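As a rough sketch of how the extracted values could be written into the specification form: the helper below is not part of the original script, and the field names passed to it are hypothetical. It assumes specDoc is an Acrobat Doc object (for example, the one returned by app.openDoc) and relies on getField returning null when a field does not exist.

```javascript
// Sketch: copy extracted values into form fields of the specification PDF.
// specDoc is assumed to be an Acrobat Doc object; field names are hypothetical.
function fillFields(specDoc, values) {
    var missing = [];
    for (var name in values) {
        if (!Object.prototype.hasOwnProperty.call(values, name)) {
            continue;
        }
        var field = specDoc.getField(name); // null when no such field exists
        if (field) {
            field.value = values[name];
        } else {
            missing.push(name);
        }
    }
    return missing; // names of fields that could not be filled
}
```

Returning the list of missing names makes it easier to spot a typo in a field name than silently skipping it would.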

Additional Tips:

Refine Text Extraction: Adjust the text extraction part to fit the exact pattern or position of your publication number and edition.
Batch Processing: For batch processing, you might need to look into more advanced scripts or third-party tools like Python with PyPDF2 or similar libraries.
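To make the "Refine Text Extraction" tip more concrete, here is a minimal sketch that keeps the pattern matching in one small function, so the patterns can be adjusted and tried on sample text before running anything against a real PDF. The labels "Publication Number:" and "Edition:" are assumptions carried over from the sample script, and the sample footer line is made up.

```javascript
// Sketch: pull the publication number and edition out of a page's text.
// The label wording is an assumption; adjust it to the real footer.
function parseFooter(pageText) {
    var result = { publicationNumber: "", edition: "" };
    var pub = /Publication Number:\s*(\S+)/.exec(pageText);
    if (pub) {
        result.publicationNumber = pub[1];
    }
    var ed = /Edition:\s*(\S+)/.exec(pageText);
    if (ed) {
        result.edition = ed[1];
    }
    return result;
}

// Example with a made-up footer line:
var sample = "Publication Number: 12345-AB  Edition: 3";
var parsed = parseFooter(sample);
```

With this sample line, parsed.publicationNumber is "12345-AB" and parsed.edition is "3". Both stay empty when the labels are not found.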

Example for Further Customization:

If your publication number and edition are always in the footer or specific pages, refine the script to target those areas. Here’s an enhanced part of the script for targeting specific pages:

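for (var i = numPages - 1; i >= 0; i--) { // Assuming footer info is on the last pages
    // getPageNthWord returns one word at a time, so rebuild the page text first
    // (false keeps punctuation and spacing, including the colons in the labels)
    var pageText = "";
    for (var w = 0; w < this.getPageNumWords(i); w++) {
        pageText += this.getPageNthWord(i, w, false);
    }
    if (pageText) {
        var footerMatch = pageText.match(/Publication Number:\s*(\S+)\s+Edition:\s*(\S+)/);
        if (footerMatch) {
            publicationNumber = footerMatch[1];
            edition = footerMatch[2];
            break;
        }
    }
}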

By following these steps and customizing the script as needed, you can automate much of the metadata extraction process for your print specification PDF.

Elisabet

Report · May 22, 2024

This seems amazing! Thank you very much. I will try to use it and come back with the info 🙂 I am just learning how to automate things within Acrobat and the use of scripts, and I get excited to test them.

Report · May 21, 2024

Here is the proposal I prepared (I was waiting for an answer from the requester).
You have to create an Action Wizard action with this script:

// Open the specification form (the path is relative to the current publication file)
var otherDoc = app.openDoc("Specifications.pdf", this);
otherDoc.getField("title").value = this.info.Title;
otherDoc.getField("fileName").value = this.documentFileName;
otherDoc.getField("date").value = util.printd("dd-mm-yyyy", new Date());
otherDoc.getField("project").value = "Not found in the document!";
// Import the first page of the publication as the cover image
otherDoc.getField("frontPage").buttonImportIcon(this.path);
// Size of the first page, converted from points to millimetres
// (assumes the page box starts at the origin)
var pt2mm = 25.4 / 72;
var aRect = this.getPageBox();
otherDoc.getField("dimensions").value = (Number(aRect[2]) * pt2mm).toFixed(1) + " x " + (Number(aRect[1]) * pt2mm).toFixed(1) + " mm";
otherDoc.getField("nbPages").value = this.numPages;
// Scan the words of the last page for the publication number and the edition
var p = this.numPages - 1;
for (var i = 0; i < this.getPageNumWords(p); i++) {
	// Debug output: lists every word found on the last page
	console.println("OK : " + this.getPageNthWord(p, i, true));
	try {
		if (this.getPageNthWord(p, i, true) == "No" && this.getPageNthWord(p, i + 1, true) == "Publication") {
			otherDoc.getField("article").value = this.getPageNthWord(p, i + 2, true);
		} else if (this.getPageNthWord(p, i, true) == "Edition") {
			otherDoc.getField("edition").value = this.getPageNthWord(p, i + 1, true) + " " + this.getPageNthWord(p, i + 2, true);
			break;
		}
	} catch (e) {}
}
// Save a copy of the specification named after the publication title and today's date
otherDoc.saveAs({
	cPath: otherDoc.path.replace(/\.pdf$/i, " (" + this.info.Title + " - " + util.printd("dd mmmm yyyy", new Date()) + ").pdf")
});
this.closeDoc();

The specification and publication files are in the same folder for this example.

Open a publication file.

Click the Action Wizard tool, then click the action you created.

Add all the other publication files for which you need to generate a specification file.

Then click on "Start". The action will generate all specification files.

The script must be adapted to the actual needs and the real layout of the publication files.
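One way to make that adaptation easier, sketched here under assumptions: collect the words of the page into an array first (in Acrobat, with getPageNthWord in a loop), then look up each label with a small helper. The helper name and the labels in the example are hypothetical. Because slice never reads past the end of the array, no try/catch is needed when a label sits at the very end of the page.

```javascript
// Sketch: return the `count` words that follow a label in a list of page words.
// `label` is an array of consecutive words, e.g. ["Publication", "No"].
function findAfterLabel(words, label, count) {
    for (var i = 0; i + label.length <= words.length; i++) {
        var matches = true;
        for (var j = 0; j < label.length; j++) {
            if (words[i + j] !== label[j]) {
                matches = false;
                break;
            }
        }
        if (matches) {
            var start = i + label.length;
            // slice() stops at the end of the array, so this cannot go out of range
            return words.slice(start, start + count).join(" ");
        }
    }
    return ""; // label not present on this page
}

// Example with made-up words from a last page:
var words = ["Publication", "No", "12345", "Edition", "3", "2024"];
var article = findAfterLabel(words, ["Publication", "No"], 1);
var edition = findAfterLabel(words, ["Edition"], 2);
```

If the layout changes, only the label arrays and the word counts need to be edited.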

@+

Report · May 22, 2024

It would be generous to call me a beginner when it comes to scripting, so I must admit I felt a bit lost with your first answer. I really really appreciate the lengthy, clear, and specific help here. Thank you very much! I will attempt it and get back on the results.