Inspiring
March 9, 2023
Answered

Copy / Extract the content of a specific pdf file


Hi Community! Hope you are doing awesome. Can anybody help me with this code? I am trying to extract the contents of specific text frames in two areas that never change; the info I need will always be there.

 

I know some people will suggest a Python scraper, or exporting the PDF to XML and looking up the coordinates. I know all that, but I would like to know if it is possible to do it directly. Here is the code:

 

var file = File.openDialog("Select a PDF file to extract text from");
if (file == null) {
    alert("No file selected");
    exit();
}
var pdfDoc = app.open(file);
alert(file);
var page = pdfDoc.pages[0];

var locationpdf1 = [115.2, 64.194, 238.36, 692.194];
var locationpdf2 = [415.6, 776.366, 529.582, 788.366];

var text1 = page.textFrames.add();
text1.visible = false;
text1.geometricBounds = locationpdf1;
var content1 = text1.contents;

var text2 = page.textFrames.add();
text2.visible = false;
text2.geometricBounds = locationpdf2;
var content2 = text2.contents;

var textFrame1 = app.activeDocument.textFrames.add();
textFrame1.contents = content1;
textFrame1.position = [100, 300]; // x, y

var textFrame2 = app.activeDocument.textFrames.add();
textFrame2.contents = content2;
textFrame2.position = [300, 300]; // x, y

alert(content1 + " " + content2)

I would appreciate your help. Let me know if you need more info. (No, I can't provide an example file: the whole PDF is confidential. The principle is the same anyhow: get or copy data from specific areas in a PDF.)


4 replies

Inspiring
March 21, 2023

Hi community! I got this:

 

var doc = app.activeDocument;
var contenido = "";
var x1 = Math.abs(489.5498046875);
var y1 = Math.abs(789.15380859375);
var x2 = Math.abs(415.599609375);
var y2 = Math.abs(789.16845703125);


for (var i = 0; i < doc.textFrames.length; i++) {
    var textFrame = doc.textFrames[i];
    var contents = textFrame.contents;
    var x = Math.abs(textFrame.position[0]);
    var y = Math.abs(textFrame.position[1]);

    // Check if the current text frame matches the specified coordinates
    if ((x === x1 && y === y1) || (x === x2 && y === y2)) {
        contenido += contents; // Concatenate the contents of matching text frames
    }
}

 

But it is not concatenating the two numbers I need. Can I ask for some help, please?

Sample file attached.

 

(The previous version looked like this:

 

var doc = app.activeDocument;
var contenido = "";
for (var i = 0; i < doc.textFrames.length; i++) {
    var textFrame = doc.textFrames[i];
    var contents = textFrame.contents;
    var x = Math.abs(textFrame.position[0]);
    var y = Math.abs(textFrame.position[1]);
    alert("Contents: " + contents + "\nX: " + x + "\nY: " + y);
    contenido = contents;
}

)

Inspiring
March 21, 2023

Artboard is set like this:

 

 

m1b
Community Expert
Correct answer
March 9, 2023

Also, here's a tip: when comparing numbers, rather than testing (a === b), test (Math.abs(a - b) < tolerance) instead. Coordinates in Illustrator are often only meaningful to 3 decimal places, and sometimes fewer depending on the process that generated them, but an equality comparison will fail even if the difference is less than 0.001.

- Mark
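
To illustrate the tip, here is a minimal sketch. The helper name `nearlyEqual` and the tolerance of 0.01 are assumptions chosen for the example, not part of any Illustrator API. The two numbers are taken from this thread: 415.6 is the value typed by hand in the original script, and 415.599609375 is the value reported for the frame.

```javascript
// Hypothetical helper: true when a and b differ by less than `tolerance`.
function nearlyEqual(a, b, tolerance) {
    return Math.abs(a - b) < tolerance;
}

var expectedX = 415.6;          // value typed by hand in the script
var reportedX = 415.599609375;  // value reported for the text frame

// Strict equality fails because the two values differ by about 0.0004.
var strictMatch = (expectedX === reportedX);

// The tolerant comparison accepts the small difference.
var tolerantMatch = nearlyEqual(expectedX, reportedX, 0.01);
```

Pick a tolerance that is larger than the rounding noise but smaller than the distance between two neighbouring frames, otherwise unrelated frames may match.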

Inspiring
March 10, 2023

Oh wow, I did not know about this! Thank you so much! I am sure I will have something to show you all today.

m1b
Community Expert
March 9, 2023

By the way, perhaps you could open your PDF in Illustrator and remove all the sensitive info (e.g. by deleting items and replacing text with dummy text), but be sure to leave the text frame that you are interested in (just change the text). That way you should be able to post a sample file. More people will want to help if they have a concrete example.

- Mark

Inspiring
March 9, 2023

Hi, good afternoon! I was already working on something but I am still testing a few things. Once I have something good ready I will post it. Thank you!

CarlosCanto
Community Expert
March 9, 2023

It doesn't work like that: you can't add a text frame at a specific location and somehow get the contents of an existing text frame at that location.

One way of doing it is to get all the text frames and check their x, y location one at a time. If one matches your expected location, you have found it and can read its contents.
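
A rough sketch of that approach, under some assumptions: the helper name `collectContentsAt`, the tolerance, and the sample data are made up for illustration. The function only relies on each frame having a `position` ([x, y]) and a `contents` property, so inside Illustrator you could pass `app.activeDocument.textFrames` instead of the stand-in array.

```javascript
// Hypothetical helper: collect the contents of every frame whose position
// lies within `tolerance` of one of the target [x, y] points.
function collectContentsAt(frames, targets, tolerance) {
    var found = [];
    for (var i = 0; i < frames.length; i++) {
        var pos = frames[i].position; // [x, y] of the frame
        for (var j = 0; j < targets.length; j++) {
            var dx = Math.abs(pos[0] - targets[j][0]);
            var dy = Math.abs(pos[1] - targets[j][1]);
            if (dx < tolerance && dy < tolerance) {
                found.push(frames[i].contents);
                break; // do not add the same frame twice
            }
        }
    }
    return found;
}

// Stand-in data so the sketch runs outside Illustrator.
var sampleFrames = [
    { position: [489.5498, 789.1538], contents: "123" },
    { position: [100, 200], contents: "ignored" },
    { position: [415.5996, 789.1685], contents: "456" }
];
var targets = [[489.55, 789.154], [415.6, 789.168]];
var contenido = collectContentsAt(sampleFrames, targets, 0.01).join("");
```

Note that the targets must use the same sign convention as `position`. If your document reports negative y values, either use negative targets or wrap both sides in `Math.abs` as in the script posted in this thread.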

Inspiring
March 9, 2023

Somehow I wanted to avoid having Illustrator open the PDF file, but I think it is totally necessary. Let me try it that way (it's going to take a while, though).

CarlosCanto
Community Expert
March 9, 2023

Yes, if you have a lot of text items it might take a while.

Another option: create a temporary artboard at the location where you expect your text to be, then select everything on that artboard. At this point only one item should be selected, provided nothing overlaps. That might be quicker.
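
The temporary-artboard idea might look roughly like the sketch below. It is untested against a real document and rests on several assumptions: the function name `getContentsInRect` and the rectangle values are hypothetical, the rectangle is given as [left, top, right, bottom] in the document's coordinate system, and `artboards.add()` is assumed to append the new artboard at the end of the list. Locked or hidden items would not be selected, and items that only partly overlap the artboard may be.

```javascript
// Sketch only: `doc` is an Illustrator Document, `rect` is a hypothetical
// [left, top, right, bottom] rectangle around the expected text.
function getContentsInRect(doc, rect) {
    var originalIndex = doc.artboards.getActiveArtboardIndex();
    doc.artboards.add(rect);                   // temporary artboard
    var tempIndex = doc.artboards.length - 1;  // assumes add() appends
    doc.artboards.setActiveArtboardIndex(tempIndex);

    doc.selection = null;                      // clear any existing selection
    doc.selectObjectsOnActiveArtboard();

    var contents = [];
    var selected = doc.selection || [];
    for (var i = 0; i < selected.length; i++) {
        if (selected[i].typename === "TextFrame") {
            contents.push(selected[i].contents);
        }
    }

    // Clean up: deselect, remove the temporary artboard, restore the old one.
    doc.selection = null;
    doc.artboards.remove(tempIndex);
    doc.artboards.setActiveArtboardIndex(originalIndex);
    return contents;
}

// Only runs inside Illustrator, where `app` exists.
if (typeof app !== "undefined" && app.documents.length > 0) {
    var found = getContentsInRect(app.activeDocument, [410, 795, 535, 770]);
    alert(found.join(" "));
}
```

If more than one text frame comes back, the rectangle is probably too large or frames overlap, so it is worth checking the length of the result before using it.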