
Copy / Extract the content of a specific pdf file

Contributor, Mar 09, 2023

Hi Community! Hope you are doing awesome. Can anybody help me with this code? I am trying to extract the contents of specific text frames in two areas which will never change, the info I need will be always there.

 

I know some people will suggest a Python scraper, or exporting the PDF to XML and looking up the coordinates. I know all that, but I would like to know if it is possible to do it directly. Here is the code:

 

var file = File.openDialog("Select a PDF file to extract text from");
if (file == null) {
    alert("No file selected");
    exit();
}
var pdfDoc = app.open(file);
alert(file);
var page = pdfDoc.pages[0];

var locationpdf1 = [115.2, 64.194, 238.36, 692.194];
var locationpdf2 = [415.6, 776.366, 529.582, 788.366];

var text1 = page.textFrames.add();
text1.visible = false;
text1.geometricBounds = locationpdf1;
var content1 = text1.contents;

var text2 = page.textFrames.add();
text2.visible = false;
text2.geometricBounds = locationpdf2;
var content2 = text2.contents;

var textFrame1 = app.activeDocument.textFrames.add();
textFrame1.contents = content1;
textFrame1.position = [100, 300]; // x, y

var textFrame2 = app.activeDocument.textFrames.add();
textFrame2.contents = content2;
textFrame2.position = [300, 300]; // x, y

alert(content1 + " " + content2);

I will appreciate your help. Let me know if you need more info (no, I can't provide an example file; the whole PDF is confidential, but the principle is the same anyhow: get/copy data from specific areas in a PDF).

TOPICS
Feature request , Import and export , Scripting , Tools , Type
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more

Community Expert, Mar 09, 2023

It doesn't work like that: you can't add a text frame at a specific location and somehow get the contents of an existing text frame at that location.

 

One way of doing it is to get all the text frames and check their x,y location one at a time; if one matches your expected location, you have found it and can get its contents.
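
The scan described above could look roughly like the following. This is a minimal sketch rather than a tested Illustrator script: `findFrameContentsAt` and its `tolerance` parameter are hypothetical names, and it assumes each text frame exposes `position` as `[x, y]` and `contents` as a string, which is how the scripts in this thread use them.

```javascript
// Hypothetical helper: return the contents of the first text frame whose
// position is within `tolerance` points of (x, y), or null if none matches.
function findFrameContentsAt(frames, x, y, tolerance) {
    for (var i = 0; i < frames.length; i++) {
        var pos = frames[i].position; // [x, y]
        if (Math.abs(pos[0] - x) <= tolerance &&
            Math.abs(pos[1] - y) <= tolerance) {
            return frames[i].contents;
        }
    }
    return null;
}

// Assumed usage inside Illustrator (the coordinates are placeholders):
// var text = findFrameContentsAt(app.activeDocument.textFrames, 415.6, 788.366, 0.5);
```

Passing the frame collection in as an argument keeps the matching logic separate from the Illustrator objects, so it can be tried on plain test data first.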

Contributor, Mar 09, 2023

Somehow I wanted to avoid having AI (Illustrator) open the PDF file, but I think it is totally necessary. Let me try it that way (it's going to take a while, though).

Community Expert, Mar 09, 2023

Yes, if you have a lot of text items it might take a while.

Another option: create a temporary artboard at the location where you expect your text to be, then select all on that artboard. At this point only one item should be selected, provided nothing overlaps. That might be quicker.
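
A rough sketch of the temporary-artboard idea follows. The helper name `contentsInsideRect` is hypothetical, and the sketch assumes that a newly added artboard becomes the last one in `doc.artboards` and that `rect` is given as `[left, top, right, bottom]` in the document's coordinate system. It has not been run against a real PDF, and the selection may also pick up items that only partly overlap the artboard.

```javascript
// Hypothetical helper. `doc` is an Illustrator Document; `rect` is
// [left, top, right, bottom] in the document's coordinate system.
function contentsInsideRect(doc, rect) {
    var previousIndex = doc.artboards.getActiveArtboardIndex();
    var temp = doc.artboards.add(rect); // temporary artboard
    // Assumption: the new artboard is appended at the end of the list.
    doc.artboards.setActiveArtboardIndex(doc.artboards.length - 1);
    doc.selection = null; // start from an empty selection
    doc.selectObjectsOnActiveArtboard();

    // Keep only the text frames among the selected items.
    var found = [];
    var selected = doc.selection || [];
    for (var i = 0; i < selected.length; i++) {
        if (selected[i].typename === "TextFrame") {
            found.push(selected[i].contents);
        }
    }

    // Clean up: clear the selection, drop the artboard, restore the old one.
    doc.selection = null;
    temp.remove();
    doc.artboards.setActiveArtboardIndex(previousIndex);
    return found;
}

// Assumed usage inside Illustrator (the rectangle is a placeholder):
// var texts = contentsInsideRect(app.activeDocument, [415.6, 788.366, 529.582, 776.366]);
```

Returning an array rather than a single string makes it easy to notice when more than one frame falls inside the rectangle.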

Contributor, Mar 09, 2023

Working on something right now, I'll keep you posted.

Community Expert, Mar 09, 2023

By the way, perhaps you could open your PDF in Illustrator and remove all the sensitive info (e.g. by deleting items and replacing text with dummy text), but be sure to leave the text frame that you are interested in (just change the text). That way you should be able to post a sample file. More people will want to help if they have a concrete example.

- Mark

Contributor, Mar 09, 2023

Hi, good afternoon! I was already working on something but am still testing a few things. Once I have something good ready I will post it, thank you!

Community Expert, Mar 09, 2023 (marked as correct answer)

Also, here's a tip: when comparing numbers, rather than testing (a === b), test (Math.abs(a - b) < tolerance) instead. This is because coordinates are sometimes only really meaningful to 3 decimal places in Illustrator, and sometimes even fewer depending on the process that generated them, but an equality comparison will fail even if the difference is less than 0.001.

- Mark
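
To make the tip concrete, here is a small sketch. The helper name `nearlyEqual` is hypothetical, and the two numbers are illustrative: one has the many-decimal form Illustrator tends to report for a position, the other is the same coordinate typed in by hand to two decimals.

```javascript
// Hypothetical helper: true when a and b differ by less than `tolerance`.
function nearlyEqual(a, b, tolerance) {
    return Math.abs(a - b) < tolerance;
}

var measured = 489.5498046875; // the kind of value a position property reports
var expected = 489.55;         // the same coordinate rounded by hand (illustrative)

var strict = (measured === expected);               // false: the values differ slightly
var loose = nearlyEqual(measured, expected, 0.001); // true: they differ by about 0.0002
```

The tolerance of 0.001 mirrors the "3 decimal places" remark above; a larger value may be more forgiving for PDFs produced by other tools.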

Contributor, Mar 10, 2023

Ohh wow, I did not know about this!!! Thank you so much! I am sure today I will have something to show you all.

Contributor, Mar 21, 2023

Hi community! I got this:

 

var doc = app.activeDocument;
var contenido = "";
var x1 = Math.abs(489.5498046875);
var y1 = Math.abs(789.15380859375);
var x2 = Math.abs(415.599609375);
var y2 = Math.abs(789.16845703125);

for (var i = 0; i < doc.textFrames.length; i++) {
    var textFrame = doc.textFrames[i];
    var contents = textFrame.contents;
    var x = Math.abs(textFrame.position[0]);
    var y = Math.abs(textFrame.position[1]);

    // Check if the current text frame matches the specified coordinates
    if (x === x1 && y === y1 || x === x2 && y === y2) {
        contenido += contents; // Concatenate the contents of matching text frames
    }
}

 

but it is not concatenating the two numbers I need. Can I ask for some help, please?

 

sample file attached.

 

(previous version was like:

 

var doc = app.activeDocument;
var contenido = "";
for (var i = 0; i < doc.textFrames.length; i++) {
    var textFrame = doc.textFrames[i];
    var contents = textFrame.contents;
    var x = Math.abs(textFrame.position[0]);
    var y = Math.abs(textFrame.position[1]);
    alert("Contents: " + contents + "\nX: " + x + "\nY: " + y);
    contenido = contents;
}

)
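
One plausible reason the strict `===` comparison never matches is the one raised earlier in the thread: the reported positions may differ from the hard-coded values by a tiny amount. The sketch below applies a tolerance instead. It is untested against the attached sample file. `collectContents` and the tolerance of 0.01 are assumptions, and it keeps the `Math.abs` on the frame position used in the script above. Looping over the targets first also means the pieces are concatenated in target order rather than in whatever order `textFrames` lists them.

```javascript
// Hypothetical helper: for each target [x, y], append the contents of every
// frame whose position lies within `tolerance` of it, in target order.
function collectContents(frames, targets, tolerance) {
    var result = "";
    for (var t = 0; t < targets.length; t++) {
        for (var i = 0; i < frames.length; i++) {
            var pos = frames[i].position;
            var dx = Math.abs(Math.abs(pos[0]) - targets[t][0]);
            var dy = Math.abs(Math.abs(pos[1]) - targets[t][1]);
            if (dx < tolerance && dy < tolerance) {
                result += frames[i].contents;
            }
        }
    }
    return result;
}

// Target coordinates copied from the script above.
var targets = [
    [489.5498046875, 789.15380859375],
    [415.599609375, 789.16845703125]
];

// Assumed usage inside Illustrator:
// var contenido = collectContents(app.activeDocument.textFrames, targets, 0.01);
```

If nothing matches even with a tolerance, alerting each frame's position (as in the previous version) and comparing it with the targets by eye is a reasonable next step.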

Contributor, Mar 21, 2023

Artboard is set like this:

 

[Image: AntonioPacheco_0-1679421709201.png]

 
