Copy / Extract the content of specific areas of a PDF file
Hi Community! Hope you are doing well. Can anybody help me with this code? I am trying to extract the contents of the text frames in two areas of a PDF. The areas never change, so the info I need will always be there.
I know some people will suggest a Python scraper, or exporting the PDF to XML and looking up the coordinates. I know all that, but I would like to know if it is possible to do it directly. Here is the code:
var file = File.openDialog("Select a PDF file to extract text from");
if (file == null) {
    alert("No file selected");
    exit();
}
var pdfDoc = app.open(file);
alert(file);
var page = pdfDoc.pages[0];
var locationpdf1 = [115.2, 64.194, 238.36, 692.194];
var locationpdf2 = [415.6, 776.366, 529.582, 788.366];
var text1 = page.textFrames.add();
text1.visible = false;
text1.geometricBounds = locationpdf1;
var content1 = text1.contents;
var text2 = page.textFrames.add();
text2.visible = false;
text2.geometricBounds = locationpdf2;
var content2 = text2.contents;
var textFrame1 = app.activeDocument.textFrames.add();
textFrame1.contents = content1;
textFrame1.position = [100, 300]; // x, y
var textFrame2 = app.activeDocument.textFrames.add();
textFrame2.contents = content2;
textFrame2.position = [300, 300]; // x, y
alert(content1 + " " + content2);
I would appreciate your help. Let me know if you need more info. (No, I can't provide an example file: the whole PDF is confidential. The principle is the same anyway: get / copy data from specific areas in a PDF.)
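To make the principle concrete, here is a minimal sketch of the kind of thing I am after. It rests on assumptions I have not confirmed: that the script runs in Illustrator, that the PDF opens with live text frames (not outlined or image-only text), and that each frame reports geometricBounds as [left, top, right, bottom]. A frame created with textFrames.add() starts out empty, so instead of adding new frames the sketch loops over the frames that already exist and keeps those whose bounds fall inside the target area. The helper names normalizeRect, isInside and collectTextInArea are made up for illustration.

```javascript
// Sketch only: assumes each frame exposes `geometricBounds` as
// [left, top, right, bottom] and `contents` as a string.
// All three helper names below are hypothetical.

// Turn [x1, y1, x2, y2] into min/max form so the check works whether
// the y axis points up or down.
function normalizeRect(b) {
    return {
        minX: Math.min(b[0], b[2]),
        maxX: Math.max(b[0], b[2]),
        minY: Math.min(b[1], b[3]),
        maxY: Math.max(b[1], b[3])
    };
}

// True when `inner` lies inside `outer`, allowing `tolerance` points of slack.
function isInside(inner, outer, tolerance) {
    var a = normalizeRect(inner);
    var b = normalizeRect(outer);
    return a.minX >= b.minX - tolerance && a.maxX <= b.maxX + tolerance &&
           a.minY >= b.minY - tolerance && a.maxY <= b.maxY + tolerance;
}

// Collect the contents of every existing frame that falls inside `area`,
// joined with newlines in the order the frames are listed.
function collectTextInArea(frames, area, tolerance) {
    var found = [];
    for (var i = 0; i < frames.length; i++) {
        if (isInside(frames[i].geometricBounds, area, tolerance)) {
            found.push(frames[i].contents);
        }
    }
    return found.join("\n");
}

// Intended use inside the host application (assumption, not run here):
// var content1 = collectTextInArea(pdfDoc.textFrames, locationpdf1, 1);
// var content2 = collectTextInArea(pdfDoc.textFrames, locationpdf2, 1);
```

The tolerance argument is there because the coordinates I measured may be off by a fraction of a point from what the application reports. If the coordinates came from a tool with a different origin or axis direction than the application uses, they would need converting first.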