I have about 80 local PDF files containing forms that students have filled in. I would like to extract the text data from them so that I can easily score their answers. How do I do that with the latest Acrobat Pro? I need to do this on local files.
Hi there,
We are sorry for the trouble. As described, you want to extract data from the filled-in PDF forms.
Please try the following steps and see if that helps.
For more information, please see the help page https://helpx.adobe.com/in/acrobat/using/collecting-pdf-form-data.html#export_user_data_from_a_respo...
Regards
Amal
The PDF files were collected via a web form as file attachments, so the individual users have not submitted the form. In this case, how do I create and initialize the response file you mentioned? Thank you very much for your help.
You didn't mention your version of Acrobat, but it can be done using the Merge Data Files into Spreadsheet command, which is under Tools - Prepare Form (and then, in some versions, under More Form Options).
Thank you very much. It is what I was looking for and it worked, but all the Japanese characters in the form fields are broken after exporting to a CSV file.
The encoding of the file created is UTF-8, which might not cover Japanese characters. To get around that you would need to use some other tool, I'm afraid. Maybe try exporting the files as TXT or FDF files, and then merge them using a different utility. Another option is to use a script to do it, instead of the built-in Merge Data Files command.
Thank you again. The text encoding looks to be UTF-8, because I could extract the field text by using PyPDF2, which is a Python module that can handle PDF forms. For the moment, PyPDF2 is good enough for my purpose, but your suggestion to use the native Acrobat functionality was much easier, except for the Japanese character problem.
If I find a fix for my problem, I will post it in this thread for someone else.
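For anyone who wants to try the PyPDF2 route mentioned above, here is a rough sketch of how it might look. It is not the poster's actual script. It assumes PyPDF2 2.0 or later (for PdfReader and get_form_text_fields), that only text fields matter, and that all PDFs sit in one folder. The function names and the output file name are made up for illustration.

```python
import csv
from pathlib import Path


def read_text_fields(pdf_path):
    """Return {field name: value} for the text fields of one PDF form."""
    # Imported here so the CSV helper below also works without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumption: PyPDF2 >= 2.0

    reader = PdfReader(str(pdf_path))
    # Only text fields are returned; checkboxes, radio buttons etc. are ignored.
    return reader.get_form_text_fields() or {}


def write_answers_csv(rows, out_path):
    """Write (file name, {field: value}) pairs as one CSV row per file."""
    columns = []  # union of all field names, in first-seen order
    for _, fields in rows:
        for name in fields:
            if name not in columns:
                columns.append(name)
    # "utf-8-sig" adds a BOM, which helps Excel recognise the file as UTF-8.
    with open(out_path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file"] + columns)
        for file_name, fields in rows:
            # Unfilled fields come back as None; write them as empty cells.
            writer.writerow([file_name] + [fields.get(c) or "" for c in columns])


def extract_folder(folder, out_path="answers.csv"):
    """Read every *.pdf in `folder` and write all answers to `out_path`."""
    pdfs = sorted(Path(folder).glob("*.pdf"))
    rows = [(p.name, read_text_fields(p)) for p in pdfs]
    write_answers_csv(rows, out_path)
    return len(rows)
```

Calling extract_folder("submissions") would then produce one row per student file, with a "file" column followed by one column per field name.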
Can you share a sample file with fields that have Japanese text in them?
Here is a sample file.
https://www.dropbox.com/s/faupq7447hb84b9/sample.pdf?dl=0
"Answer1" and "Answer2" should be "日本語 Japanese 日本語", but they are converted to "... Japanese ...".
When exporting it explicitly as UTF-8, it does seem to work correctly. I guess the default encoding is just plain ANSI, then. You can use this code I wrote to export it properly (you can run it from the JS Console, or from an Action, or something like that):
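// Collect the name and value of every fillable field in the current document.
var names = [];
var values = [];
for (var i = 0; i < this.numFields; i++) {
    var f = this.getField(this.getNthFieldName(i));
    if (f == null) continue;
    // Skip field types that carry no answer text.
    if (f.type == "button" || f.type == "signature") continue;
    names.push(f.name);
    values.push(f.valueAsString);
}
// Build a two-line, tab-delimited table: field names first, then values.
var doName = this.documentFileName.replace(/\.pdf$/i, "_data.txt");
this.createDataObject(doName, "");
var s = names.join("\t") + "\r\n" + values.join("\t");
// Encode explicitly as UTF-8 so that non-ASCII text (e.g. Japanese) survives.
this.setDataObjectContents(doName, util.streamFromString(s, "utf-8"));
// Save the temporary attachment to disk, then remove it from the PDF again.
this.exportDataObject(doName);
this.removeDataObject(doName);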
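The script above writes one tab-delimited, UTF-8 "_data.txt" file per PDF (a row of field names, then a row of values), so with 80 forms the exports still have to be combined before scoring. Below is a minimal sketch of that merge step in Python. It assumes all exports were saved into a single folder, keep the "_data.txt" suffix, and contain no tabs or line breaks inside the answers. The function name, the extra "file" column and the output name "merged.csv" are only illustrative choices.

```python
import csv
from pathlib import Path


def merge_data_files(folder, out_name="merged.csv"):
    """Merge every *_data.txt export in `folder` into one CSV; return the row count."""
    folder = Path(folder)
    columns = ["file"]  # source file name first, then field names as first seen
    rows = []
    for path in sorted(folder.glob("*_data.txt")):
        # "utf-8-sig" reads plain UTF-8 and also drops a BOM if one is present.
        lines = path.read_text(encoding="utf-8-sig").splitlines()
        if not lines:
            continue  # empty export, nothing to merge
        names = lines[0].split("\t")
        values = lines[1].split("\t") if len(lines) > 1 else []
        # Pad missing trailing values so every field name gets a cell.
        values += [""] * (len(names) - len(values))
        record = dict(zip(names, values))
        record["file"] = path.name
        for name in names:
            if name not in columns:
                columns.append(name)
        rows.append(record)
    # Writing with a BOM ("utf-8-sig") helps Excel recognise the CSV as UTF-8.
    with open(folder / out_name, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, merge_data_files("exports") would write exports/merged.csv with one row per student. Forms that lack a field present in other forms simply get an empty cell in that column.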