Extracting text from PDF

New Here ,
Jul 18, 2018

Can specific text be extracted from a PDF file?

I have PDFs that contain pictures, text, tables, and plain lines of text. Each picture is identified with a g-number. I would like to find a way to extract all the g-numbers and put them in Excel.

There is also another data set I would like to extract, but I figure that if I can get one working, the other should be similar.

Thanks

TOPICS
Acrobat SDK and JavaScript

New Here ,
Jul 23, 2018

Is there a better forum to post my question in?

Most Valuable Participant ,
Jul 23, 2018

That depends. Are you looking to write a JavaScript program to extract the text (which will come one word at a time)?

New Here ,
Jul 23, 2018

Yes, whatever the best process would be to pull out and list all the g-numbers.

Most Valuable Participant ,
Jul 23, 2018

Ok, if you want to code in JavaScript, you'll need the Acrobat SDK. The methods to research are the document object's getPageNthWord and getPageNthWordQuads.
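
A minimal sketch of that word-by-word approach, for illustration only. It assumes a g-number is a single word made of the letter "g" followed by digits (for example g12345); the thread never states the real format, so the pattern will likely need adjusting. The helper names collectGNumbers and toCsv are hypothetical. The calls numPages, getPageNumWords and getPageNthWord are Acrobat JavaScript document members. If the real g-numbers contain punctuation, Acrobat may split them into several words, and this sketch would miss them.

```javascript
// Assumed g-number format: "g" followed by one or more digits, e.g. g12345.
// This pattern is a guess; change it to match the real identifiers.
var G_NUMBER = /^g\d+$/i;

// Walk every word on every page and keep the ones that match the pattern.
// `doc` is an Acrobat document object (for example `this` in the console).
function collectGNumbers(doc, pattern) {
    var rows = [];
    for (var p = 0; p < doc.numPages; p++) {
        var count = doc.getPageNumWords(p);
        for (var w = 0; w < count; w++) {
            // Third argument true asks Acrobat to strip surrounding
            // punctuation and whitespace from the returned word.
            var word = doc.getPageNthWord(p, w, true);
            if (pattern.test(word)) {
                // Page indexes are zero-based; report them one-based.
                rows.push({ page: p + 1, gNumber: word });
            }
        }
    }
    return rows;
}

// Build comma-separated text that Excel can open.
// No quoting is done, which is only safe while values contain no commas.
function toCsv(rows) {
    var lines = ["page,g_number"];
    for (var i = 0; i < rows.length; i++) {
        lines.push(rows[i].page + "," + rows[i].gNumber);
    }
    return lines.join("\r\n");
}

// In the Acrobat JavaScript console, with the PDF open, something like:
//   console.println(toCsv(collectGNumbers(this, G_NUMBER)));
```

The printed text can be pasted into a .csv file and opened in Excel. The second data set mentioned in the question could probably be handled the same way by passing a different pattern.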
