PDF validation

New Here, Mar 09, 2018

Hi All,

I have a few questions.

1) Is there a way to validate loose and tight lines in Adobe Acrobat Pro?

2) Is there a way to validate missing base alignment in Adobe Acrobat Pro?

3) Is there a way to find bad breaks in units in Adobe Acrobat Pro?

Thanks,

Murali R.

TOPICS
Acrobat SDK and JavaScript, Macintosh, Windows

Community Guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Most Valuable Participant, Mar 10, 2018

JavaScript can fetch the bounding box (quads) and text of each word in a file. It cannot fetch font info. So if you can define your problem in terms of bounding boxes, maybe you could do something. But the answer is really: 1) probably not; 2) and 3) I don't understand your requirement.
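As a rough illustration of what "bounding box and text of each word" looks like in practice, here is a minimal sketch. It uses the Doc methods `getPageNumWords`, `getPageNthWord` and `getPageNthWordQuads` from the Acrobat JavaScript API. The helper names `quadsToBox` and `collectWords` and the shape of the returned records are made up for this example, and it assumes default user space with y increasing upward.

```javascript
// Hypothetical helper: reduce the quads of one word to a single axis-aligned box.
// Each quad is a flat array of 8 numbers (four x,y corner pairs). Taking the
// min/max of all corners avoids relying on the corner order.
function quadsToBox(quads) {
  var xs = [], ys = [];
  for (var q = 0; q < quads.length; q++) {
    for (var i = 0; i < quads[q].length; i += 2) {
      xs.push(quads[q][i]);
      ys.push(quads[q][i + 1]);
    }
  }
  return {
    left: Math.min.apply(null, xs),
    right: Math.max.apply(null, xs),
    bottom: Math.min.apply(null, ys), // assumes y grows upward
    top: Math.max.apply(null, ys)
  };
}

// Hypothetical helper: walk every page and return one record per word.
// `doc` is an Acrobat Doc object.
function collectWords(doc) {
  var words = [];
  for (var p = 0; p < doc.numPages; p++) {
    var count = doc.getPageNumWords(p);
    for (var w = 0; w < count; w++) {
      words.push({
        page: p,
        index: w,
        // third argument false: keep punctuation and whitespace on the word
        text: doc.getPageNthWord(p, w, false),
        box: quadsToBox(doc.getPageNthWordQuads(p, w))
      });
    }
  }
  return words;
}
```

In the Acrobat JavaScript console this would be called as `collectWords(this)`. Note that nothing in these records says which font or size was used; only text and geometry are available.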

New Here, Mar 10, 2018

Thanks for the details.

1) If you could provide any sample code, it would help me take this forward. Sorry, I am very new to JavaScript.

2) Is there a way to validate missing base alignment in Adobe Acrobat Pro?

3) Is there a way to find bad breaks in units in Adobe Acrobat Pro?

    

Most Valuable Participant, Mar 10, 2018

This is your project. When you ask for "sample code", you are describing an advanced, complex and time-consuming process. You would need to become an expert in analysing the crude bounding box/text relationship with fuzzy logic. It is a fascinating project if you have the time and expertise. JavaScript is frankly too crude; this is more a job for a C++ programmer writing a plug-in based on good knowledge of PDF internals.
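To give a feel for how crude a bounding-box heuristic is, here is one possible sketch for question 1, not a working validator. It assumes the words are already grouped into lines (each line an array of records with a `box` of `left`, `right`, `bottom`, `top`, sorted left to right). The function names and both threshold factors are invented for illustration, and whether the word boxes include trailing whitespace would need checking on real files.

```javascript
// Mean gap between neighbouring words on a line, divided by the mean box
// height so the value is comparable across text sizes.
function meanRelativeGap(line) {
  if (line.length < 2) return null; // a single word has no gaps
  var gapSum = 0, heightSum = 0;
  for (var i = 0; i < line.length; i++) {
    heightSum += line[i].box.top - line[i].box.bottom;
    if (i > 0) gapSum += line[i].box.left - line[i - 1].box.right;
  }
  var meanGap = gapSum / (line.length - 1);
  var meanHeight = heightSum / line.length;
  return meanGap / meanHeight;
}

function median(values) {
  var s = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Label each line "loose", "tight" or "ok" relative to the median line.
// looseFactor and tightFactor are arbitrary illustrative thresholds.
function classifyLines(lines, looseFactor, tightFactor) {
  var gaps = lines.map(meanRelativeGap);
  var known = gaps.filter(function (g) { return g !== null; });
  if (known.length === 0) return lines.map(function () { return "ok"; });
  var typical = median(known);
  return gaps.map(function (g) {
    if (g === null) return "ok";
    if (g > typical * looseFactor) return "loose";
    if (g < typical * tightFactor) return "tight";
    return "ok";
  });
}
```

Without font information there is no way to know the intended word space, which is why this can only compare lines against each other and will misfire on headings, justified last lines and mixed sizes.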

2) You mean the alignment of the bottom of the text? Well, you will have the bounding boxes. These are NOT baselines. You can look for lines which appear to overlap vertically, but things like superscripts and subscripts will also show up. Can you come up with an algorithm?
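One way such an algorithm might start, purely as a sketch: group words whose boxes overlap vertically, then flag words whose bottom edge sits away from the typical bottom edge of their line. The input is assumed to be the words of one page as records with a `box` of `left`, `right`, `bottom`, `top`. The function names, `minShare` and `tolerance` are hypothetical, and the bottom of a box is only a stand-in for the baseline.

```javascript
// True when the vertical overlap of two boxes is at least `minShare`
// of the height of the smaller box.
function overlapsVertically(a, b, minShare) {
  var overlap = Math.min(a.top, b.top) - Math.max(a.bottom, b.bottom);
  var smaller = Math.min(a.top - a.bottom, b.top - b.bottom);
  return smaller > 0 && overlap / smaller >= minShare;
}

// Greedy grouping: visit words from the top of the page downward and attach
// each one to the current line if it overlaps that line's first word.
function groupIntoLines(words, minShare) {
  var sorted = words.slice().sort(function (a, b) { return b.box.top - a.box.top; });
  var lines = [];
  sorted.forEach(function (word) {
    var current = lines[lines.length - 1];
    if (current && overlapsVertically(current[0].box, word.box, minShare)) {
      current.push(word);
    } else {
      lines.push([word]);
    }
  });
  // Put each line into left-to-right order once grouping is finished.
  lines.forEach(function (line) {
    line.sort(function (a, b) { return a.box.left - b.box.left; });
  });
  return lines;
}

// Words whose bottom edge differs from the line's median bottom edge by more
// than `tolerance` (in page units). Superscripts and subscripts will be
// reported too; telling them apart needs further rules.
function findBottomOutliers(line, tolerance) {
  var bottoms = line.map(function (w) { return w.box.bottom; })
                    .sort(function (a, b) { return a - b; });
  var typical = bottoms[Math.floor(bottoms.length / 2)];
  return line.filter(function (w) {
    return Math.abs(w.box.bottom - typical) > tolerance;
  });
}
```

Because words at the same height in neighbouring columns also overlap vertically, this grouping merges them into one line.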

3) This is a textual analysis based on very specific rules. Yes, this might be doable, but you have a huge problem with your sample: PDF files are NOT divided into columns. Text flows in lines across the page, so text from adjacent columns runs together and we see

"vacuum chamber of about 0.6 and must be reduced ..."

Of course you can try to divide into columns based on rules or fuzzy logic.
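A rule-based attempt could look something like the sketch below: first cut each visual row into column segments at wide horizontal gaps, then check within one column whether a line ends in a number while the next line starts with a unit. Everything here is an assumption made for illustration: the function names, the `gutter` width, the record shape (`text` plus a `box` with `left` and `right`), and the list of units, which you would have to supply yourself.

```javascript
// Split one visual row (words sorted left to right) into column segments
// wherever the gap to the previous word exceeds `gutter`.
function splitRowIntoColumns(row, gutter) {
  var segments = [];
  row.forEach(function (word, i) {
    var prev = row[i - 1];
    if (!prev || word.box.left - prev.box.right > gutter) {
      segments.push([word]); // first word, or a gap wide enough to be a gutter
    } else {
      segments[segments.length - 1].push(word);
    }
  });
  return segments;
}

function clean(text) {
  return text.replace(/^\s+|\s+$/g, ""); // strip surrounding whitespace
}

var NUMBER = /^[+-]?\d+(\.\d+)?$/;

// `columnLines` holds the lines of ONE column in reading order.
// Reports every place where a number ends a line and a known unit
// starts the next one.
function findUnitBreaks(columnLines, units) {
  var hits = [];
  for (var i = 0; i + 1 < columnLines.length; i++) {
    var line = columnLines[i], next = columnLines[i + 1];
    if (line.length === 0 || next.length === 0) continue;
    var last = clean(line[line.length - 1].text);
    var first = clean(next[0].text).replace(/[.,;:)]+$/, "");
    if (NUMBER.test(last) && units.indexOf(first) !== -1) {
      hits.push({ lineIndex: i, value: last, unit: first });
    }
  }
  return hits;
}
```

A fixed gutter width fails as soon as a page mixes layouts or has wide word spacing inside a column, which is where the fuzzy logic would have to come in.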

If what I am saying sounds too hard, obscure, time consuming or complicated, then the answer is "you cannot do this".

Adobe Community Professional, Mar 10, 2018

As Test says, this task is not suitable for JavaScript. There are two main issues. First and foremost, the Acrobat JS model does not provide detailed enough font and text info to do this kind of analysis. Document text arrangements vary wildly, both in external presentation and internal representation. All JS provides is the bounding box, so, for example, you don't know the character baselines. Second, JS is very slow. Just reading all the words off all pages in a large document could take an hour. The type of analysis you're talking about could take many hours on a large doc.

I know, because I've written similar document analysis applications. If you are asking this question, then you have no idea what you are doing.

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
