Participant
March 10, 2018
Question

PDF validation


Hi All,

I have a few questions.

1) Is there a way to check for loose and tight lines in Adobe Acrobat Pro?

2) Is there a way to detect missing baseline alignment in Adobe Acrobat Pro?

3) Is there a way to find bad breaks in units in Adobe Acrobat Pro?

Thanks,

Murali R.


Legend
March 10, 2018

JavaScript can fetch the text and the bounding box (quads) of each word in a file, but it cannot fetch font information. So if you can define your problem in terms of bounding boxes, you may be able to do something. But the answers are really: 1) probably not; 2) and 3) I don't understand your requirement.
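As a rough illustration of the word and quad access mentioned above, here is a minimal sketch. It relies on the Acrobat Doc methods `getPageNumWords`, `getPageNthWord` and `getPageNthWordQuads`; the helper names `quadToBox` and `collectWords` are made up for this example, and using only the first quad of each word is a simplifying assumption.

```javascript
// Turn one quad (8 numbers: four x,y corner pairs) into a simple box.
// Min/max is used so the result does not depend on the corner order.
function quadToBox(quad) {
    var xs = [quad[0], quad[2], quad[4], quad[6]];
    var ys = [quad[1], quad[3], quad[5], quad[7]];
    return {
        left: Math.min.apply(null, xs),
        right: Math.max.apply(null, xs),
        bottom: Math.min.apply(null, ys), // PDF space: y grows upward
        top: Math.max.apply(null, ys)
    };
}

// Collect { text, box } for every word on one page.
// `doc` is the Acrobat Doc object (for example `this` in the console).
function collectWords(doc, pageNum) {
    var words = [];
    var count = doc.getPageNumWords(pageNum);
    for (var i = 0; i < count; i++) {
        // false: keep punctuation, so a value like "0.6" stays intact
        var text = doc.getPageNthWord(pageNum, i, false);
        var quads = doc.getPageNthWordQuads(pageNum, i);
        if (!quads || quads.length === 0) {
            continue;
        }
        // Assumption: only the first quad is used. A word hyphenated
        // across two lines has more than one quad.
        words.push({
            text: String(text).replace(/\s+/g, ""),
            box: quadToBox(quads[0])
        });
    }
    return words;
}
```

In the Acrobat JavaScript console this could be tried with `collectWords(this, 0)` for the first page.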

rmurali84 (Author)
Participant
March 10, 2018

Thanks for the details.

1. Any sample code would help me take this forward. Sorry, I am very new to JavaScript.

2. Is there a way to detect missing baseline alignment in Adobe Acrobat Pro?

3. Is there a way to find bad breaks in units in Adobe Acrobat Pro?

    

Legend
March 10, 2018

This is your project. When you ask for "sample code", you are describing an advanced, complex and time-consuming process. You would need to become an expert in using fuzzy logic to analyse the crude relationship between bounding boxes and text. A fascinating project if you have the time and expertise. JavaScript is frankly too crude for this; it is more a job for a C++ programmer writing a plug-in, with good knowledge of PDF internals.
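To give a sense of how crude a bounding-box heuristic is, here is a sketch for the loose/tight question. Since font size is not available, it uses word-box height as a stand-in and compares the mean gap between words to it. The function name and both threshold parameters are hypothetical and would need tuning against real pages.

```javascript
// Classify one line as "tight", "loose", "normal" or "unknown".
// lineBoxes: array of { left, right, bottom, top } for the words of ONE line.
// tightRatio / looseRatio: hypothetical thresholds for meanGap / meanHeight.
function classifyLineSpacing(lineBoxes, tightRatio, looseRatio) {
    if (lineBoxes.length < 2) {
        return "unknown"; // a single word has no inter-word gap
    }
    var sorted = lineBoxes.slice().sort(function (a, b) {
        return a.left - b.left;
    });
    var gapSum = 0;
    var heightSum = 0;
    for (var i = 0; i < sorted.length; i++) {
        heightSum += sorted[i].top - sorted[i].bottom;
        if (i > 0) {
            gapSum += sorted[i].left - sorted[i - 1].right;
        }
    }
    var meanGap = gapSum / (sorted.length - 1);
    var meanHeight = heightSum / sorted.length;
    if (meanHeight <= 0) {
        return "unknown";
    }
    var ratio = meanGap / meanHeight;
    if (ratio < tightRatio) {
        return "tight";
    }
    if (ratio > looseRatio) {
        return "loose";
    }
    return "normal";
}
```

This ignores justification, tabs, mixed font sizes and multi-column layouts, which is exactly where the fuzzy logic would have to come in.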

2) You mean the alignment of the bottom of the text? Well, you will have the bounding boxes. These are NOT baselines. You can look for lines which appear to overlap vertically, but things like superscripts and subscripts may show up as well. Can you come up with an algorithm?
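One possible starting point for such an algorithm, as a sketch only: group words into lines by vertical overlap of their boxes, then flag words whose box bottom is far from the line's median bottom. The function names and the `tolerance` parameter are hypothetical, the input is assumed to be an array of `{ text, box }` objects with `top` and `bottom` in PDF units, and box bottoms are only a proxy for baselines.

```javascript
// Group words into lines: a word joins the current line if its box
// overlaps the line's running vertical extent.
function groupIntoLines(words) {
    // Sort top to bottom (in PDF space a larger y is higher on the page).
    var sorted = words.slice().sort(function (a, b) {
        return b.box.top - a.box.top;
    });
    var lines = [];
    for (var i = 0; i < sorted.length; i++) {
        var w = sorted[i];
        var line = lines.length > 0 ? lines[lines.length - 1] : null;
        var overlap = line
            ? Math.min(line.top, w.box.top) - Math.max(line.bottom, w.box.bottom)
            : -1;
        if (line && overlap > 0) {
            line.words.push(w);
            line.top = Math.max(line.top, w.box.top);
            line.bottom = Math.min(line.bottom, w.box.bottom);
        } else {
            lines.push({ words: [w], top: w.box.top, bottom: w.box.bottom });
        }
    }
    return lines;
}

// Return the words whose bottom edge differs from the line's median
// bottom by more than `tolerance` (hypothetical, in PDF units).
function findBaselineOutliers(line, tolerance) {
    var bottoms = line.words.map(function (w) {
        return w.box.bottom;
    }).sort(function (a, b) {
        return a - b;
    });
    // Middle element (upper median for even counts).
    var median = bottoms[Math.floor(bottoms.length / 2)];
    return line.words.filter(function (w) {
        return Math.abs(w.box.bottom - median) > tolerance;
    });
}
```

Superscripts and subscripts will be reported as outliers too, and closely spaced lines whose boxes overlap will be merged into one, so the output is a list of candidates to inspect rather than a verdict.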

3) This is a textual analysis based on very specific rules. Yes, this might be doable, but you have a huge problem with your sample: PDF files are NOT divided into columns. Text flows in lines that run across the whole page. So we see

"vacuum chamber of about 0.6 and must be reduced ..."

Of course you can try to divide into columns based on rules or fuzzy logic.
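A very simple rule of this kind, sketched under stated assumptions: split each line into column segments wherever the horizontal gap between neighbouring words exceeds a threshold, then test whether a segment ends in a bare number while the same column's next line starts with a unit. The function names, the `minGap` threshold and the unit list are all hypothetical; the words are assumed to be `{ text, box }` objects for a single line.

```javascript
// Split the words of one line into column segments at large gaps.
// minGap: hypothetical threshold in PDF units.
function splitIntoColumns(lineWords, minGap) {
    var sorted = lineWords.slice().sort(function (a, b) {
        return a.box.left - b.box.left;
    });
    var segments = [];
    for (var i = 0; i < sorted.length; i++) {
        var prev = i > 0 ? sorted[i - 1] : null;
        if (!prev || sorted[i].box.left - prev.box.right > minGap) {
            segments.push([]); // start a new column segment
        }
        segments[segments.length - 1].push(sorted[i]);
    }
    return segments;
}

// True if a line ends with a bare number and the next line in the
// same column starts with one of the given units.
function isUnitBadBreak(lastWord, nextFirstWord, units) {
    var isNumber = /^\d+(\.\d+)?$/.test(lastWord);
    return isNumber && units.indexOf(nextFirstWord) !== -1;
}
```

For the sample line above, a large enough gap between "0.6" and "and" would put them in different segments, and `isUnitBadBreak("0.6", "mm", ["mm", "cm"])` would then flag the break if the next line of the first column began with "mm" (the unit here is invented for illustration). A gutter narrower than `minGap`, or justified text with wide word spacing, will defeat this rule.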

If what I am saying sounds too hard, obscure, time consuming or complicated, then the answer is "you cannot do this".