Math Formulas
Is there a way to search for formulas in the documents. A lot of the things I would like to search for are math formulas. Things with subscripts and superscripts.
There are very few formulas or functions stored within a PDF. Almost all are user-written in JavaScript and added by the author of the form. There are some folder-level functions to sum, multiply, or compute an average for some calculation options. Exponentiation is done using JavaScript's Math.pow() or Math.exp() methods. JavaScript also supports array-type variables.
For a more detailed answer, you might want to provide more specific details. Adobe also has a number of tutorials and videos about creating forms, and http://www.pdfscripting.com has a number of free examples as well as paid examples and lessons.
I think they mean actual formulas, written on the page...
If you mean searching for text with subscripts and superscripts, then yes, it's possible, but tricky. The idea is to find the median baseline and text height on a line, then look for individual words that are smaller and sit above or below that baseline. If any exist, you can extrapolate from them to find the complete formulas. But as I said, this is tricky and inexact. You'd need a full-blown AI to do it really well.
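A minimal Python sketch of that baseline heuristic, assuming the words on one line have already been extracted with their baseline y-coordinate and height (y increasing upward, as in PDF user space). The `Word` record, `classify_scripts`, and the 0.8 size and 0.2 shift thresholds are hypothetical choices for illustration, not part of any Acrobat or Adobe API.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Word:
    text: str
    baseline: float  # baseline y; larger y = higher on the page (assumed)
    height: float    # glyph height of the word


def classify_scripts(words, size_ratio=0.8, shift_ratio=0.2):
    """Label each word on one line as 'normal', 'superscript', or 'subscript'."""
    if not words:
        return []
    base = median(w.baseline for w in words)  # typical baseline of the line
    h = median(w.height for w in words)       # typical text height of the line
    labels = []
    for w in words:
        smaller = w.height < size_ratio * h   # noticeably smaller than the line
        shift = w.baseline - base             # positive = raised, negative = lowered
        if smaller and shift > shift_ratio * h:
            labels.append("superscript")
        elif smaller and shift < -shift_ratio * h:
            labels.append("subscript")
        else:
            labels.append("normal")
    return labels


# Usage: "P = I2 R" where the "2" is small and raised.
line = [Word("P", 100, 10), Word("=", 100, 10), Word("I", 100, 10),
        Word("2", 104, 6), Word("R", 100, 10)]
print(classify_scripts(line))
```

Runs of flagged words, together with their normal-sized neighbours, are the candidates to extrapolate into full formulas; the thresholds would need tuning for each document.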
Use the Acrobat JavaScript Reference early and often
Yes, so I have a textbook on electrical engineering, full of equations. I scanned in one chapter, and that works great. Searching works well too, except that the OCR doesn't recognize the math equations. It's OK, since I can't search the real book either, but it would have been nice.
On Thu, Jan 10, 2019 at 2:03 PM Thom Parker <forums_noreply@adobe.com>
If the results of the OCR are not good then it's a lost cause.
Agreed. Many of the symbols used in mathematical formulas, particularly in non-linear arrangements, cannot be OCR'd.
Hi @Thom Parker, can we export PDF text and math formulas to DOCX with the Adobe API, using Python or another language? Could you show an example?
Thank you very much.
The Adobe API, written in C++, can extract all the text, lines, and shapes in a PDF, along with their positions. It would be up to you to analyze the relationships between the text, lines, and shapes and deduce the formula. This probably sounds hard, and it is.
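As a rough illustration of that analysis step, here is a Python sketch that recognizes one simple structure, a stacked fraction, from positioned text and horizontal rules. It does not call any Adobe API: it assumes the text runs and line segments have already been extracted (by the Adobe API or another tool) into the hypothetical `Text` and `HLine` records shown, with y increasing upward as in PDF user space. `find_fractions` and `max_gap` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Text:
    text: str
    x0: float  # left
    y0: float  # bottom
    x1: float  # right
    y1: float  # top


@dataclass
class HLine:
    x0: float  # left end of the rule
    x1: float  # right end of the rule
    y: float   # vertical position of the rule


def find_fractions(texts, lines, max_gap=6.0):
    """Pair each horizontal rule with the text just above and just below it."""
    fractions = []
    for ln in lines:
        # Only consider text whose horizontal center lies over the rule.
        over = [t for t in texts if ln.x0 <= (t.x0 + t.x1) / 2 <= ln.x1]
        # Numerator: bottom edge sits slightly above the rule.
        above = [t for t in over if 0 <= t.y0 - ln.y <= max_gap]
        # Denominator: top edge sits slightly below the rule.
        below = [t for t in over if 0 <= ln.y - t.y1 <= max_gap]
        if above and below:
            num = " ".join(t.text for t in sorted(above, key=lambda t: t.x0))
            den = " ".join(t.text for t in sorted(below, key=lambda t: t.x0))
            fractions.append(f"({num})/({den})")
    return fractions


# Usage: "V" over a rule over "R", plus an unrelated "I" off to the side.
texts = [Text("V", 10, 52, 18, 62), Text("R", 10, 38, 18, 48),
         Text("I", 40, 45, 46, 55)]
print(find_fractions(texts, [HLine(8, 20, 50)]))
```

A real converter would need many such rules (radicals, matrices, nested scripts) before it could emit anything like an equation in a DOCX file, which is why this is hard.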

