Looking to convert lots of documents to text. I need metadata about each document as well as the text itself. One data point I need is the indent: whether a line is indented from the left margin, and some quantification of that indent, such as spaces, tabs or something else. If such a tool does not exist I am willing to code it, but this is new to me. What tools might help solve this problem?
Do you need this data before or after the document is converted? We can only help with analyzing a PDF, i.e. after conversion.
After conversion is acceptable.
If you want to do this by hand, there is a Ruler and a Grid in Acrobat. Look under View -> Show/Hide -> Rulers & Grids.
There is also a Measure Tool, look for it in the "Tools" tab.
This can also be done automatically with a script, but it's quite a job.
There's no way to measure the distance of a word from the margin of a page in spaces or tabs, because those don't exist in a PDF file as such. You can measure the physical distance in points and then convert it to inches, centimeters, etc.
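As a rough illustration of the conversion, here is a minimal Python sketch. It assumes you have already obtained the x-coordinate (in points) of the left edge of each line by some other means; the sample values and the function names are made up for the example. It relies only on the facts that one point is 1/72 inch in default PDF units and one inch is 2.54 cm. If no left margin is supplied, it assumes the leftmost line sits on the margin.

```python
POINTS_PER_INCH = 72.0   # default PDF user space: 1 point = 1/72 inch
CM_PER_INCH = 2.54


def points_to_inches(points):
    """Convert a distance in points to inches."""
    return points / POINTS_PER_INCH


def points_to_cm(points):
    """Convert a distance in points to centimeters."""
    return points_to_inches(points) * CM_PER_INCH


def line_indents(line_left_edges, left_margin=None):
    """Return each line's indent in points relative to the left margin.

    line_left_edges: x-coordinates (points) of each line's left edge.
    left_margin: x-coordinate of the margin; if omitted, the smallest
    left edge is assumed to be the margin (an assumption, not a PDF rule).
    """
    if not line_left_edges:
        return []
    if left_margin is None:
        left_margin = min(line_left_edges)
    return [x - left_margin for x in line_left_edges]


if __name__ == "__main__":
    # Hypothetical left-edge positions for four lines, in points.
    left_edges = [72.0, 108.0, 72.0, 90.0]
    for x, indent in zip(left_edges, line_indents(left_edges)):
        print(f"x={x:6.1f} pt  indent={indent:5.1f} pt  "
              f"{points_to_inches(indent):.3f} in  {points_to_cm(indent):.3f} cm")
```

If you really need "spaces", one option is to divide the indent by the width of a space character in the line's font, but that width has to come from the font data and will differ from font to font.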
What tools/technologies are useful for measuring the physical distance?