Extraction: How to position span elements?
Hi,
After extraction, we sometimes get elements like the ones below: a parent element with Text (here "NO") and a child element that holds part of the text with different font properties (here "2"; combined, the text should read "N2O"). My question is: how can one insert such an inline element at the right position in the text flow?
In the docs (Extract API | How Tos | PDF Extract API | Adobe PDF Services) I see:
Text: Text for the element in UTF-8 format, only reported for text elements. When inline elements are reported separately from parent block element, then this value has references to those inline elements.
How can I see such a reference to the inline element? Or is this meant differently?
...
{
    "Bounds": [
        211.8,
        333.0,
        222.9,
        398.5
    ],
    "Font": {
        "alt_family_name": "Times",
        "embedded": true,
        "encoding": "MacRomanEncoding",
        "family_name": "Times",
        "font_type": "Type1",
        "italic": false,
        "monospaced": false,
        "name": "...+Times-Roman",
        "subset": true,
        "weight": 400
    },
    "HasClip": true,
    "ObjectID": 93,
    "Page": 0,
    "Path": "//Document/Sect/P[15]/Sub[3]",
    "Rotation": 90.0,
    "Text": "NO ",
    "TextSize": 9.2,
    "attributes": {
        "Placement": "Block"
    }
},
{
    "Bounds": [
        215.7,
        339.7,
        222.9,
        344.5
    ],
    "Font": {
        "alt_family_name": "Times",
        "embedded": true,
        "encoding": "MacRomanEncoding",
        "family_name": "Times",
        "font_type": "Type1",
        "italic": false,
        "monospaced": false,
        "name": "...+Times-Roman",
        "subset": true,
        "weight": 400
    },
    "HasClip": true,
    "ObjectID": 94,
    "Page": 0,
    "Path": "//Document/Sect/P[15]/Sub[3]/StyleSpan",
    "Rotation": 90.0,
    "Text": "2 ",
    "TextSize": 6.4,
    "attributes": {
        "TextPosition": "Sup"
    }
}
...
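To make the question concrete, here is a rough sketch of one possible heuristic. It is not something the Extract API provides. It assumes that Bounds is ordered [left, bottom, right, top], that with Rotation 90 the text runs along increasing y, and that an average character is about 0.5 em wide (a guess, not a font metric). The function name merge_inline is made up for illustration, and only Rotation 0 and 90 are handled.

```python
def merge_inline(parent, child, avg_char_em=0.5):
    """Insert the child's text into the parent's text at an estimated index.

    Hypothetical helper: the index is guessed from the distance between
    the start of the parent's bounds and the start of the child's bounds.
    """
    # Assumption: Bounds = [left, bottom, right, top]; with Rotation 90
    # the text runs along the y axis, otherwise along the x axis.
    axis = 1 if parent.get("Rotation") == 90.0 else 0
    offset = child["Bounds"][axis] - parent["Bounds"][axis]

    # Assumption: one character is roughly avg_char_em * TextSize wide.
    char_width = avg_char_em * parent["TextSize"]

    text = parent["Text"]
    # Clamp the estimate so it stays inside the visible (non-space) text.
    index = max(0, min(len(text.rstrip()), round(offset / char_width)))
    return text[:index] + child["Text"].strip() + text[index:]


# Only the fields used by the heuristic, copied from the two elements above.
parent = {
    "Bounds": [211.8, 333.0, 222.9, 398.5],
    "Path": "//Document/Sect/P[15]/Sub[3]",
    "Rotation": 90.0,
    "Text": "NO ",
    "TextSize": 9.2,
}
child = {
    "Bounds": [215.7, 339.7, 222.9, 344.5],
    "Path": "//Document/Sect/P[15]/Sub[3]/StyleSpan",
    "Rotation": 90.0,
    "Text": "2 ",
    "TextSize": 6.4,
}

# A child is treated as inline if its Path extends the parent's Path.
if child["Path"].startswith(parent["Path"] + "/"):
    print(merge_inline(parent, child).strip())
```

For the two elements shown this places the "2" after the first character, but a guess based on an average character width will likely break on longer or mixed-width text, which is why a real reference from the parent Text to the inline element would be much better.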
