Participant
April 27, 2023
Answered

Undocumented Attributes


I got some attributes in a response object that appear to be undocumented. I have a 100-page PDF, and most pages have their text extracted. The page in question, though, is extracted as an image instead. The JSON in question looks like this:

{
  "Bounds": [
    79.85000610351563,
    108.1837158203125,
    546.1023712158203,
    714.25
  ],
  "Page": 80,
  "Path": "//Document/Figure[3]",
  "attributes": {
    "BBox": [
      80.03669999999693,
      108.38799999999901,
      530.0179999999818,
      714.002999999997
    ],
    "Placement": "Block",
    "Suspicion": "{ \"suspicious\" : true, \"reason\" : \"complexTable\" }",
    "SuspicionFBName": "region-complexTable",
    "Suspicious": true
  },
  "filePaths": [
    "figures/fileoutpart18.png"
  ]
}

Can anyone help me understand what the "Suspicion", "SuspicionFBName", and "Suspicious" attributes mean?

 

The data in this section is laid out as a rough table, but the table spans several pages and this is the only page extracted as an image. If I open the PDF in Reader, I can select the text on that page just fine, and it does not look obviously different from the pages around it.
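For anyone hitting the same thing, here is a rough sketch of how the flagged elements could be listed so the affected pages are easy to spot. It only relies on the fields visible in the element shown above ("Page", "Path", "attributes", "Suspicious", "Suspicion"). The function name is made up for illustration, and since the attributes are undocumented, the code makes no claim about what they mean; it just reports them.

```python
import json


def find_flagged_elements(elements):
    """Return (page, path, reason) for each element marked "Suspicious".

    `elements` is a list of element dicts shaped like the one in this post.
    This helper is hypothetical, not part of any SDK.
    """
    flagged = []
    for element in elements:
        attrs = element.get("attributes", {})
        if not attrs.get("Suspicious"):
            continue
        # "Suspicion" holds JSON encoded as a string, so it needs a second parse.
        try:
            details = json.loads(attrs.get("Suspicion", "{}"))
        except (json.JSONDecodeError, TypeError):
            details = {}
        reason = details.get("reason") if isinstance(details, dict) else None
        flagged.append((element.get("Page"), element.get("Path"), reason))
    return flagged


# Sample input: the flagged element from this post plus one made-up plain element.
sample_elements = [
    {
        "Page": 80,
        "Path": "//Document/Figure[3]",
        "attributes": {
            "Placement": "Block",
            "Suspicion": '{ "suspicious" : true, "reason" : "complexTable" }',
            "SuspicionFBName": "region-complexTable",
            "Suspicious": True,
        },
        "filePaths": ["figures/fileoutpart18.png"],
    },
    {"Page": 79, "Path": "//Document/P[12]", "Text": "placeholder text"},
]

for page, path, reason in find_flagged_elements(sample_elements):
    print(f"page={page} path={path} reason={reason}")
```

To run this against a real result, load the returned JSON with `json.load` and pass in its list of elements. I am assuming that list sits under a top-level "elements" key, so adjust that if your output is shaped differently.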

Correct answer by Raymond Camden

Alrighty, sounds good. So once things are patched up, can we expect that the page will extract correctly and not get detected as an image? Or just that these attributes will no longer be present in the returned JSON?


Ok, I was focused on the "unknown attributes" part. That's logged. I have another thread on the forum here about "page gets extracted as image, not text"; that's also a known bug. That would be a _separate_ issue.

1 reply

Joel Geraci
Community Expert
April 27, 2023

Can you share the PDF in question?

Participant
April 27, 2023

Unfortunately I cannot share the exact PDF generating this output.

Raymond Camden
Community Manager
April 28, 2023

Would sharing it privately be an option?

 

FYI, we are looking into this.