
Undocumented Attributes

Community Beginner, Apr 27, 2023

I got some attributes in a response object that appear to be undocumented. I have a 100-page PDF, and the majority of pages have their text extracted. The page in question, though, is extracted as an image instead. The JSON in question looks like this:

{
  "Bounds": [
    79.85000610351563,
    108.1837158203125,
    546.1023712158203,
    714.25
  ],
  "Page": 80,
  "Path": "//Document/Figure[3]",
  "attributes": {
    "BBox": [
      80.03669999999693,
      108.38799999999901,
      530.0179999999818,
      714.002999999997
    ],
    "Placement": "Block",
    "Suspicion": "{ \"suspicious\" : true, \"reason\" : \"complexTable\" }",
    "SuspicionFBName": "region-complexTable",
    "Suspicious": true
  },
  "filePaths": [
    "figures/fileoutpart18.png"
  ]
}

Can anyone help me understand what the "Suspicion", "SuspicionFBName", and "Suspicious" attributes mean?

The data in this section is formatted as a rough table, but the table spans several pages and this is the only page to be extracted as an image. If I open the PDF in Reader, I can select the text on that page just fine, and the page shows no obvious difference from the pages around it.
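
One detail visible in the output above: the "Suspicion" value is itself a JSON document stored as a string, so it needs a second decoding pass before its fields can be read. The following is a minimal Python sketch of that step. The helper name `decode_suspicion` is hypothetical, and since the attribute is undocumented, its shape is inferred only from the single element shown above.

```python
import json

# Attribute values copied from the element shown above.
attributes = {
    "Placement": "Block",
    "Suspicion": "{ \"suspicious\" : true, \"reason\" : \"complexTable\" }",
    "SuspicionFBName": "region-complexTable",
    "Suspicious": True,
}


def decode_suspicion(attrs):
    """Decode the JSON string stored under "Suspicion".

    Hypothetical helper: returns the decoded value, or None when the key
    is missing, is not a string, or does not contain valid JSON.
    """
    raw = attrs.get("Suspicion")
    if not isinstance(raw, str):
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


suspicion = decode_suspicion(attributes)
if suspicion is not None:
    print("reason:", suspicion.get("reason"))
```

For this element the decoded "reason" is "complexTable", which matches the suffix of the "SuspicionFBName" value. What either value means is not stated anywhere in the documentation.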

Topics: PDF Extract API, REST APIs



Community Expert, Apr 27, 2023

Can you share the PDF in question?

Community Beginner, Apr 27, 2023

Unfortunately I cannot share the exact PDF generating this output.

Adobe Employee, Apr 28, 2023

Would sharing it privately be an option?

FYI, we are looking into this.

Community Beginner, May 04, 2023

Hey, sorry about the delayed response.

Unfortunately, I do not have the option to share the source PDF. I know that makes the diagnosis difficult and puts most of it on me. I'm also not certain how the PDF itself was generated.

I was hoping that knowing what the above properties mean would help me diagnose what is up with the file. At the very least, I know I can add a check on the returned JSON to watch for those properties and flag a potential mis-extraction.
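
A check along those lines could look something like the minimal Python sketch below. It assumes the returned JSON lists its elements under a top-level "elements" key, with each element shaped like the one posted earlier. The function name `flag_suspicious_elements` is hypothetical, and because the three attribute names are undocumented, a match is only a hint that a page deserves a manual look.

```python
# Attribute names observed in the response; undocumented, so treat as hints only.
SUSPICION_KEYS = ("Suspicion", "SuspicionFBName", "Suspicious")


def flag_suspicious_elements(structured_data):
    """Return (page, path, matched keys) for each element carrying a suspicion key.

    Assumption: elements sit under a top-level "elements" list and keep
    their extra properties in an "attributes" object.
    """
    flagged = []
    for element in structured_data.get("elements", []):
        attrs = element.get("attributes") or {}
        found = [key for key in SUSPICION_KEYS if key in attrs]
        if found:
            flagged.append((element.get("Page"), element.get("Path"), found))
    return flagged


# Small sample: the first element is made up for contrast, the second
# mirrors the element from the original post.
sample = {
    "elements": [
        {"Page": 79, "Path": "//Document/P[12]", "attributes": {"LineHeight": 12}},
        {
            "Page": 80,
            "Path": "//Document/Figure[3]",
            "attributes": {
                "Placement": "Block",
                "SuspicionFBName": "region-complexTable",
                "Suspicious": True,
            },
            "filePaths": ["figures/fileoutpart18.png"],
        },
    ]
}

for page, path, keys in flag_suspicious_elements(sample):
    print(f"page {page}: {path} carries {', '.join(keys)}")
```

The check tests for the presence of the keys rather than the truthiness of "Suspicious", since it is not known which values these attributes can take.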

Adobe Employee, May 04, 2023

As an FYI, we are still digging into this ourselves. It is our goal to ensure each and every change is documented properly. Not only is this not documented, it's not in the JSON schema either. So this is a high-priority item for us.

Adobe Employee, May 05, 2023

Ok, as I suspected, this is a bug, and the fields you saw should be removed when the bug is fixed. Basically, nothing to see here, move along, etc etc. 😉 Thank you for bringing this up though!

Community Beginner, May 05, 2023

Alrighty, sounds good. So once things are patched up we can expect that the page would extract correctly and not get detected as an image? Or just that these attributes would not be present in the returned JSON?

Adobe Employee, May 05, 2023 (marked as correct answer)

Ok, I was focused on the "unknown attributes" part. That's logged. I have another thread on the forum here about "page gets extracted as image, not text"; that's also a known bug. That would be a _separate_ issue.
