New detection inclusions in EXTRACT API?

New Here, Sep 14, 2022

Hello everyone,

I have been using the Extract API for over a year now, but for the past few days I have been seeing changes in the structure detection that look illogical and that I have never observed before.

For example, a paragraph or heading can now appear directly under the list node instead of under a list item, i.e.:

//Document/L[4]/LI/Lbl
//Document/L[4]/LI/LBody
//Document/L[4]/P
//Document/L[4]/LI[2]/Lbl
//Document/L[4]/LI[2]/LBody
 
This happens not only in lists but also, analogously, in table of contents items.
The recent changelogs do not mention any such change. Can someone please point me to any information about this?

My application runs on heuristics that expect a list to contain list items, not a P or H directly, so it is crucial for me to understand these changes, both the logic behind them and whether new work needs to be done on my side.
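
For anyone who wants to check their own output for the same pattern, here is a minimal sketch. It assumes the Extract result is the usual JSON with a top-level `elements` array whose entries carry a `Path` string like the ones above; the helper names `split_path` and `stray_list_children` are made up for illustration and are not part of any Adobe SDK. It also flags a nested `L` directly under `L`, which may or may not be what you want.

```python
import re

# One path segment: a tag name with an optional 1-based index, e.g. "L[4]" or "LI".
_SEGMENT = re.compile(r"^(.*?)(?:\[(\d+)\])?$")

def split_path(path):
    """Split '//Document/L[4]/LI[2]' into [('Document', 1), ('L', 4), ('LI', 2)]."""
    parts = []
    for seg in path.lstrip("/").split("/"):
        m = _SEGMENT.match(seg)
        # A missing index is treated as 1 (assumption based on the paths in this thread).
        parts.append((m.group(1), int(m.group(2) or 1)))
    return parts

def stray_list_children(elements):
    """Return the paths of elements that sit directly under an L node but are not LI."""
    stray = []
    for el in elements:
        path = el.get("Path", "")
        parts = split_path(path)
        # Walk parent/child pairs and stop at the first L whose child is not LI.
        for parent, child in zip(parts, parts[1:]):
            if parent[0] == "L" and child[0] != "LI":
                stray.append(path)
                break
    return stray

# The five paths quoted in this post.
sample = [
    {"Path": "//Document/L[4]/LI/Lbl"},
    {"Path": "//Document/L[4]/LI/LBody"},
    {"Path": "//Document/L[4]/P"},
    {"Path": "//Document/L[4]/LI[2]/Lbl"},
    {"Path": "//Document/L[4]/LI[2]/LBody"},
]
print(stray_list_children(sample))
```

On real output you would pass `json.load(f)["elements"]` instead of `sample`, assuming your result file has that layout.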

Topics: Bug, PDF Extract API, REST APIs

New Here, Sep 14, 2022

Also, adding to the question: can I choose which API version I use, so that I keep getting the previous Path patterns?

Thanks in advance.

Adobe Employee, Sep 14, 2022

I'm looking into your first post. As for the second: no, you cannot select a particular version of the API.

Community Beginner, Sep 14, 2022

The experience has been shaky for the past few days. For reference, please find attached a PDF document and its Extract API response JSON.
Look at the object starting at line 1646 of the example JSON: on page 3 (index 2), a list consists only of H2 elements. You can verify visually that it is not a list. Or, if it is classified as a list, it should at least conform to a list containing list items.
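
To locate cases like this without reading the JSON by hand, something along these lines could work. It is only a sketch under the same assumption as before (a top-level `elements` array with `Path` strings); `lists_without_items` is a hypothetical helper, and the sample paths below are invented to mimic the case described, not copied from the attached file.

```python
import re
from collections import defaultdict

# A list segment is "L" or "L[n]" (but not "LI", "Lbl" or "LBody").
LIST_SEGMENT = re.compile(r"^L(\[\d+\])?$")

def lists_without_items(elements):
    """Map each list path to its direct child tags, keeping only lists with no LI child."""
    children = defaultdict(set)
    for el in elements:
        segments = el.get("Path", "").lstrip("/").split("/")
        # Every segment except the last one can be a parent.
        for i, seg in enumerate(segments[:-1]):
            if LIST_SEGMENT.match(seg):
                list_path = "//" + "/".join(segments[: i + 1])
                child_tag = segments[i + 1].split("[")[0]  # drop the index, e.g. "H2[2]" -> "H2"
                children[list_path].add(child_tag)
    return {p: sorted(tags) for p, tags in children.items() if "LI" not in tags}

# Invented sample: one list made only of H2 elements, one well-formed list.
sample = [
    {"Path": "//Document/L[2]/H2"},
    {"Path": "//Document/L[2]/H2[2]"},
    {"Path": "//Document/L[3]/LI/Lbl"},
    {"Path": "//Document/L[3]/LI/LBody"},
]
print(lists_without_items(sample))
```

Grouping by list path rather than checking element by element makes it easy to tell a list that is entirely H2 (as here) from one that merely has a stray P among proper list items.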

 

Thanks for the prompt response. I hope this gets resolved soon, since it has a lot of implications for us.

Adobe Employee, Sep 15, 2022

Update: Engineering has confirmed there was an update recently. They are digging deeper into this particular aspect.

Community Beginner, Sep 15, 2022

Thanks for the update.

Awaiting further response. I can share more insights and the difficulties we are facing if the team wants them.

Community Beginner, Sep 22, 2022

@Raymond Camden, any more updates yet?
