Getting font color in Adobe Extract API's output

Report · Feb 16, 2023

Is the font color available in the Adobe Extract API's output? I could not find it in the response or in the linked JSON schema.
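One way to double-check a response for color information is to walk the parsed JSON and list every key whose name mentions "color". The snippet below is only a sketch: the `find_keys` helper is hypothetical, and the `sample` document is a made-up stand-in rather than real Extract API output, so the field names in it are assumptions.

```python
import json


def find_keys(node, needle="color", path="$"):
    """Return the paths of all keys whose name contains `needle` (case-insensitive)."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if needle in key.lower():
                hits.append(child)
            # Keep descending, since nested objects may also carry matching keys.
            hits.extend(find_keys(value, needle, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_keys(item, needle, f"{path}[{index}]"))
    return hits


# Made-up sample for illustration only; not actual Extract API output.
sample = json.loads(
    '{"elements": [{"Path": "//Document/P", "Text": "Hello", '
    '"Font": {"name": "Arial"}, "TextSize": 12.0}]}'
)

color_paths = find_keys(sample)
print(color_paths)
```

To run it against a real response, replace `sample` with the result of `json.load` on the extracted JSON file. An empty list means no key name in the document mentions color.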

Report · Feb 24, 2023

You aren't missing anything. At this time we do not detect or output the color of the text. I've already made this feature request and I know what I'm looking for, but I'm curious to hear what you are looking for. Do you want the color of the text as drawn by the text operator, or do you want the perceived color of the text on the page? For example, suppose you have some black text drawn on the page, but later in the PDF display list it's covered by a 50% transparent white box. Do you want the color to be reported as black or as gray?
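To make the black-versus-gray distinction concrete, here is a minimal sketch of the arithmetic. It assumes plain "source-over" blending of the box onto fully opaque text in 8-bit RGB, and it ignores color spaces, blend modes, and anti-aliasing. The `composite_over` function is a hypothetical helper and is not part of the Extract API.

```python
def composite_over(top_rgb, top_alpha, bottom_rgb):
    """Blend a partially transparent color over an opaque backdrop.

    Each channel is alpha * top + (1 - alpha) * bottom, rounded to an integer.
    """
    return tuple(
        round(top_alpha * top + (1.0 - top_alpha) * bottom)
        for top, bottom in zip(top_rgb, bottom_rgb)
    )


drawn_color = (0, 0, 0)        # black, as set by the text operator
white_box = (255, 255, 255)    # box painted later in the display list

# Perceived color of the text once the 50% transparent box covers it.
perceived_color = composite_over(white_box, 0.5, drawn_color)
print(perceived_color)  # (128, 128, 128), a mid gray
```

Under these assumptions the drawn color is black while the perceived color is a mid gray, which is why the two possible answers differ.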

Report · Feb 27, 2023

I've already made this feature request

Good to know that a feature request is already in place.

Do you want the color of the text drawn by the text operator or did you want the perceived color of the text on the page?

PDFs are an interesting file format 🙂. Given that the requirement is to be able to republish the content, having access to the perceived color would help.

Report · Jul 11, 2023

Hi @Joel Geraci

We are trying to achieve what you have done with Liquid Mode in Acrobat Reader. We have tested with different PDF files and noticed that the text is shown in its original styling in Liquid Mode. But the Extract API output doesn't provide font files, colors, etc.

Adobe advertises that Liquid Mode is what you can achieve with the Extract API. So the question is, how can we achieve that? Are there other tools that are supposed to be used along with the Extract API?

Thanks,

Report · Jul 11, 2023

While Extract API and the Liquid Mode engine share some common code, they aren't the same thing. Unfortunately, at this time, Extract still does not output the color of the text with the styling but I'll ping the product team again to see if I can get the request prioritized. I can't make any promises though.

That said, I am curious to know where you saw the representation that "Liquid mode is what you can achieve with the Extract API" because I need to get that corrected or at least clarified.

Report · Jul 12, 2023

Hi,

We just went back to check some of the articles and videos we were researching, and we didn't find such a statement. I think it was an assumption on our part, because the Extract API is advertised as being so powerful and as using Adobe Sensei AI to extract the data. The Extract API still does an outstanding job, and you can definitely use it to extract accessible content from a PDF (because you don't need styling there).

But we really need to achieve what you did with Liquid Mode. So the question is: should we look at other tools and do more research, or are you planning to make changes so that styling can be extracted from PDFs too?

Thanks,