Discrepancy between Adobe Extract PDF and the PDF content

New Here, Oct 18, 2022

I am new to PDF and would appreciate an explanation of how the Adobe SDK API works, based on the Adobe extractPDF sample and its class ExtractTextInfoFromPDF.java.

I have a source PDF that contains this definition:

7 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /Helvetica
/Encoding /WinAnsiEncoding
>>
endobj

It also includes the following text sequence:

BT
3 Tr
0.00 Tc
/F3 10.5 Tf
1 0 0 1 302.16 776.64 Tm
<i,/ILLENEUVE > Tj
ET
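
For readers unfamiliar with content-stream syntax, here is a minimal sketch that reads the text-state operators of the sequence above. It assumes one operator per line with the operator last, as in the post, and it replaces the shown string with a placeholder because its delimiters may have been altered when the post was written. The function name `read_text_state` is hypothetical, and the post does not show which font object the resource name /F3 refers to.

```python
# Sketch only: a line-based reader for the text object quoted above.
# Assumption: one operator per line, operator last, as in the post.
TEXT_OBJECT = """\
BT
3 Tr
0.00 Tc
/F3 10.5 Tf
1 0 0 1 302.16 776.64 Tm
(placeholder) Tj
ET
"""

# Text rendering modes (operand of Tr) as defined in the PDF specification.
RENDER_MODES = {
    0: "fill",
    1: "stroke",
    2: "fill, then stroke",
    3: "neither fill nor stroke (invisible)",
    4: "fill and add to clipping path",
    5: "stroke and add to clipping path",
    6: "fill, then stroke, and add to clipping path",
    7: "add to clipping path",
}


def read_text_state(stream):
    """Collect the text-state settings that precede the string being shown."""
    state = {}
    for line in stream.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        *operands, operator = tokens
        if operator == "Tr":
            state["render_mode"] = RENDER_MODES[int(operands[0])]
        elif operator == "Tc":
            state["char_spacing"] = float(operands[0])
        elif operator == "Tf":
            state["font_resource"] = operands[0]
            state["font_size"] = float(operands[1])
        elif operator == "Tm":
            # The last two operands of the text matrix are the x and y offsets.
            state["position"] = (float(operands[4]), float(operands[5]))
    return state


state = read_text_state(TEXT_OBJECT)
for key, value in state.items():
    print(f"{key}: {value}")
```

Rendering mode 3 is the mode commonly used for a hidden OCR text layer placed over a scanned page image. That would be consistent with the scanner/OCR origin mentioned later in the post, although nothing in the thread confirms it.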

 

When I run the extractPDF sample via the Adobe API to get the text info, I get this:

"Font": {
				"alt_family_name": "* Titlingmes New Roman",
				"embedded": true,
				"encoding": "Identity-H",
				"family_name": "* Titlingmes New Roman",
				"font_type": "CIDFontType0",
				"italic": false,
				"monospaced": false,
				"name": "*Times New Roman-Bold-3921",
				"subset": false,
				"weight": 700
			},
			"HasClip": false,
			"Lang": "fr",
			"Page": 0,
			"Path": "//Document/Sect/P",
			"Text": "VILLENEUVE ",
			"TextSize": 10.0

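A quick way to see how far the reported font is from the one defined in the source PDF is to compare the two side by side. The sketch below wraps the JSON fragment from the post in braces so that it parses as a single object. The pairing of PDF keys with JSON keys (`BaseFont`/`name`, `Subtype`/`font_type`, `Encoding`/`encoding`) and the helper `font_differences` are assumptions made for illustration, not part of the Adobe sample.

```python
import json

# The fragment from the post, wrapped in braces so that it parses on its own.
ELEMENT_JSON = """
{
  "Font": {
    "alt_family_name": "* Titlingmes New Roman",
    "embedded": true,
    "encoding": "Identity-H",
    "family_name": "* Titlingmes New Roman",
    "font_type": "CIDFontType0",
    "italic": false,
    "monospaced": false,
    "name": "*Times New Roman-Bold-3921",
    "subset": false,
    "weight": 700
  },
  "HasClip": false,
  "Lang": "fr",
  "Page": 0,
  "Path": "//Document/Sect/P",
  "Text": "VILLENEUVE ",
  "TextSize": 10.0
}
"""

# Font dictionary of object 7 in the source PDF, as quoted in the post.
SOURCE_FONT = {
    "BaseFont": "Helvetica",
    "Subtype": "Type1",
    "Encoding": "WinAnsiEncoding",
}

# Assumed correspondence between PDF font keys and keys in the JSON output.
KEY_PAIRS = [("BaseFont", "name"), ("Subtype", "font_type"), ("Encoding", "encoding")]


def font_differences(source, extracted):
    """Return (pdf_key, pdf_value, json_value) for every pair that differs."""
    return [
        (pdf_key, source[pdf_key], extracted[json_key])
        for pdf_key, json_key in KEY_PAIRS
        if source[pdf_key] != extracted[json_key]
    ]


element = json.loads(ELEMENT_JSON)
differences = font_differences(SOURCE_FONT, element["Font"])
for pdf_key, pdf_value, json_value in differences:
    print(f"{pdf_key}: source={pdf_value!r} extracted={json_value!r}")
```

All three compared properties differ. One possible reading, not confirmed anywhere in this thread, is that the service derives this element from something other than the quoted font and text object.
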
As you can see, the API has correctly translated "i,/" (3 characters, unless '/' has a special meaning in this sequence) into the "V" character?
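
To check whether the declared encoding alone could turn those three characters into a "V", the bytes can be decoded directly. This is a small sketch under two assumptions: the bytes between the string delimiters are exactly the characters shown in the post, and Python's `cp1252` codec is an acceptable stand-in for WinAnsiEncoding (the two agree on printable ASCII).

```python
# Assumption: the bytes between the string delimiters are exactly the
# characters shown in the post.
raw = b"i,/ILLENEUVE "

# cp1252 stands in for WinAnsiEncoding here; the two agree on printable ASCII.
decoded = raw.decode("cp1252")

# Show the code and the decoded character for the three leading bytes.
for byte, char in zip(raw[:3], decoded[:3]):
    print(f"0x{byte:02X} -> {char!r}")

print(repr(decoded))
print("V" in decoded[:3])
```

Under these assumptions the three leading bytes decode to themselves, so the encoding by itself does not explain the "V" in the extracted text.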

The PDF was generated using a Canon scanner with OCR/tagging, since search works in this document, except when searching for "VILLENEUVE".

It should be noted that when the PDF is opened for display, the letter "V" is not displayed clearly.

Can someone explain the mystery to me (the text is correctly extracted using the Adobe ExtractPDF API)?

Thanks, Eric

New Here, Oct 21, 2022

I assume that if the letter V is not clearly displayed, it is because the PDF display process is doing some extra translation/conversion.

Thanks ...

Community Expert, Oct 21, 2022

MOVED TO THE ACROBAT SDK DISCUSSIONS

LEGEND, Oct 21, 2022

Unfortunately, this is still in the wrong forum. The Extract SDK is a web-based service, not connected to the Acrobat SDK. Correct forum: Document Services APIs - Adobe Support Community

New Here, Nov 23, 2022

Thanks. Is there a link to the Extract SDK?
New Here, Nov 23, 2022

OK, got it ... on the PDF Extract API page ... but I am confused by the fact that it is part of the PDF Services API ...

LEGEND, Nov 23, 2022

It's a service, not an app, so this seems the right home for it. Files are sent to Adobe's servers for conversion. The Acrobat SDK does not use the web, but uses a local subscription to Adobe Acrobat. Adobe actually have three different teams working on PDF extraction (or maybe four if you include Adobe PDF Library).
