
Service to detect if a PDF is scanned?

Explorer, Nov 11, 2024

Hi,

Is there a service that will detect if a PDF is scanned? I'd like to determine whether a PDF is text-based or a scanned image before running OCR on it. I don't see what I'm looking for in PDFProperties.

Thanks,

Jeff


1 Correct answer

Community Expert, Nov 13, 2024

In the output from PDF Properties API, look in the "pages" property. For each page you'll see something like the code below.

Be sure to verify the "is_scanned" boolean by checking that the page has only one image ("number_of_images" is 1) and that "only_images" is true. If the file has been OCRed, "has_text" will be true.

{
    "page_number": 0,
    "is_scanned": true,
    "width": 630,
    "has_structure": false,
    "content": {
        "number_of_images": 1,
        "only_images": true,
        "has_text": false,
        "has_images": true,
        "is_empty": false
    },
    "height": 810
}
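The check described above can be sketched in Python. This is only a minimal illustration, not an official client: it assumes the PDF Properties output has already been parsed into a dict, that the per-page objects sit in a top-level "pages" list, and the function names `classify_page` and `needs_ocr` are hypothetical.

```python
def classify_page(page: dict) -> str:
    """Classify one entry from the "pages" property (hypothetical helper)."""
    content = page.get("content", {})

    # A text layer is present: either a text-based page or one already OCRed.
    if content.get("has_text", False):
        return "text"

    # Cross-check "is_scanned" against the image counts, as suggested above.
    if (
        page.get("is_scanned", False)
        and content.get("only_images", False)
        and content.get("number_of_images", 0) == 1
    ):
        return "scanned"

    if content.get("is_empty", False):
        return "empty"

    # The flags disagree or are missing; leave the decision to the caller.
    return "unknown"


def needs_ocr(properties: dict) -> bool:
    """Return True if any page looks like a scan without a text layer.

    Assumes "pages" is a top-level list in the parsed JSON output.
    """
    return any(classify_page(p) == "scanned" for p in properties.get("pages", []))


# Example using the page object shown above.
sample_page = {
    "page_number": 0,
    "is_scanned": True,
    "width": 630,
    "has_structure": False,
    "content": {
        "number_of_images": 1,
        "only_images": True,
        "has_text": False,
        "has_images": True,
        "is_empty": False,
    },
    "height": 810,
}

print(classify_page(sample_page))
print(needs_ocr({"pages": [sample_page]}))
```

Checking "has_text" first means a scan that has already been OCRed is not sent through OCR a second time.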

Explorer, Nov 13, 2024

Great, thanks Joel.
