Service to detect if a PDF is scanned?
Hi,
Is there a service that will detect if a PDF is scanned? I'd like to determine whether a PDF is text-based or a scanned image before OCRing it. I don't see what I'm looking for in PDF Properties.
Thanks,
Jeff
In the output from the PDF Properties API, look in the "pages" property. For each page you'll see an entry like the one below.
Don't rely on the "is_scanned" boolean alone. Verify it by checking that "number_of_images" is 1 and "only_images" is true. If the file has already been OCRed, "has_text" will be true.
{
  "page_number": 0,
  "is_scanned": true,
  "width": 630,
  "has_structure": false,
  "content": {
    "number_of_images": 1,
    "only_images": true,
    "has_text": false,
    "has_images": true,
    "is_empty": false
  },
  "height": 810
}
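If it helps, here is a minimal Python sketch of that check. It assumes you have already parsed the PDF Properties response into a dict and that "pages" is a list of entries shaped like the one above. The function names classify_page and needs_ocr are made up for illustration and are not part of any SDK.

```python
def classify_page(page):
    """Return "text", "scanned", or "unknown" for one entry of "pages".

    Assumes the entry has the fields shown in the sample above.
    """
    content = page.get("content", {})

    # A page that already has a text layer (native text or prior OCR).
    if content.get("has_text", False):
        return "text"

    # Verify "is_scanned" with the two extra checks from the answer:
    # exactly one image on the page, and nothing but images.
    looks_scanned = (
        page.get("is_scanned", False)
        and content.get("only_images", False)
        and content.get("number_of_images", 0) == 1
    )
    return "scanned" if looks_scanned else "unknown"


def needs_ocr(properties):
    """True if any page looks like a scanned image with no text."""
    pages = properties.get("pages", [])  # assumption: "pages" is a list
    return any(classify_page(page) == "scanned" for page in pages)


if __name__ == "__main__":
    sample_page = {
        "page_number": 0,
        "is_scanned": True,
        "width": 630,
        "has_structure": False,
        "content": {
            "number_of_images": 1,
            "only_images": True,
            "has_text": False,
            "has_images": True,
            "is_empty": False,
        },
        "height": 810,
    }
    print(classify_page(sample_page))
    print(needs_ocr({"pages": [sample_page]}))
```

Pages that come back as "unknown" (for example, several images and no text) are probably worth a manual look before you decide whether to OCR them.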