I have some PDFs with CAD drawings in them and would like to be able to search the text.
Searchable Image - Sort of works, but it misses a lot of the smaller text, even though you can zoom in and the text looks great as it's vector.
I think ClearScan or Searchable Image may be better, but 600dpi is not enough for the size of the drawings, so they find almost no text and the drawings then look really bad.
After ClearScan at 600dpi.
Is there a way to just turn the PDF with the CAD drawings into bitmaps at a much higher DPI and then run Recognize Text?
Do you have access to the software that generated the CAD?
What format is the CAD image currently in?
I don't know what it is in the PDF, but there were DWG files, so AutoCAD files maybe? We just have the PDFs, and the text is vector but is made up of more than one object.
Not knowing what I'm looking at here, exactly what am I looking at? Also, if this is the PDF, I would not call that a vector object; it looks like it has been bitmapped.
The pic is from the PDF using the edit object option in Acrobat. It's not a bitmap but a vector. When it converted the DWG CAD file into a PDF it did not make it a bitmap. The letters are not single objects though; it's some kind of odd conversion. In AutoCAD you can work with text as shapes, and that is kind of what this looks like.
So an N is like this:
Then in Acrobat I can select a part of it and move it.
And if you look at the OP, ClearScan will convert it to a bitmap and then do OCR, but it is maxed at 600dpi and that is not enough; it can't make out the letters correctly.
I know the file would be large, but if ClearScan let me select 1200dpi it would then be able to make out all the letters when using OCR.
Searchable Image was able to get this:
PEDS TANK AGITATOR
MOC: 318L 55
out of this:
But only this
out of this
Here's where I'm having some issues with what you're describing: After processing the text, one cannot move sections of a letter around. What that means is that Acrobat has saved the font as an image, NOT a font.
I'm not sure how big the fonts you are dealing with are. If you were to print the page out, about how big (or small) are we talking about?
Also, just what version of Acrobat are you using? What was called ClearScan is now known as "Editable Text & Images."
Below are the three options available in the current version of Acrobat Pro. Have you tried using either of the other two options?
Searchable Image
Ensures that text is searchable and selectable. This option keeps the original image, deskews it as needed, and places an invisible text layer over it. The selection for Downsample Images in this same dialog box determines whether the image is downsampled and to what extent. Consequently, #1 is typically not acceptable to a FedGov agency (or any entity with an interest in a document of record having the proper "provenance").
Searchable Image (Exact)
Ensures that text is searchable and selectable. This option keeps the original image and places an invisible text layer over it. Recommended for cases requiring maximum fidelity to the original image. Typically this is what a FedGov agency requires if submitting a scanned image of text.
Editable Text & Images (what once was ClearScan)
Synthesizes a new custom font that closely approximates the original and preserves the page background using a low-resolution copy.
"Here's where I'm having some issues with what you're describing: After processing the text, one cannot move sections of a letter around."
You can if it's a drawing with vector objects and you use Searchable Image (Exact). "This option keeps the original image", and vector objects apparently.
I'm guessing it creates a bitmap, runs the OCR and then discards the bitmap and leaves everything as is and only adds the hidden searchable text. But again the DPI is too low for the bitmap it creates and it misses most of the text.
I only have Acrobat X Pro. v10.1.16 😞
Correct, the text is not a font or an image/bitmap; it's vector, many vectors in fact.
I have done this one as it keeps the original vector drawing.
"Searchable Image (Exact)
Ensures that text is searchable and selectable. This option keeps the original image and places an invisible text layer over it."
As I listed in the OP, I did try the others, but again 600dpi is not enough, so the OCR misses most of the text. I need a higher setting, or if I could convert the vector drawing in the PDF to 1200-ish DPI bitmaps, I'm sure the OCR would work better.
"I'm not sure how big the fonts you are dealing with are. If you were to print the page out, about how big (or small) are we talking about?"
Not sure, back in the day they would normally be on 11x17 paper and even then the text would be on the small side.
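To put rough numbers on the 600 vs 1200 dpi question, here is a small back-of-the-envelope sketch. It assumes an 11x17 inch page as mentioned above; the 3 pt text height is purely a placeholder, since the thread never states the real size, and the function names are made up for illustration.

```python
def raster_size(width_in, height_in, dpi):
    """Pixel dimensions of a page rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def glyph_height_px(text_height_pt, dpi):
    """Pixel height of text that is text_height_pt points tall (72 pt = 1 inch)."""
    return text_height_pt * dpi / 72


def uncompressed_mib(width_px, height_px, bits_per_pixel=8):
    """Approximate uncompressed bitmap size in MiB (8 bits = grayscale)."""
    return width_px * height_px * bits_per_pixel / 8 / 2**20


# Assumed values: 17 x 11 inch sheet, 3 pt text (placeholder, not from the thread).
for dpi in (600, 1200):
    w, h = raster_size(17, 11, dpi)
    print(
        f"{dpi} dpi: {w} x {h} px, "
        f"3 pt text is about {glyph_height_px(3, dpi):.0f} px tall, "
        f"about {uncompressed_mib(w, h):.0f} MiB uncompressed grayscale"
    )
```

Doubling the DPI doubles the pixel height of each letter but quadruples the bitmap size, which is the "file would be large" trade-off mentioned earlier.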
Unfortunately there seems to be no fix for this in Acrobat. When it converts the drawing to a bitmap to then do the OCR, the dpi is too low and it cannot read most of the text. Maybe there is a way to override the dpi used in the Acrobat Recognize Text OCR function.
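One possible workaround outside Acrobat, sketched here and not tested against these drawings, is to rasterize the pages at a higher resolution with Ghostscript and run OCR on the resulting images. The sketch assumes Ghostscript is installed and on the PATH as `gs` (on Windows the console executable is usually `gswin64c`); the file names and the helper function names are hypothetical.

```python
import shutil
import subprocess


def build_gs_command(pdf_path, out_pattern, dpi=1200, device="pnggray", gs_exe="gs"):
    """Build a Ghostscript command that renders each PDF page to a bitmap.

    out_pattern may contain %03d, which Ghostscript replaces with the page number.
    """
    return [
        gs_exe,
        "-dNOPAUSE",                      # do not pause between pages
        "-dBATCH",                        # exit when the input is processed
        "-dSAFER",                        # restrict file access
        f"-sDEVICE={device}",             # pnggray = 8-bit grayscale PNG
        f"-r{dpi}",                       # render resolution in dpi
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


def rasterize(pdf_path, out_pattern, dpi=1200, gs_exe="gs"):
    """Run Ghostscript; raises if it is missing or returns an error."""
    if shutil.which(gs_exe) is None:
        raise RuntimeError(f"Ghostscript executable '{gs_exe}' not found on PATH")
    subprocess.run(build_gs_command(pdf_path, out_pattern, dpi, gs_exe=gs_exe), check=True)


if __name__ == "__main__":
    # Hypothetical file names; prints the command without running it.
    print(" ".join(build_gs_command("drawings.pdf", "page-%03d.png")))
```

The images could then be combined into a PDF and passed to Recognize Text, or fed to another OCR tool. Whether the higher resolution actually improves recognition of this kind of stroke-based lettering is an open question, and the rasterized copy loses the original vector drawing, so it would only be useful as a source for the text layer.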