
Recognize Text not working well with CAD drawings in PDF

Community Beginner, Jul 15, 2022

I have some PDFs with CAD drawings in them and would like to be able to search the text.

Searchable Image sort of works, but it misses a lot of the smaller text, even though you can zoom in and the text looks great because it's vector.

I think ClearScan or Searchable Image may be better, but 600 dpi is not enough for the size of the drawings, so they find almost no text and the drawings then look really bad.

 

Before scan.

Richard252646313q5p_0-1657904709035.png

After ClearScan at 600dpi.

Richard252646313q5p_3-1657904854908.png

 

 

 

Is there a way to just turn the PDF with the CAD drawings into bitmaps at a much higher DPI and then run Recognize Text?

TOPICS
Edit and convert PDFs , How to

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Jul 15, 2022

Do you have access to the software that generated the CAD?

 

What format is the CAD image currently in?

 

Thanks

Community Beginner, Jul 15, 2022

I don't know what it is in the PDF, but there were DWG files, so AutoCAD files maybe? We just have the PDFs, and the text is vector but is made up of more than one object.

Tanquen_0-1657910057925.png

 

 

Community Expert, Jul 15, 2022

Hi Tanquen,

I'm not sure what I'm looking at here. What exactly is it? Also, if this is the PDF, I would not call that a vector object; it looks like it has been bitmapped.

 

Community Beginner, Jul 15, 2022

The pic is from the PDF, using the edit object option in Acrobat. It's not a bitmap but a vector. When the DWG CAD file was converted into a PDF, it was not made into a bitmap. The letters are not single objects though; it's some kind of odd conversion. In AutoCAD you can work with text as shapes, and that is kind of what this looks like.

 

So an N is like this:

Tanquen_0-1657922171930.png

Then in Acrobat I can select a part of it and move it.

Tanquen_1-1657922217286.png

And if you look at the OP, ClearScan will convert it to a bitmap and then do OCR, but it is maxed at 600 dpi and that is not enough; it can't make out the letters correctly.

I know the file would be large, but if ClearScan let me select 1200 dpi it would then be able to make out all the letters when using OCR.
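
To put a rough number on "the file would be large", here is a minimal Python sketch of the arithmetic. It assumes a 17 x 11 inch sheet (the paper size mentioned later in this thread) and an uncompressed 8-bit grayscale bitmap; the function name `raster_size` is made up for illustration and has nothing to do with Acrobat itself. Real files would be smaller once compressed.

```python
def raster_size(width_in, height_in, dpi, bytes_per_pixel=1):
    """Return (width_px, height_px, uncompressed_bytes) for a page rendered at dpi.

    bytes_per_pixel=1 assumes 8-bit grayscale; use 3 for 24-bit RGB.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


# Assumed page size: 17 x 11 inches (landscape tabloid), one page only.
for dpi in (600, 1200):
    w, h, size = raster_size(17, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} px, about {size / 1e6:.0f} MB uncompressed")
```

Doubling the DPI doubles both pixel dimensions, so the uncompressed bitmap is four times larger per page.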

 

Searchable Image was able to get this:

G-1960

PEDS TANK AGITATOR

n'PE: PBT

MOC: 318L 55

MOTOR: 0.5HP/460/3P/6<JHZ

 

out of this:

Tanquen_2-1657922841755.png

Tanquen_3-1657922887753.png

 

But only this

U-1961

out of this

Tanquen_4-1657922937368.png

Tanquen_5-1657922946150.png

 

 

Community Expert, Jul 15, 2022

Hi Tanquen,

 

Here's where I'm having some issues with what you're describing: After processing the text, one cannot move sections of a letter around. What that means is that Acrobat has saved the font as an image, NOT a font.

 

I'm not sure how big the fonts you are dealing with are. If you were to print the page out, about how big (or small) are we talking about?

 

Also, which version of Acrobat are you using? What was called ClearScan is now known as "Editable Text & Images."

 

Below are the three options available in the current version of Acrobat Pro. Have you tried using either of the other two options?

 

Searchable Image

Ensures that text is searchable and selectable. This option keeps the original image, deskews it as needed, and places an invisible text layer over it. The selection for Downsample Images in this same dialog box determines whether the image is downsampled and to what extent. Consequently, #1 is typically not acceptable to a FedGov agency (or any entity with an interest in a document of record having the proper "provenance").

Searchable Image (Exact)

Ensures that text is searchable and selectable. This option keeps the original image and places an invisible text layer over it. Recommended for cases requiring maximum fidelity to the original image. Typically this is what a FedGov agency requires if submitting a scanned image of text.

Editable Text & Images (what once was ClearScan)

Synthesizes a new custom font that closely approximates the original and preserves the page background using a low-resolution copy.

Community Beginner, Jul 15, 2022

"Here's where I'm having some issues with what you're describing: After processing the text, one cannot move sections of a letter around. "

You can if it's a drawing with vector objects and you use Searchable Image (Exact). "This option keeps the original image" and, apparently, vector objects.

I'm guessing it creates a bitmap, runs the OCR, then discards the bitmap, leaves everything as is, and only adds the hidden searchable text. But again, the DPI is too low for the bitmap it creates and it misses most of the text.

 

I only have Acrobat X Pro v10.1.16 😞

Correct, the text is not a font or an image/bitmap; it's a vector, many of them.

 

I used this one, as it keeps the original vector drawing:

"Searchable Image (Exact)

Ensures that text is searchable and selectable. This option keeps the original image and places an invisible text layer over it."

 

As I listed in the OP, I did try the others, but again 600 dpi is not enough, so the OCR misses most of the text. I need a higher setting, or if I could convert the vector drawing in the PDF to 1200-ish DPI bitmaps, I'm sure the OCR would work better.

 

"I'm not sure how big the fonts you are dealing with are. If you were to print the page out, about how big (or small) are we talking about?"

Not sure. Back in the day they would normally be on 11x17 paper, and even then the text would be on the small side.

Tanquen_0-1657927643017.png
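
For a sense of scale, this small Python sketch shows how many pixels tall a letter ends up at a given DPI. The 1/16 inch letter height is purely a hypothetical value for illustration (the real height is not known here), and `glyph_height_px` is a made-up helper name. It is only the arithmetic, not a prediction of what the OCR will or won't read.

```python
def glyph_height_px(text_height_in, dpi):
    """Height in pixels of a letter text_height_in inches tall, rendered at dpi."""
    return text_height_in * dpi


# Hypothetical letter height; measure the real drawing to get an actual value.
ASSUMED_TEXT_HEIGHT_IN = 1 / 16

for dpi in (600, 1200):
    px = glyph_height_px(ASSUMED_TEXT_HEIGHT_IN, dpi)
    print(f"{dpi} dpi: letters about {px:.1f} px tall")
```

Whatever the real letter height is, going from 600 to 1200 dpi doubles the pixels available per letter, which is the effect being asked for above.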

 

New Here, Aug 01, 2023

I have the same problem. I enhanced the image in Photoshop, but the OCR does not recognize the apparently random text in an AutoCAD P&ID like yours.
Some text is indeed converted, other text is not. Especially with P&IDs, it would be a real help if it could recognize text anywhere on the page.

Community Beginner, Oct 27, 2023

Unfortunately there seems to be no fix for this in Acrobat. When it converts the drawing to a bitmap to then do the OCR, the DPI is too low and it cannot read most of the text. Maybe there is a way to override the DPI used in the Acrobat Recognize Text OCR function.
