
Making a PDF Searchable?

New Here,
Nov 23, 2020

I use Adobe Acrobat DC Pro on my Mac. If I get a 200/300 page document that I want to make searchable, is that possible?

TOPICS
Edit and convert PDFs, Scan documents and OCR
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert,
Nov 23, 2020

I'm sorry, what do you mean by 200/300? Is that the scanning dpi?

If so, then yes, but the higher the number, the better the end result. 600 dpi is the maximum, but only if that is optical resolution. If it's digital (interpolated) dpi, it's meaningless. Go to the max of your scanner.

New Here,
Nov 24, 2020

The 200/300 was just the page count of a document -- I was saying it's a large file, so I can't manually look through it. I'm trying to make it searchable. Thanks for your reply!

Community Expert,
Nov 24, 2020

Hi Blumbizzle,

I think I need to take this from the top. There's a lot of crossed information here, and some of it has confused me. I apologize. The information from Amal only applies if you are using the Edit function, not if you are simply OCRing the entire document. I think you want the entire document OCRed, correct?

Has this file been scanned yet? If yes, what resolution was it scanned at (if you know)?

If not, what kind of scanner do you have, and do you expect to have this many pages to scan on a regular basis?

The reason I ask is that this is a lot of pages to scan with a flatbed scanner (I know, I've done it). If you do this on a regular basis, then investing in a bulk scanner is worth it in terms of time spent (I know, I've used the FujiScan with great results). Be aware, though, that if you are scanning a book or magazine with a bulk scanner, you MUST destroy the item, because you cannot feed a whole magazine or a whole book through a bulk scanner. Also be aware that if you are scanning a bound book, the bent text toward the spine makes quality OCR essentially impossible.

So please, as if we are starting from the beginning here, what do you have to work with? I look forward to hearing from you.

New Here,
Nov 24, 2020

Thanks for your thorough response. I'm actually not scanning these; these are PDFs that are sent to me through work. I receive emails of PDFs that I want to search -- I'm not editing them. I just want to be able to find keywords and search. I know there's a feature to do this on Adobe, but I'm not sure how perfect or imperfect it is. I'm trying to find the best method to make my PDFs searchable. Thanks for your replies. 

Community Expert,
Nov 24, 2020

Hi Blumbizzle,

OK, all this is important to know.

I'm sure you can appreciate this, but what you get out of something depends a LOT on what went into it. If the documents were scanned well, then your OCR will be more accurate than if they were scanned poorly. The big thing to be concerned about is the resolution, or rather the ppi, of the scan. As I said in the beginning, a higher resolution will give you a better result than a low one.

Here's an example of why this is important: take the letter combination "ri". The lower the resolution, the greater the chance that it will be read as "n". Besides the resolution, one other thing that can affect OCR quality is the font size. Smaller text is more likely to produce errant characters than larger text.
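
To put rough numbers on the resolution and font size point, here is a small Python sketch. It is only illustrative arithmetic, not anything Acrobat exposes: the function name `em_height_px` is made up for this example, and it assumes the page was scanned at actual size with one point equal to 1/72 inch. It estimates how many pixels tall a font's em box ends up at a given scan ppi. Individual lowercase letters such as "r" and "i" will be shorter than that.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def em_height_px(font_size_pt: float, scan_ppi: float) -> float:
    """Approximate pixel height of a font's em box in a scan.

    Hypothetical helper for illustration only. Assumes the page was
    scanned at 100% size, so 1 inch on paper maps to scan_ppi pixels.
    """
    return font_size_pt * scan_ppi / POINTS_PER_INCH


if __name__ == "__main__":
    # Compare a small and a typical body-text size at common scan settings.
    for ppi in (150, 300, 600):
        for size_pt in (8, 12):
            height = em_height_px(size_pt, ppi)
            print(f"{size_pt} pt text at {ppi} ppi: about {height:.0f} px tall")
```

By this estimate, 12 pt text gets about 50 pixels of height at 300 ppi but only 25 at 150 ppi, so the gap between "r" and "i" may be just a pixel or two wide in a low-resolution scan of small text.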

 

Plus you have to account for any artifacts on the page that were scanned along with the original (unless the PDF was made from a digital original).

All this is to say that the results you get are dependent upon a significant number of things that are beyond your control and beyond Acrobat's ability. I can say that Acrobat's OCR capabilities are pretty good, but it can only do so much.

So here's how to do this: first, select the Scan & OCR tool (if it's not on the left-hand toolbar, it will be in the Tools section).

[Screenshot: 2020-11-24_17-26-31.png]

Then, in the top region you'll see "Recognize Text". From its dropdown, select how you need to proceed:

[Screenshot: 2020-11-24_17-26-47.png]

Let me know if this makes sense and how it works out.

Adobe Employee,
Nov 23, 2020

Hi Blumbizzle,

Hope you are doing well, and sorry for the trouble. As described, you want to make a document searchable.

++ Adding to the discussion

Are you trying to make a scanned document editable/searchable? If yes, Acrobat can easily turn your scanned documents into editable PDFs. When you open a scanned document for editing, Acrobat automatically runs OCR (optical character recognition) in the background and converts the document into editable images and text, with the fonts in the document correctly recognized. A prompt also appears in the upper-right corner showing the recognized OCR language and pointing you to the settings button in case you want to change it.

By default, only the current page is converted to editable text, not the entire document in one go. As you move from one page to another, the page in focus is made editable.

For more information, please look at the help page: https://helpx.adobe.com/in/acrobat/using/edit-scanned-pdfs.html

Hope this information helps.

Regards,
Amal
