New Participant

Question

Full text catalog indexing is not working

September 13, 2019

Dear Team,

I am using the "full text index with catalog" feature to index my PDF. Every time, the process is aborted and errors are reported at different pages.

For example:

Error extracting words from page 29067
Error extracting words from page 29068
Found two consecutive pages with errors. Rest of the file will be skipped

---

Error extracting words from page 39139
Error extracting words from page 39140
Found two consecutive pages with errors. Rest of the file will be skipped

The error occurs on every run, but at different page numbers each time.

Total pages: 40784.
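To compare runs, the log lines above can be parsed with a short script. This is only a minimal sketch: the names `failed_pages` and `first_abort_page` are made up for illustration, and the "two consecutive pages" rule is inferred from the log message, not from any Acrobat documentation.

```python
import re

# Matches the per-page error lines shown in the log excerpts above.
PAGE_ERROR = re.compile(r"Error extracting words from page (\d+)")

TOTAL_PAGES = 40784  # total page count reported for the PDF


def failed_pages(log_text):
    """Return the page numbers named in 'Error extracting words' lines."""
    return [int(n) for n in PAGE_ERROR.findall(log_text)]


def first_abort_page(pages):
    """Return the second page of the first consecutive failing pair, or None.

    Assumption: indexing stops as soon as two consecutive pages fail,
    as the 'Rest of the file will be skipped' message suggests.
    """
    for prev, cur in zip(pages, pages[1:]):
        if cur == prev + 1:
            return cur
    return None


run_1 = """Error extracting words from page 29067
Error extracting words from page 29068
Found two consecutive pages with errors. Rest of the file will be skipped"""

run_2 = """Error extracting words from page 39139
Error extracting words from page 39140
Found two consecutive pages with errors. Rest of the file will be skipped"""

for name, log in (("run 1", run_1), ("run 2", run_2)):
    pages = failed_pages(log)
    stop = first_abort_page(pages)
    skipped = TOTAL_PAGES - stop if stop is not None else 0
    print(f"{name}: failed pages {pages}, stopped at {stop}, "
          f"{skipped} pages left unindexed")
```

Since the stopping point differs between runs on the same file, a comparison like this may help show that the failures are not tied to one specific page.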

Please advise.


ls_rbls

Community Expert

Hi,

Please refer to the following help link:

https://helpx.adobe.com/acrobat/using/creating-pdf-indexes.html

And here:

https://community.adobe.com/t5/Acrobat/Indexing-PDFs/td-p/9222958

Look at the answer provided by Jane-E in the thread above.

ls_rbls

Community Expert

Assuming that the index you created was working before and that this is the first time the problem has appeared, here are a few ideas:

1) Consider creating a new index definition catalogue for a full-text index covering just the PDF documents you are extracting information from.

2) Or redefine the document collection in a local directory, ensuring that all the documents you want to search are in a single directory on the local hard drive of the computer you use when the index search is performed.

3) Or verify that the files (or some of them) are not open by another user when you perform the index search, for example if they are shared among different users over a network or cloud service.

4) Create a new index catalogue, but this time pay close attention to exactly what you want searched under "Index Description" ---> "Options" ("Do not include numbers", "Add IDs to Adobe PDF v1.0 files", "Do not warn for changed documents when searching").

Also consider redefining the lists of "Custom Properties" to be indexed, "XMP fields" to be indexed, and "Structure Tags".

5) And if you are running Windows, make sure that when you add entries to "include these directories" and "include these subdirectories", you have also instructed the operating system to index those directories for faster searches.

My main question is whether the error you are getting could also be related to documents that are locked with security, or that have been flattened or otherwise tampered with in a way that altered the original structure of the file(s) identified as giving the errors.
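As a quick first check on the security question, one could scan the file for an `/Encrypt` entry, which encrypted PDFs carry in their trailer dictionary. The sketch below is only a rough heuristic under that assumption: the function name `has_encrypt_marker` is made up for illustration, a match can be a false positive if the same bytes happen to appear elsewhere in the file, and a negative result says nothing about other structural damage.

```python
import sys


def has_encrypt_marker(path, chunk_size=1 << 20):
    """Return True if the bytes '/Encrypt' appear anywhere in the file.

    The file is read in chunks so that a PDF of several hundred MB does
    not have to be loaded into memory at once. A short overlap is kept
    between chunks so a marker split across a boundary is still found.
    """
    marker = b"/Encrypt"
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if marker in window:
                return True
            # Keep the last len(marker) - 1 bytes for the next round.
            tail = window[-(len(marker) - 1):]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_encrypt.py <path to the PDF>
    found = has_encrypt_marker(sys.argv[1])
    print("'/Encrypt' marker found" if found else "no '/Encrypt' marker found")
```

If the marker is found, the document's security settings in Acrobat would be the next thing to look at.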


adobeacrobat_indexing (Author)

New Participant

Thanks a lot for your feedback. Please find my comments below:

1) I am already doing that. The given folder contains only this PDF.

2) This is ensured as well.

3) No one else is using the file, because it is a local copy.

4) No options are enabled.

5) This has been taken care of.

6) The file is not password protected.

Another observation:

If I split the PDF into two parts as mentioned below:

Part 1 - 40783

Part 2 - 40784

Then, if I index Part 1, the index is created successfully. But there are two limitations:

1) The PDF size increases from 621 MB to 921 MB.

2) Bookmarks that are available in the original document are lost during the file split.

ls_rbls

Community Expert

Hello,

You've mentioned "extracting". Do you mean that you are trying to search for specific words in order to extract redacted comments from a PDF document?


adobeacrobat_indexing (Author)

New Participant

The above errors are thrown by Adobe Acrobat while it is trying to index the PDF.
