New Participant
September 13, 2019
Question

Full text catalog indexing is not working


Dear Team,

I am using the "full text index with catalog" feature to index my PDF. Every time, the process is aborted and an error is reported at a different page.

For example:

Error extracting words from page 29067
Error extracting words from page 29068
Found two consecutive pages with errors. Rest of the file will be skipped

---

Error extracting words from page 39139
Error extracting words from page 39140
Found two consecutive pages with errors. Rest of the file will be skipped

Likewise, the error occurs every time, but at different page numbers.

Total pages: 40784.

Please share your feedback.

    This topic has been closed for replies.

    3 replies

    ls_rbls
    Community Expert
    September 17, 2019

    Hi,

    Please refer to the following help page:

    https://helpx.adobe.com/acrobat/using/creating-pdf-indexes.html

    And also here:

    https://community.adobe.com/t5/Acrobat/Indexing-PDFs/td-p/9222958

    Look at the answer provided by Jane-E in that thread.

    ls_rbls
    Community Expert
    September 16, 2019

    Assuming that the index you created was working before, and that this is the first time the problem has appeared, here are a few ideas:

    1) Consider creating a new index definition for a full-text search of just the PDF documents you are extracting information from.

    2) Or, redefine the document collection in a local directory, ensuring that all the documents you want to search are in a single directory on the local hard drive of the computer you use when searching those files.

    3) Or, verify that the files you are working on (or some of them) are not open by another user when you perform the index search, for example if they are shared among different users over a network or cloud service.

    4) Create a new index, but this time pay close attention to exactly what you want to be searched under "Index Description" > "Options" ("Do not include numbers", "Add IDs to Adobe PDF v1.0 files", "Do not warn for changed documents when searching").

    Also, consider redefining the lists of "Custom Properties to be indexed", "XMP fields to be indexed" and "Structure Tags".

    5) If you are running Windows, ensure that when you add entries to "include these directories" and "include these subdirectories", you have also instructed the operating system to index those directories for faster searches.

    My main question is whether the error you are getting could be related to the document being locked with security, for example, or to it having been flattened or otherwise modified in a way that altered the original structure of the file(s) identified as giving the errors.

    New Participant
    September 17, 2019

    Thanks a lot for your feedback. Please find my comments below:

    1) I am already doing that. The folder contains only this PDF.

    2) This is already ensured.

    3) No one else is using it, because it's a local copy.

    4) No options are enabled.

    5) Taken care of accordingly.

    6) The file is not password protected.

    Another observation:

    If I split the PDF into two parts as follows:

    Part1- 40783

    Part2- 40784

    Then, if I index Part 1, the index is created successfully. However, there are two limitations:

    1) The PDF size increases from 621 MB to 921 MB.

    2) The bookmarks in the original document are lost during the file split.

    ls_rbls
    Community Expert
    September 15, 2019

    Hello,

    You've mentioned "extracting". Do you mean that you are trying to search for specific words in order to extract redacted comments from a PDF document?

    New Participant
    September 16, 2019
    The above errors are thrown by Adobe Acrobat while performing PDF indexing.