
Search for text in multi-page PDFs in a folder and extract the file name and page number?

New Here, Oct 17, 2022


CONDITION:

I have hundreds of PDF files stored in a folder; each PDF contains 100 to 200 pages.

 

WHAT I AM TRYING TO DO:

I want to search those PDF files for a specific word; for example, I would like to search for the word "Honesty".

 

WHAT RESULT I EXPECT:

I want to know which PDF files contain the word "Honesty", and on which page numbers of each file it appears.

 

You can also contact me to provide the solution, or leave your contact details.

TOPICS
UiPath

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Adobe Employee, Oct 17, 2022


Are you asking whether this is possible? Sure: you can use the Extract API to get the text and the page number each piece of text is on, but we don't provide a search engine. You would need to build that part yourself. The only concern is that we cap the size of a document you can run the Extract API on at 200 pages. It sounds like you'll be inside that limit, though.
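To make the "do it yourself" part concrete, here is a minimal sketch of the search step in Python. It assumes you have already run each PDF through Extract and saved the structured JSON output next to each other in one folder, one JSON file per PDF, named after the PDF (that naming is my assumption, not something Extract does for you). It also assumes each entry in the JSON's `elements` list may carry a `Text` string and a 0-based `Page` number; please check those field names against your own output. The function names `pages_containing` and `search_folder` are made up for this example.

```python
import json
import re
from pathlib import Path


def pages_containing(extract_json: dict, word: str) -> list[int]:
    """Return sorted 1-based page numbers whose extracted text contains `word`.

    Assumes `extract_json["elements"]` is a list of dicts that may have
    "Text" (str) and "Page" (0-based int) keys.
    """
    # Whole-word, case-insensitive match.
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    pages = set()
    for element in extract_json.get("elements", []):
        text = element.get("Text")
        page = element.get("Page")
        if text is None or page is None:
            continue  # skip elements without text or page info (e.g. figures)
        if pattern.search(text):
            pages.add(page + 1)  # convert the assumed 0-based index to 1-based
    return sorted(pages)


def search_folder(folder: str, word: str) -> dict[str, list[int]]:
    """Map each PDF name to the pages containing `word`.

    Assumes one JSON file per PDF, e.g. report.json for report.pdf.
    """
    results = {}
    for json_path in sorted(Path(folder).glob("*.json")):
        with open(json_path, encoding="utf-8") as f:
            data = json.load(f)
        pages = pages_containing(data, word)
        if pages:
            results[json_path.stem + ".pdf"] = pages
    return results
```

Calling `search_folder("extract_output", "Honesty")` would then give you a dictionary of file names and page lists, which you could print or write to a spreadsheet. A word that is split across two text elements would be missed by this simple approach.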

New Here, Oct 17, 2022


Thanks Camden. I would also like to know the PDF file name along with the page number, because I am spending time opening each PDF file and manually searching for the word.

As you suggested, you can give us your solution with Adobe, and we can try it and adopt it. For the search engine, could you suggest a tool or a solution?

Adobe Employee, Oct 18, 2022


I've used Algolia (commercial, but with a good free tier) and Lunr (free and open source, but client-based, so not a good fit for large datasets unless you use serverless functions).

Algolia: https://www.algolia.com/

Lunr: https://lunrjs.com/
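If you just want to see what such a search engine does before picking one, here is a rough sketch of the core idea in plain Python: an inverted index that maps each word to the (file name, page number) pairs where it occurs. This is not how you call Algolia or Lunr; it is only a small stand-in, and `build_index` and `lookup` are hypothetical names for this example. It assumes you already have the text of each page, for example from the Extract output.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Build a word -> {(file_name, page_number)} index.

    `pages` is an iterable of (file_name, page_number, text) tuples.
    """
    index = defaultdict(set)
    for file_name, page_number, text in pages:
        # Lowercase and split into word tokens so lookups are case-insensitive.
        for token in re.findall(r"\w+", text.lower()):
            index[token].add((file_name, page_number))
    return index


def lookup(index, word):
    """Return a sorted list of (file_name, page_number) hits for `word`."""
    return sorted(index.get(word.lower(), set()))
```

You build the index once over all your PDFs, and after that each `lookup(index, "Honesty")` is a single dictionary access instead of a rescan of every file. A hosted or library search engine adds things this sketch lacks, such as stemming, typo tolerance, and ranking.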

 
