Participant
February 25, 2017
Question

Problem counting a word in a PDF file with VBScript

Hi Colleagues!

I need to complete a script (test) that counts how many times a given word appears in a PDF document.

I am using this script:

Option Explicit

Dim accapp, acavdocu
Dim pdf_path, bReset, Wrd_count

pdf_path = "C:\Tips.pdf"

Set accapp = CreateObject( "AcroExch.App" )
accapp.Show()

Set acavdocu = CreateObject( "AcroExch.AVDoc" )

If acavdocu.Open( pdf_path, "" ) Then
  acavdocu.BringToFront()
  bReset = 1 : Wrd_count = 0
  'FindText: finds the specified text, scrolls so that it is visible, and highlights it
  Do While acavdocu.FindText( "Primary", 1, 1, bReset )
    bReset = 0 : Wrd_count = Wrd_count + 1
    Wait 0, 200
  Loop
End If

.....

The problem is that the loop never finishes. It counts the word on each page up to the end (8 pages) and then starts again from the beginning.

Please tell me how I can count all occurrences of the word I need and then exit the loop.

Thank you in advance


ReinhardF
Participating Frequently
February 27, 2017

You can store the current page with something like:

Set gAVPageView = acavdocu.GetAVPageView
Set gPdPage  = gAVPageView.GetPage
pgn = gPDPage.GetNumber

Save pgn as the last page, and if the current page (pgn) is smaller than the last page, quit the loop. Or use this in combination with PDDoc.GetNumPages().

br. Reinhard

ReinhardF
Participating Frequently
February 27, 2017

Mmmh, I just remembered an old VBS study for finding text in a PDF. Perhaps you can use it.

   '//Settings: Filename and Word to find
FileNM = "d:\Test2.pdf"
WordTF = "Hello World"

'// Check if file exist
set fs = CreateObject("Scripting.FileSystemObject")
if not fs.FileExists(FileNM) then
     MsgBox "Ups! " & FileNM & " doesn't exist? " & "Try new!", vbExclamation
     WScript.quit
end if

    '//Start Acrobat and Open the File into View
Set gApp = CreateObject("AcroExch.App")
Set gAVDoc = CreateObject("AcroExch.AVDoc")
OK = gAVDoc.Open(FileNM, "")
        If Not OK Then MsgBox "Error opening " & FileNM, vbCritical : WScript.Quit

'//comment both out to work hidden

gApp.show  
gAVDoc.bringToFront()

'// let's go

readAndFindText()  '// 15 sec for 100 pages (10 sec in hidden mode)

function readAndFindText()
  set gPdDoc = gAVDoc.GetPdDoc()
  maxPages = gPdDoc.GetNumPages
  foundOnPage = ""
  Set gAVPageView = gAVDoc.GetAVPageView
  for x = 0 to maxPages -1  '// loop over all the pages
       gAVPageView.goto(x)
       Set PdfPage = gAVPageView.GetPage
       Set PageHL = CreateObject("AcroExch.HiliteList")
       PageHL.Add 0,9000  '<<--SET in FILE! (Start,END[9000=All])
       Set PageSel = PdfPage.CreatePageHilite(PageHL)
       for i = 0 to PageSel.Getnumtext - 1  '//loop to get all Words on current Page

           pdfData = PDFData & PageSel.GetText(i)

       Next
       'msgbox(pdfData)  '// debug: uncomment to see each page's text
       if instr(pdfData, WordTF) then foundOnPage = foundOnPage & (x + 1) & ","
       'msgBox("page: " &x &" / " &foundOnPage &vbLF &pdfDATA)
       pdfData = ""

  next
  msgbox("found on Page: " &foundOnPage)
end function

Set gPdPage  = nothing
Set gAVPageView = Nothing
Set gAVDOC = Nothing
Set gAPP = Nothing
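The study above only records the pages on which the text appears; it does not count how often it appears, which is what the original question asks. A small helper like the following could turn each page's text into a count. It is a hypothetical addition, not part of Reinhard's script, and it does case-insensitive, non-overlapping substring matching with InStr.

```vbscript
' Hypothetical helper (not part of the study above): returns how many
' times word occurs in text. Case-insensitive (vbTextCompare),
' non-overlapping substring matches.
Function CountOccurrences(text, word)
    Dim pos, n
    n = 0
    If Len(word) > 0 Then
        pos = InStr(1, text, word, vbTextCompare)
        Do While pos > 0
            n = n + 1
            ' Continue searching just after the current match
            pos = InStr(pos + Len(word), text, word, vbTextCompare)
        Loop
    End If
    CountOccurrences = n
End Function
```

Inside the page loop, before `pdfData = ""`, a line such as `total = total + CountOccurrences(pdfData, WordTF)` would accumulate the count (`total` is a hypothetical variable). Because this is a substring match, "Primary" would also be counted inside a longer word such as "Primaryschool".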

Legend
February 26, 2017

FindText isn't designed for word counting, and it may well loop. It's just a shortcut for showing text in the UI. You probably need to extract all the words, check each one, and count the matches. One way is the JavaScript document.getPageNthWord method.
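Following the getPageNthWord suggestion, here is a minimal VBScript sketch that reaches the Acrobat JavaScript Doc methods (`numPages`, `getPageNumWords`, `getPageNthWord`) through `PDDoc.GetJSObject`. It assumes full Acrobat is installed so the IAC objects are available, and it matches whole single words case-insensitively. The script name `countword.vbs`, the helper names, and reading the path and word from the command line are illustrative choices, not from the thread.

```vbscript
Option Explicit

' Sketch: count whole-word matches in a PDF through the Acrobat JavaScript
' bridge. Assumes Acrobat is installed so AcroExch.PDDoc and its
' GetJSObject method are available.
' Usage (hypothetical script name): cscript countword.vbs C:\Tips.pdf Primary

If WScript.Arguments.Count >= 2 Then
    ShowWordCount WScript.Arguments(0), WScript.Arguments(1)
End If

' Opens the PDF without the viewer UI and reports the count.
Sub ShowWordCount(pdfPath, word)
    Dim pdDoc, jso
    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If Not pdDoc.Open(pdfPath) Then
        WScript.Echo "Could not open " & pdfPath
        Exit Sub
    End If
    Set jso = pdDoc.GetJSObject
    WScript.Echo word & " found " & CountWord(jso, word) & " time(s)"
    pdDoc.Close
End Sub

' Walks every word on every page with getPageNumWords/getPageNthWord
' and counts case-insensitive exact matches.
Function CountWord(doc, word)
    Dim p, i, n
    n = 0
    For p = 0 To doc.numPages - 1
        For i = 0 To doc.getPageNumWords(p) - 1
            ' Third argument True strips surrounding punctuation/whitespace
            If StrComp(doc.getPageNthWord(p, i, True), word, vbTextCompare) = 0 Then
                n = n + 1
            End If
        Next
    Next
    CountWord = n
End Function
```

Unlike the FindText loop, this visits each word exactly once, so it always terminates. Counting a multi-word phrase such as "Hello World" would need a comparison of consecutive words instead of a single StrComp.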