Participant
October 19, 2017
Question

Extract text with VB.NET


Hello. I have tried several approaches to extracting text directly from a PDF, but I keep running into the following issue.

I have a large PDF file that combines many forms, 900 pages in total. I split that file every 5 pages to create 180 smaller files, and that part works fine. The challenge is that, to name each file once it is extracted, I need the last name, first name, and middle initial from page 1. Unfortunately, the code I have works very well for the first file but not for any subsequent file. It seems the (x, y) coordinates of the fields in question vary from one file to the next. I can see no pattern in the coordinates, so I cannot even write an algorithm to predict them.
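
For reference, the split arithmetic described above can be sketched as follows. This is an illustrative Python sketch, not the VB.NET code referred to in this post; the function name `chunk_ranges` is hypothetical, and the actual page extraction would be done by whatever PDF library is in use.

```python
def chunk_ranges(total_pages, pages_per_file):
    """Return (first_page, last_page) pairs, 1-based and inclusive."""
    return [
        (start, min(start + pages_per_file - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_file)
    ]


ranges = chunk_ranges(900, 5)
print(len(ranges))            # 180
print(ranges[0], ranges[-1])  # (1, 5) (896, 900)
```

The first page of each range is the page that holds the name fields for that output file.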

Is there a way to remove all formatting, images, and everything else from a PDF so that I can be sure my lookup fetches the right text from a consistent (x, y) coordinate?
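
One possible alternative to fixed coordinates, sketched here under assumptions: if the PDF library can return each word on the page together with its position, the name fields could be located relative to their printed labels rather than at absolute (x, y) values. The Python sketch below is library-neutral. The `Word` type, the label strings `Last:`, `First:`, and `MI:`, and the sample coordinates are all hypothetical, since the actual form layout is not shown in this post.

```python
from typing import NamedTuple, Optional, Sequence


class Word(NamedTuple):
    text: str
    x: float  # left edge of the word
    y: float  # baseline of the word


def value_right_of(words: Sequence[Word], label: str,
                   y_tolerance: float = 3.0) -> Optional[str]:
    """Return the nearest word to the right of `label` on the same line."""
    anchors = [w for w in words if w.text.lower() == label.lower()]
    if not anchors:
        return None
    anchor = anchors[0]
    # Same line (within tolerance) and strictly to the right of the label.
    candidates = [
        w for w in words
        if abs(w.y - anchor.y) <= y_tolerance and w.x > anchor.x
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w.x).text


def file_name(words: Sequence[Word]) -> str:
    """Build 'Last_First_MI.pdf' from hypothetical labelled fields."""
    parts = [value_right_of(words, label) for label in ("Last:", "First:", "MI:")]
    if any(p is None for p in parts):
        raise ValueError("could not locate all name fields")
    return "_".join(parts) + ".pdf"


# Two first pages whose fields sit at different coordinates (made-up data).
page_a = [Word("Last:", 50, 700), Word("Smith", 90, 700),
          Word("First:", 200, 700), Word("Jane", 245, 700),
          Word("MI:", 350, 700), Word("Q", 380, 700)]
page_b = [Word("Last:", 72, 655.5), Word("Doe", 112, 656),
          Word("First:", 222, 655.5), Word("John", 267, 656),
          Word("MI:", 372, 655.5), Word("R", 402, 656)]

print(file_name(page_a))  # Smith_Jane_Q.pdf
print(file_name(page_b))  # Doe_John_R.pdf
```

Because the lookup is relative to the label, a shift of the whole form between files does not affect the result. This only helps if the labels are real text in the PDF; on scanned images there would be no words to anchor on.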
