Extract text from a PDF with VB.NET
Hello. I have tried several ways on my own to extract text directly from a PDF, but here is the problem I keep running into.
I have a large PDF file, 900 pages in total, that combines many forms. I split it every 5 pages to create 180 smaller files, and that part works fine. The challenge is naming each extracted file: I need the last name, first name, and middle initial from its first page. Unfortunately, the code I have works very well for the first file but not for any of the subsequent ones. The (x, y) coordinates of those fields seem to vary from one file to the next. There is no rhyme or reason to the coordinates that I can see, so I cannot even create an algorithm for them.
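For reference, the splitting and naming bookkeeping described above can be sketched as follows. This is a minimal sketch under stated assumptions: it only computes the 1-based page range for each 5-page chunk and builds a file name of the hypothetical form "Last, First M.pdf". `PageRanges` and `BuildFileName` are illustrative names, not part of any PDF library, and the actual page copying and text extraction (whatever PDF library is in use) are deliberately left out.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module SplitPlanSketch

    ' Hypothetical helper: returns 1-based (first, last) page pairs for
    ' consecutive chunks of chunkSize pages. The final chunk is shorter
    ' when totalPages is not an exact multiple of chunkSize.
    Function PageRanges(totalPages As Integer, chunkSize As Integer) As List(Of Tuple(Of Integer, Integer))
        Dim ranges As New List(Of Tuple(Of Integer, Integer))()
        For first As Integer = 1 To totalPages Step chunkSize
            Dim last As Integer = Math.Min(first + chunkSize - 1, totalPages)
            ranges.Add(Tuple.Create(first, last))
        Next
        Return ranges
    End Function

    ' Hypothetical naming helper (assumed "Last, First M.pdf" format):
    ' trims the parts, upper-cases the middle initial if present, and
    ' strips characters that are not allowed in file names.
    Function BuildFileName(lastName As String, firstName As String, middleInitial As String) As String
        Dim name As String = $"{lastName.Trim()}, {firstName.Trim()}"
        If Not String.IsNullOrWhiteSpace(middleInitial) Then
            name &= " " & middleInitial.Trim().Substring(0, 1).ToUpperInvariant()
        End If
        Dim invalid As Char() = Path.GetInvalidFileNameChars()
        Dim cleaned As New String(name.Where(Function(c) Not invalid.Contains(c)).ToArray())
        Return cleaned & ".pdf"
    End Function

    Sub Main()
        Dim ranges = PageRanges(900, 5)
        Console.WriteLine($"{ranges.Count} files")                   ' 180 files
        Console.WriteLine($"{ranges(1).Item1}-{ranges(1).Item2}")    ' 6-10
        Console.WriteLine(BuildFileName("Smith", "John", "q"))       ' Smith, John Q.pdf
    End Sub
End Module
```

With 900 pages and 5 pages per chunk this yields exactly 180 ranges, matching the file count above. The name values themselves would still have to come from whatever extraction step reads page 1 of each chunk.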
Is there a way to remove all formatting, images, and everything else from a PDF, so that I can be sure my code is reading the right text from a consistent (x, y) coordinate?
