Text Extraction from Native PDF
Hi,
The Adobe PDF version may vary depending on the customer.
Hardware: a VM server with a good configuration.
Requirements / scenarios (the PDF may be an application form or native text generated by the system):
- Tabular data without borders (borderless)
- Checkbox data in various forms
- Special characters
- Text inside images
- Free text (such as signatures)
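For borderless tables, one common approach (not specific to iTextSharp or the Adobe PDF Library) is to get each text fragment together with its position, group fragments that sit at roughly the same vertical position into rows, and then order each row left to right. The sketch below is a minimal illustration under stated assumptions: the fragments have already been extracted as hypothetical `(x, y, text)` tuples with y increasing down the page, and the function name `group_into_rows` and the 2-point tolerance are illustrative choices. The same logic ports directly to C#.

```python
from typing import List, Tuple

# Hypothetical input shape: (x, y, text) for each text fragment on one page.
# Assumes y increases down the page; PDF user space has y increasing upward,
# so flip the coordinate (or reverse the sort) if your extractor reports raw PDF values.
Fragment = Tuple[float, float, str]


def group_into_rows(fragments: List[Fragment], y_tolerance: float = 2.0) -> List[List[str]]:
    """Cluster fragments whose y values lie within y_tolerance of a row's first
    fragment, then return each row's texts ordered left to right by x."""
    rows: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: (f[1], f[0])):
        # Join the current row if this fragment is vertically close to its first fragment;
        # otherwise start a new row.
        if rows and abs(frag[1] - rows[-1][0][1]) <= y_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    # Within each row, order cells by their x position.
    return [[text for _, _, text in sorted(row, key=lambda f: f[0])] for row in rows]


frags = [(10, 100.5, "Name"), (120, 100, "Qty"), (10, 120, "Bolt"), (120, 119.2, "40")]
print(group_into_rows(frags))  # [['Name', 'Qty'], ['Bolt', '40']]
```

Column boundaries could then be inferred from consistent gaps in x across rows; that step is left out of this sketch.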
Solution tried:
We tried reading the data for the scenarios above from C# code but did not get the expected output.
We used the iTextSharp library from C# to extract the content.
Suggestions for any other library or approach would be really helpful.
Hi Mohana,
I am also looking for a solution to the same kind of problem.
I need to read text from images in a PDF document.
Which solution did you go with? iTextSharp?
Is there a solution for this in the Adobe PDF Library? I downloaded the sample SDK but could not find any method that does this. There is a DocToImages method; what I want is something like an ImagesToDoc feature.
We have a lot of experience extracting text using the Adobe PDF Library. Please contact me if you need additional help.
Can you provide your contact details? I am also looking to extract text from PDFs while removing headers/footers and images.
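Headers and footers can often be dropped by position alone: text whose vertical coordinate falls within a fixed band at the top or bottom of the page is discarded. A minimal sketch, assuming hypothetical `(x, y, text)` fragment tuples with y measured down from the top of the page, and illustrative 50-point margins that would need tuning per document; the function name `strip_header_footer` is also hypothetical, and the sketch does not deal with images.

```python
from typing import List, Tuple

# Hypothetical input shape: (x, y, text), with y measured down from the top of the page.
Fragment = Tuple[float, float, str]


def strip_header_footer(
    fragments: List[Fragment],
    page_height: float,
    top_margin: float = 50.0,     # illustrative header band height, in points
    bottom_margin: float = 50.0,  # illustrative footer band height, in points
) -> List[Fragment]:
    """Keep only fragments whose y lies inside the body band between the margins."""
    body_top = top_margin
    body_bottom = page_height - bottom_margin
    return [f for f in fragments if body_top <= f[1] <= body_bottom]


page = [
    (72, 30, "ACME Corp - Confidential"),  # inside the header band
    (72, 300, "Body text"),
    (300, 770, "Page 1 of 3"),             # inside the footer band
]
print(strip_header_footer(page, page_height=792))  # [(72, 300, 'Body text')]
```

Fixed bands are a crude heuristic; text that repeats at the same position on every page is a common refinement for detecting headers and footers more reliably.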
Michael Peters
Michael, I sent you an email about this four weeks ago but haven't received a reply.
Sorry, I found it in my spam folder and have now replied.

