I'm having trouble extracting text from PDF files. These files are proofs that I am using for my master's research, and I really need to extract the text. However, when I copy the text, the characters come out strange, and a tool that I use (PDFminer) outputs several "(cid: 12)" placeholders.
Can you help me, or pass me the contact of someone who can?
I really need this, because the problem is delaying my research.
From what I can read at https://pypi.org/project/pdfminer/, PDFminer is a stand-alone command-line tool. In what way is Adobe Acrobat Reader involved in this?
See the proofs at the link, specifically those from the year 2017. If you copy the textual content, strange characters appear, and PDFMiner shows "(cid:)". I would like to know whether you can copy the textual content of these tests (from the year 2017). I am extracting the textual content from these tests, which I did not create.
Some PDF files are made in a way that means text cannot be extracted, or is extracted incorrectly. This is not a fault in the extraction software. You will need to review and possibly retype your files.
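
To decide which files need that review, it may help to measure how much of the extracted text consists of "(cid:N)" placeholders, which PDFminer writes when it cannot map a glyph to a Unicode character. The following is only a rough sketch: the function names `cid_ratio` and `needs_manual_review` and the 0.2 threshold are assumptions made for illustration, not part of PDFminer.

```python
import re

# Matches placeholders such as "(cid:12)" or "(cid: 12)" in extracted text.
CID_PATTERN = re.compile(r"\(cid:\s*\d+\)")


def cid_ratio(text: str) -> float:
    """Return the share of extracted symbols that are (cid:N) placeholders.

    Each placeholder counts as one symbol; every other non-whitespace
    character counts as one readable symbol.
    """
    placeholders = len(CID_PATTERN.findall(text))
    # Blank out the placeholders, then count the remaining visible characters.
    remainder = CID_PATTERN.sub(" ", text)
    readable = len(re.sub(r"\s", "", remainder))
    total = placeholders + readable
    return placeholders / total if total else 0.0


def needs_manual_review(text: str, threshold: float = 0.2) -> bool:
    """Flag text whose placeholder share reaches the (hypothetical) threshold."""
    return cid_ratio(text) >= threshold
```

The text passed in would be whatever your extraction step produced, for example the output file written by PDFminer. A file that is flagged likely lacks usable character mapping information, so it would need retyping or another approach rather than a different copy-and-paste attempt.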