Participant
November 8, 2020
Question

How to solve the problem of "(cid:12)" markers when copying PDF text

I'm having trouble extracting text from PDF files. These files are proofs that I am using for my master's research, and I really need to extract their text. However, when I copy the text, the characters come out garbled, and the tool I use (PDFMiner) outputs many "(cid:12)" placeholders.
Can you help me, or put me in contact with someone who can?
I really need this, because the problem is delaying my research.

2 replies

Legend
November 8, 2020

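Some PDF files are built in a way that prevents text from being extracted, or makes it extract incorrectly, typically because the embedded fonts carry no mapping from their character codes to Unicode. This is not a fault in the extraction software. You will need to review and possibly retype your files.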
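
For anyone hitting the same issue: PDFMiner prints a "(cid:N)" placeholder when it cannot map a font's character ID to a Unicode character. Below is a minimal sketch for measuring how badly a file is affected. It assumes the text has already been extracted into a string (for example with pdfminer.six's `extract_text`). The helper names `cid_ratio` and `needs_ocr` and the 0.2 threshold are hypothetical choices for illustration, not part of PDFMiner.

```python
import re

# Matches PDFMiner-style placeholders such as "(cid:12)" or "(cid: 12)".
CID_MARKER = re.compile(r"\(cid:\s*\d+\)")


def cid_ratio(text: str) -> float:
    """Return the fraction of glyphs that came out as (cid:N) placeholders."""
    markers = CID_MARKER.findall(text)
    # Strip the placeholders, then count the remaining non-whitespace characters.
    remaining = CID_MARKER.sub("", text)
    readable = sum(1 for ch in remaining if not ch.isspace())
    total = readable + len(markers)
    return len(markers) / total if total else 0.0


def needs_ocr(text: str, threshold: float = 0.2) -> bool:
    """Flag text whose placeholder share exceeds an (assumed) threshold."""
    return cid_ratio(text) > threshold


sample = "Que(cid:12)t(cid:12)o 1"
print(cid_ratio(sample), needs_ocr(sample))
```

If a large share of a page comes out as placeholders, running OCR on the rendered pages may be a practical alternative to retyping, though the results would still need proofreading.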

radzmar
Community Expert
November 8, 2020

From what I can read at https://pypi.org/project/pdfminer/, PDFMiner is a stand-alone command-line tool. In what way is Adobe Acrobat Reader involved in this?

Participant
November 8, 2020

See the proofs at the link, specifically those from 2017. If you copy their text, strange characters appear, and PDFMiner shows "(cid:)" markers. I would like to know whether you can copy the text of these tests (from 2017). I am extracting the text from these tests, which I did not create.