How to solve the "(cid:12)" problem when copying PDF text

New Here
Nov 07, 2020

I'm having trouble extracting text from PDF files. These files are exam papers that I am using for my master's research, and I really need to extract their text. However, when I extract (copy) it, the characters come out strange, and a tool I use (PDFMiner) outputs several "(cid:12)" markers.
Can you help me, or pass this on to someone who can?
I really need this, because the problem is delaying my research.
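A note on the symptom: PDFMiner writes "(cid:N)" when a font in the file gives it a character ID N that it cannot map to a Unicode character, usually because the embedded font has no usable ToUnicode mapping. As a minimal sketch, assuming the maintained pdfminer.six fork is installed (pip install pdfminer.six), the script below extracts a file's text and tallies which character IDs came out unmapped, which helps confirm the diagnosis before deciding what to do. The function names are illustrative and not part of PDFMiner itself.

```python
import re
import sys
from collections import Counter

# PDFMiner emits "(cid:N)" for a glyph whose character ID N has no Unicode mapping.
CID_PATTERN = re.compile(r"\(cid:(\d+)\)")


def summarize_cids(text: str) -> Counter:
    """Count how often each unmapped character ID appears in extracted text."""
    return Counter(int(cid) for cid in CID_PATTERN.findall(text))


def extract_text_from_pdf(path: str) -> str:
    """Extract all text from a PDF with pdfminer.six (assumed to be installed)."""
    # Imported here so summarize_cids() works even without pdfminer.six.
    from pdfminer.high_level import extract_text

    return extract_text(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = summarize_cids(extract_text_from_pdf(sys.argv[1]))
    print(f"{sum(counts.values())} unmapped glyphs, {len(counts)} distinct CIDs")
    for cid, n in counts.most_common(10):
        print(f"(cid:{cid}) appears {n} times")
```

Saved as, say, count_cids.py (a hypothetical name), it would be run as python count_cids.py exam_2017.pdf. A high count spread over many distinct CIDs suggests the font simply carries no text mapping, rather than a handful of odd symbols.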

TOPICS
Edit and convert PDFs, PDF forms

Community Expert
Nov 07, 2020

From what I can read at https://pypi.org/project/pdfminer/, PDFMiner is a stand-alone command-line tool. How is Adobe Acrobat Reader involved in this?

New Here
Nov 08, 2020

See the exams at the link, specifically those from 2017. If you copy their text, strange characters appear, and PDFMiner shows the "(cid:…)" markers. I need to know whether you can copy the text of these 2017 exams. I am extracting the text from these exams; I did not create them.

LEGEND
Nov 08, 2020

Some PDF files are made in such a way that their text cannot be extracted, or is extracted incorrectly. This is not a fault in the extraction software. You will need to review your files and possibly retype them.
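To narrow down which pages need that manual review, one option is to measure, per page, what share of the extracted glyphs are "(cid:N)" placeholders. Below is a minimal sketch assuming the pdfminer.six fork is installed; the helper names and the 5% default threshold are arbitrary choices for illustration, not anything PDFMiner defines.

```python
import re
from typing import Iterable, List

# One "(cid:N)" placeholder stands for one glyph PDFMiner could not map to Unicode.
CID_PATTERN = re.compile(r"\(cid:\d+\)")


def placeholder_share(text: str) -> float:
    """Share of glyphs on a page that came out as (cid:N); whitespace is ignored."""
    n_cid = len(CID_PATTERN.findall(text))
    rest = CID_PATTERN.sub("", text)
    n_plain = sum(1 for ch in rest if not ch.isspace())
    total = n_cid + n_plain
    return n_cid / total if total else 0.0


def pages_to_review(page_texts: Iterable[str], threshold: float = 0.05) -> List[int]:
    """Return 1-based page numbers whose placeholder share exceeds `threshold`."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if placeholder_share(text) > threshold
    ]


def page_texts_from_pdf(path: str) -> List[str]:
    """Collect the text of each page with pdfminer.six (assumed to be installed)."""
    # Imported here so the pure helpers above work without pdfminer.six.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    texts = []
    for page_layout in extract_pages(path):
        texts.append(
            "".join(el.get_text() for el in page_layout if isinstance(el, LTTextContainer))
        )
    return texts
```

For example, pages_to_review(page_texts_from_pdf("exam_2017.pdf")) (a hypothetical file name) returns the page numbers worth checking or retyping by hand; pages with no text at all score 0.0 and are not flagged, so scanned images still need a separate look.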
