
Not able to paste the copied text, text appears as boxes when pasted.

Community Beginner,
Jun 18, 2024

I have this PDF file, data.pdf, and I don't know how it was created. When I copy text from it and paste it somewhere else, boxes appear instead of the text. I think it has something to do with its encoding or fonts:
ECQMlV+Helvetica (Embedded Subset)
       Type: TrueType (CID)
       Encoding: Identity-H
ZZSOZT+Helvetica-Bold (Embedded Subset)
       Type: TrueType (CID)
       Encoding: Identity-H
Please help me copy and paste the text.
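
For anyone who wants to confirm the symptom programmatically, here is a minimal Python sketch. It assumes the "boxes" are characters the destination font cannot draw (control codes, private-use code points, unassigned code points, or U+FFFD); that is an assumption, not something established in this thread, and `suspect_ratio` is a hypothetical helper name.

```python
import unicodedata

# Unicode general categories that usually have no drawable glyph:
# Cc = control, Co = private use, Cn = unassigned, Cs = surrogate.
SUSPECT_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}


def suspect_ratio(text):
    """Return the fraction of non-whitespace characters that look like garbage."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    bad = sum(
        1
        for ch in visible
        if ch == "\ufffd" or unicodedata.category(ch) in SUSPECT_CATEGORIES
    )
    return bad / len(visible)


# Paste the copied text into the string below to check it.
print(suspect_ratio("Hello world"))      # 0.0
print(suspect_ratio("\x01\x02\x03"))     # 1.0
print(suspect_ratio("ab\ue000\ufffd"))   # 0.5
```

A ratio close to 1.0 would suggest the copied text carries no usable character information, which is consistent with an encoding problem rather than a clipboard problem.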

TOPICS
Troubleshooting, PDF
1 Correct answer
Community Expert,
Jun 18, 2024

Yes, it's a font encoding issue. The only real solution is to re-create the file using proper fonts.

To do that, export the pages as high-quality images (PNG, for example), create a new PDF file from those images, and run Text Recognition on it.

Community Beginner,
Jun 18, 2024

Thanks for the reply, I will do the Text recognition.

Is it possible that someone deliberately created the PDF this way, or is it some kind of error? And can I also create PDFs with this problem?

Community Expert,
Jun 18, 2024

"Is it possible that someone deliberately created the PDF this way, or is it some kind of error?"

Neither. A PDF is not meant to be an editable format; it is an output format, and it is designed to be compressed so that files are as small as possible. When a font is subsetted to make the file smaller, the subset is sometimes re-encoded for the new "virtual font" (which is no longer Helvetica at all, but "ECQMlV+Helvetica"), because it doesn't need the entire character map, especially the larger one that comes with some TrueType CID fonts. In other cases the encoding is left as is, say ANSI, because that is a smaller character map.
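
To make the re-encoding idea concrete, here is a toy Python model. It is only an illustration under stated assumptions, not how any real PDF producer works: it assumes the subset renumbers glyphs in order of first use, and that copy/paste depends on a separate code-to-character table (the ToUnicode map in PDF terms) which may or may not be written out. `subset_font` and `copy_text` are hypothetical names.

```python
def subset_font(text, keep_tounicode=True):
    """Simulate subsetting: give new glyph IDs to only the characters used."""
    glyph_ids = {}
    for ch in text:
        if ch not in glyph_ids:
            # Start at 1; glyph 0 is conventionally the "missing glyph".
            glyph_ids[ch] = len(glyph_ids) + 1
    # The page content stores only these codes, not the characters.
    codes = [glyph_ids[ch] for ch in text]
    # The reverse table is what lets a viewer turn codes back into text.
    to_unicode = {gid: ch for ch, gid in glyph_ids.items()} if keep_tounicode else {}
    return codes, to_unicode


def copy_text(codes, to_unicode):
    """Simulate copying: codes with no known character come out as boxes."""
    return "".join(to_unicode.get(code, "\u25a1") for code in codes)


codes, table = subset_font("Hello")
print(codes)                    # [1, 2, 3, 3, 4]
print(copy_text(codes, table))  # Hello

codes, table = subset_font("Hello", keep_tounicode=False)
print(copy_text(codes, table))  # five box characters
```

The page still displays correctly in both cases because the glyph shapes are embedded; only the route back from code to character is lost.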

Is there a way to fix this after the fact? Not really; you can't get the eggs back after the cake is made.

However, some tools, like PitStop or Markzware's OmniMarkz (not trying to promote them, just giving examples), let you remap characters when you convert a PDF, like so:

[Screenshot: Screen Shot 2024-06-18 at 10.39.25 AM.png]

Since your fonts are subsetted, you only have to remap the characters that were actually used, and doing so lets you copy, export, or create a new PDF of the text in a more usable form.
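
As a rough illustration of what such a remapping amounts to, here is a short Python sketch. It does not reflect the interface of PitStop or any other tool; the codes and the mapping are made-up example data, and `remap` and `unmapped_codes` are hypothetical helper names.

```python
def remap(codes, mapping, placeholder="\u25a1"):
    """Translate glyph codes to characters; unmapped codes become a box."""
    return "".join(mapping.get(code, placeholder) for code in codes)


def unmapped_codes(codes, mapping):
    """List the distinct codes that still need a manual entry, in first-seen order."""
    missing = []
    for code in codes:
        if code not in mapping and code not in missing:
            missing.append(code)
    return missing


# Made-up codes, as they might come out of a subsetted font.
codes = [1, 2, 3, 3, 4, 5, 6, 4, 7, 3, 8]

# Start with the glyphs identified so far (by looking at the rendered page).
mapping = {1: "H", 2: "e", 3: "l", 4: "o"}
print(unmapped_codes(codes, mapping))  # [5, 6, 7, 8]

# Fill in the rest by hand, then decode.
mapping.update({5: " ", 6: "w", 7: "r", 8: "d"})
print(remap(codes, mapping))           # Hello world
```

Because the font is a subset, the table only needs one entry per distinct glyph used in the document, not one per character in the full font.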

 

Community Expert,
Jun 18, 2024

I would say both, actually. Some people do it on purpose to "obfuscate" the file (I will not explain how to do it, though, as I consider it quite harmful for the end-user), and some just do it by mistake. This doesn't really prevent editing. It prevents copying the file's text, searching it, extracting information from it, etc.

Community Beginner,
Jun 18, 2024

Thanks for the answers, you guys have been really helpful.

Community Expert,
Jun 18, 2024

There ARE some online conversion tools that can successfully re-encode fonts on the way to another format (e.g. a Word file or even IDML), but this is a buyer-beware sort of thing, and probably not what you want to do with any sensitive data.

New Here,
Dec 23, 2024

Nothing suggested here worked for me, but what did work is this:

1. In Acrobat Pro, go to File > Export a PDF > JPEG. This creates an individual JPEG for each page.
2. In Finder (Mac) or File Explorer (PC), select all of the created JPEGs, right-click, and open them in Acrobat Pro. A prompt will ask if you want to combine the files into one PDF; say yes.
3. Edit the newly created PDF. Adobe's OCR functionality converts the image back to text in a supported font!
4. Copy and paste.
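
For readers without Acrobat Pro, the same rasterize-then-OCR idea can be sketched in Python around the open-source OCRmyPDF command-line tool. This is an assumption on my part and not something tested in this thread: it assumes `ocrmypdf` is installed and on the PATH, and the file names are placeholders. `build_ocr_command` and `rebuild_text_layer` are hypothetical helper names.

```python
import shutil
import subprocess


def build_ocr_command(src, dst, language="eng"):
    """Build the OCRmyPDF command line without running it."""
    # --force-ocr rasterizes every page and runs OCR on the result,
    # so the broken text layer is replaced rather than reused.
    return ["ocrmypdf", "--force-ocr", "-l", language, src, dst]


def rebuild_text_layer(src, dst, language="eng"):
    """Run the command if the tool is available; return False if it is not."""
    cmd = build_ocr_command(src, dst, language)
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


# Show the command that would be run for a placeholder file name.
print(" ".join(build_ocr_command("data.pdf", "data-ocr.pdf")))
```

Calling `rebuild_text_layer("data.pdf", "data-ocr.pdf")` would then produce a new file whose text comes from OCR, so it is only as accurate as the recognition and should be proofread.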

Community Expert,
Dec 23, 2024

That's exactly what I suggested in my original reply to this question.

 

New Here,
Dec 23, 2024
LATEST

I wasn't finding a "Text Recognition" option, so I thought these specific steps would be helpful. Yes, they are similar to your suggestion. 
