Participant
January 13, 2023
Question

Issues when Copying & Pasting from PDF


Hi!


Some PDF files have this issue: when someone copies and pastes text from them, the result contains sequences of characters that you can recognise as the separate graphical pieces of a character with a diacritic (the accent mark and the base letter). Example:

Os sistemas de transcetores coerentes tradicionais permitem a codifica ̧c ̃ao de informa ̧c ̃ao em ambas quadraturas e em duas polariza ̧c ̃oes ortogonais do campo el ́etrico. Contudo, estes transcetores utilizados atualmente s ̃ao baseados num esquema intradino, que requer dois h ́ıbridos ́oticos de 90o e quatro pares de fotodetetores para sistemas de transmiss ̃ao com polariza ̧c ̃ao dupla, fazendo com que o custo destes sistemas seja pouco atrativo para aplica ̧c ̃oes de curto alcance.

 

I read that this may be caused by the absence of a glyph-to-Unicode mapping, but I'm not sure that is the actual cause. Why? Because I get this result when using Acrobat Reader and Sumatra on Windows, but not with evince on Linux. With evince, I get this:

Os sistemas de transcetores coerentes tradicionais permitem a codificação de informação em ambas quadraturas e em duas polarizações ortogonais do campo elétrico. Contudo, estes transcetores utilizados atualmente são baseados num esquema intradino, que requer dois hı́bridos óticos de 90 o e quatro pares de fotodetetores para sistemas de transmissão com polarização dupla, fazendo com que o custo destes sistemas seja pouco atrativo para aplicações de curto alcance.

 

So, what exactly does evince do that the other PDF readers don't?


Karl Heinz Kremer
Community Expert
January 13, 2023

This could still be due to a missing (or wrong) mapping to Unicode characters. We don't know how evince is doing the conversion. It may use some other heuristic or OCR-based algorithm. As far as Adobe's tools are concerned, if the conversion is not done correctly, it is because some information is missing, which means it's a bad PDF file. If you have Adobe Acrobat Pro, you can run a preflight check to look for problems with the PDF file.
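
To make the "heuristic" idea concrete, here is a minimal Python sketch of one possible clean-up for text that has already been pasted. It is only an illustration, not a description of what evince or Acrobat actually do. It assumes the accent arrives before its base letter, separated from the previous letter by a single space, as in the sample above. The function name `repair_accents` and the table `SPACING_TO_COMBINING` are made up for this example.

```python
import re
import unicodedata

# Assumption: some viewers emit spacing accent characters instead of
# combining marks, so map the common ones to their combining forms first.
SPACING_TO_COMBINING = str.maketrans({
    "\u00b4": "\u0301",  # acute accent
    "\u02dc": "\u0303",  # small tilde
    "\u00a8": "\u0308",  # diaeresis
    "\u00b8": "\u0327",  # cedilla
})

# An optional space, one combining mark, then the base letter it belongs to.
ACCENT_BEFORE_BASE = re.compile(r" ?([\u0300-\u036f])([^\W\d_])")


def repair_accents(text: str) -> str:
    """Heuristically rejoin 'accent + base letter' pairs in pasted text."""
    text = text.translate(SPACING_TO_COMBINING)

    def swap(match: re.Match) -> str:
        mark, base = match.group(1), match.group(2)
        if base == "\u0131":  # dotless i, as typeset under an accent
            base = "i"
        # Unicode expects the combining mark after its base letter.
        return base + mark

    swapped = ACCENT_BEFORE_BASE.sub(swap, text)
    # NFC composes base + mark into a single precomposed character.
    return unicodedata.normalize("NFC", swapped)


print(repair_accents("codifica \u0327c \u0303ao"))
print(repair_accents("el \u0301etrico"))
```

The obvious limitation is word boundaries: when the accent belongs to the first letter of a word (as in "óticos" in the sample), the only space present is the real word separator, and this sketch removes it and glues the two words together. Telling those cases apart would need a dictionary or the glyph positions from the PDF itself, which is why fixing the PDF at the source is the more reliable route.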

Participant
January 16, 2023

ok.

Are there any free tools (possibly online) for running that preflight check, or a similar correctness check, on a PDF file?

BTW, this problem is related to the production of PDF files with the LaTeX-based frameworks used by students.