
Copying the tilde ( ~ ) character

Explorer, Nov 07, 2019

If I copy text containing the tilde character from a PDF file and paste it into a text editor such as JEdit or Notepad, the tilde is pasted as 0x98 rather than 0x7E. Is this a known bug in Adobe Acrobat, or am I doing something wrong? I am using Adobe Acrobat Reader DC version 19.021.20048.27333 on Windows 10.

TOPICS: Edit and convert PDFs

Views: 2.2K

LEGEND, Nov 08, 2019

It's probably a bug in the PDF. A PDF can show the right character on screen but still contain the wrong text for copying. PDF text extraction is an uncertain and rather random thing; it's not the simple alternative to TXT that many people wish it to be.

Explorer, Nov 08, 2019

Is there a way to view the character's actual hexadecimal code in Acrobat? I do not see this functionality when I view the PDF.

LEGEND, Nov 08, 2019

No. Characters don't have a hex code in the way you are thinking; instead, text extraction generates a Unicode code point using all sorts of information inside and outside the PDF.
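
Although Acrobat does not expose this, the result of a copy can be inspected after pasting. The following is a minimal Python sketch, assuming the pasted text is available as a string; the function name `describe_chars` and the sample string are illustrative and do not come from the thread.

```python
import unicodedata


def describe_chars(text):
    """Return (character, 'U+XXXX', Unicode name) for each character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Hypothetical pasted sample: an ASCII tilde followed by a lookalike character.
for ch, code, name in describe_chars("~\u02dc"):
    print(code, name)
# U+007E TILDE
# U+02DC SMALL TILDE
```

Two characters that look alike on screen can therefore be told apart by their code points once the text is outside the PDF.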

Explorer, Nov 08, 2019

Well, it is certainly not generating the correct ASCII character for the tilde. I also noticed that it is not generating the right ASCII character for the circumflex ( ^ ) character. I guess copying from a PDF document and pasting into a text editor is simply not reliable in Acrobat. Saying it is the PDF document, where I am definitely seeing a tilde or a circumflex, seems a bit of a copout on Acrobat's part when there is no reliable way of determining what Acrobat is doing when I copy text as far as generating the characters to copy.
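
One way to see what the reported byte stands for is to decode it. The sketch below assumes the editor stores the pasted text in Windows code page 1252, which the thread mentions only later, and it assumes 0x88 as the byte for the circumflex lookalike, which the thread does not state. The helper name `decode_cp1252_byte` is illustrative.

```python
import unicodedata


def decode_cp1252_byte(value):
    """Decode one byte as Windows-1252 and return ('U+XXXX', Unicode name)."""
    ch = bytes([value]).decode("cp1252")
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


# 0x7E and 0x5E are the ASCII tilde and circumflex.
# 0x98 is the byte reported in the thread; 0x88 is an assumed counterpart.
for byte in (0x7E, 0x98, 0x5E, 0x88):
    print(f"0x{byte:02X}", *decode_cp1252_byte(byte))
# 0x7E U+007E TILDE
# 0x98 U+02DC SMALL TILDE
# 0x5E U+005E CIRCUMFLEX ACCENT
# 0x88 U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT
```

Under that assumption, 0x98 is not a corrupted ASCII tilde but a different character, the accent-style small tilde, that happens to look similar.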

LEGEND, Nov 08, 2019

I suggested that this is a fault with the PDF, not with Acrobat. But of course you may have found a bug. Can you share a problem PDF? "Saying it is the PDF document, where I am definitely seeing a tilde or a circumflex, seems a bit of a copout on Acrobat's part": no, that is the absolute reality of PDFs. Seeing something on screen and extracting text are very different, and a bad PDF may fail on one but not the other. "there is no reliable way of determining what Acrobat is doing when I copy text as far as generating the characters to copy": correct, except to check. Once you have a trustworthy source of good PDFs, you may stop checking.

Explorer, Nov 09, 2019

I agree with you that this is probably a problem with the PDF files themselves, rather than Acrobat, but it does seem to me that Acrobat should provide some means for the end user to check the code point of some text to see what it actually is. In my case the text I copied came from a PDF copy of the C++ Standard for an early version of C++ (2003) that was licensed to me from the ANSI store, so it would be difficult for me to share it, and possibly illegal to do so. I do have PDF versions of later C++ standard documents, up through the latest proposed C++20 standard, many not licensed but freely available, and in the example code of some later versions I am seeing corrections: the code points for characters appearing as tilde and circumflex, when copied, are correctly within the 0-127 7-bit ASCII range. So it is very possible that earlier C++ standard documents were corrected when producing later ones.
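
For code copied from an affected PDF, a practical workaround is to repair the text after pasting. The following Python sketch assumes the lookalikes are U+02DC and U+02C6, which is consistent with the 0x98 byte reported earlier but is not confirmed for the circumflex. The names `LOOKALIKES`, `to_ascii_operators`, and `non_ascii_positions` are illustrative.

```python
# Assumed lookalike characters and the ASCII operators they stand in for.
LOOKALIKES = {
    "\u02dc": "~",  # SMALL TILDE
    "\u02c6": "^",  # MODIFIER LETTER CIRCUMFLEX ACCENT
}
_TABLE = str.maketrans(LOOKALIKES)


def to_ascii_operators(text):
    """Replace the known lookalikes with their ASCII equivalents."""
    return text.translate(_TABLE)


def non_ascii_positions(text):
    """List (index, 'U+XXXX') for every character outside 7-bit ASCII."""
    return [(i, f"U+{ord(c):04X}") for i, c in enumerate(text) if ord(c) > 0x7F]


# Hypothetical line of copied C++ code containing both lookalikes.
sample = "x = \u02dcy \u02c6 z;"
print(non_ascii_positions(sample))  # [(4, 'U+02DC'), (7, 'U+02C6')]
print(to_ascii_operators(sample))   # x = ~y ^ z;
```

Running `non_ascii_positions` on the repaired text is a quick check that nothing outside the 0-127 range remains.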

LEGEND, Nov 10, 2019

PDF files don't have code points any more than they have ASCII characters. The way to check how text will extract is... to extract text. Not sure why any further feature is needed. Unfortunately, nobody is likely to revisit old standards documents and fix them up, and indeed they may consider that making copy/paste difficult is a desirable aim (I wouldn't agree, but I have met this attitude before).

 

It isn't likely to be a question of correcting the documents, but of using later and fixed tools to make the PDFs.

Explorer, Nov 10, 2019

There is text which I am seeing in a PDF document. What determines the characters I am seeing? There must be some underlying representation which determines a character. If I see some characters and one of them is a tilde, then I naturally suppose that this character corresponds to the tilde character in the ASCII ( or maybe UTF8 where the ASCII characters form the first 256 ) character set. When I copy the character and paste it into my text editor, and the character turns out not to be the ASCII tilde character, naturally I am a bit upset and curious why that is so. I then suggest that there should be a way to view, using Acrobat, what that character is in the ASCII character set. That's all!

I do not think that is an unreasonable request considering the huge number of programmers who have, and continue to have, an interest in the ASCII character set as the text they use in their programming language. In fact I do not know a single programming language, although I suppose there must be some, that does not use the characters in the ASCII character set, and usually the first 128 characters in that set, as the representation of the valid characters to be used in the constructs of that programming language. I realize that Acrobat need not provide the functionality I am asking for, but still I view it as a reasonable request.

I am sure many programmers copy characters from PDF documents, and many English-speaking people otherwise, and that they would like to be assured that the characters they view correspond to the characters they expect in their text editors. I do realize that good text editors, like the one I usually use as a programmer, can display characters in many different character sets if necessary, including Unicode, but still the ASCII character set remains very popular since it is the set of programming languages.

I am aware that ASCII is code page 1252 in my editor, but I think you know what I mean when I say the ASCII character set.

LEGEND, Nov 10, 2019

I understand your natural supposition, and sometimes natural suppositions are wrong. Think of a font in PDF as a collection of pictures. In a "good" PDF the pictures have helpful names, like "A" or "comma". In a "bad" PDF they may have useless names like "G232" and "Char", or even bad names, like "A" for a number 3.
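
The "pictures with names" model can be illustrated with a toy Python sketch. It is a deliberate simplification: real text extraction draws on more sources than glyph names, and the table, the function `extract_text`, and the use of U+FFFD for unknown names are all assumptions made for illustration. The name-to-character pairs follow the Adobe Glyph List convention, in which "asciitilde" is the ASCII tilde and "tilde" is the separate accent character U+02DC.

```python
# A handful of glyph names and the characters they conventionally map to.
GLYPH_NAME_TO_UNICODE = {
    "A": "A",
    "comma": ",",
    "asciitilde": "~",       # U+007E, the ASCII tilde
    "tilde": "\u02dc",       # U+02DC, the spacing accent that looks similar
    "asciicircum": "^",      # U+005E, the ASCII circumflex
    "circumflex": "\u02c6",  # U+02C6, the spacing accent that looks similar
}


def extract_text(glyph_names):
    """Map glyph names to characters; unknown names become U+FFFD."""
    return "".join(GLYPH_NAME_TO_UNICODE.get(n, "\ufffd") for n in glyph_names)


good = ["A", "asciitilde", "A"]  # helpful names
lookalike = ["A", "tilde", "A"]  # draws a tilde shape, extracts as U+02DC
useless = ["A", "G232", "A"]     # name carries no meaning

print(extract_text(good) == "A~A")             # True
print(extract_text(lookalike) == "A\u02dcA")   # True
print(extract_text(useless) == "A\ufffdA")     # True
```

In this model all three fonts could draw an identical tilde on screen, yet only the first yields an ASCII tilde when copied. Whether the PDF discussed in this thread is affected in exactly this way is not known.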

 

PDF files do not use ASCII character sets - it seems you keep asking for something that isn't there in the hope you will find a simple solution to a problem that has been vexing PDF users for 20 years and more. Many solutions are available to the people who MAKE PDF files, but you can't make people do the right thing. I can refer you to 1000 page documents explaining it all, but I don't imagine you'll be interested.

 

You can be outraged if you like, but this is how PDFs are. I have no further explanation to make.

Explorer, Nov 12, 2019

What you are saying is that when I copy from a PDF file in Windows, what is copied to the Windows clipboard is a stream of data in PDF format. Then, when I paste the clipboard data into a text editor, the knowledge of the PDF format, and of translating from the PDF format to the code page of my text editor, is part of Windows functionality. Would this be a correct assessment of how things work?

 

I am only trying to understand how PDF-formatted data which shows in Acrobat as a tilde character becomes a character which is not a tilde in the code page of my text editor.

LEGEND, Nov 13, 2019

No, that is completely wrong, but I'm sorry I have no more energy to find new ways to explain it.
