Missing hyphen when copying URL address from a PDF
Hi,
We are facing an issue with a missing hyphen when copying a URL from a PDF.
For example, a PDF contains a URL like https://jfsdigital.org/articles-and-essays/vol-23-no-4- june-2019/introduction-to-the-special-issue-... (the hyphen is displayed correctly).
When we open this PDF in Adobe Acrobat Reader or Acrobat Professional, copy the URL and paste it into Notepad, the hyphen is missing, for example https://jfsdigital.org/articles-and-essays/vol-23-no-4-june-2019/introduction-to-the-special-issue-d... (the hyphen gets lost).
The source file was created in InDesign. Is this a known issue, and how can we fix it?
Sharing screenshots for reference:
Best regards
Santhosh
The application that created this file set it up incorrectly, so the hyphen is not copied (it's treated like a hyphen that was only added to split a word across two lines).
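If you want to check this yourself, a rough sketch like the one below can scan text extracted from the PDF (for example with a tool that keeps every character) for hyphen-like characters that are not the plain ASCII hyphen-minus (U+002D). It assumes, without having seen the file, that the culprit might be U+00AD SOFT HYPHEN, the character meant for a break-only hyphen at a line end, which many applications drop on copy. The other code points in the set are just common look-alikes, and the sample string is hypothetical.

```python
import unicodedata

# Hyphen- and dash-like code points that are NOT the ASCII hyphen-minus (U+002D).
# Assumption: U+00AD SOFT HYPHEN is the likely suspect, because it is meant to be
# visible only where a word breaks at a line end and is often dropped on copy.
HYPHEN_LOOKALIKES = {"\u00ad", "\u2010", "\u2011", "\u2012", "\u2013"}

def find_hyphen_lookalikes(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each hyphen look-alike in the text."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in HYPHEN_LOOKALIKES]

def normalize_hyphens(text: str) -> str:
    """Replace every hyphen look-alike with the ASCII hyphen-minus."""
    return "".join("-" if ch in HYPHEN_LOOKALIKES else ch for ch in text)

# Hypothetical extracted text where the last hyphen is a soft hyphen.
sample = "special-issue-design-and\u00adfutures"
print(find_hyphen_lookalikes(sample))  # [(24, 'SOFT HYPHEN')]
print(normalize_hyphens(sample))       # special-issue-design-and-futures
```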
I don't think this has anything to do with how the file was created in InDesign.
From the screenshot, I assume this is a scanned document?
If it is a scanned document, in my opinion the problem is more related to how the producing software applied the OCR technology. That is not necessarily a problem that needs to be fixed; to be honest, it has never been fixed.
The short answer to this long thread: instead of right-clicking a link in that PDF and selecting "Copy" from the context menu, use "Copy with Formatting" and you should be good to go. Give it a try.
If this method doesn't work and you're using MS Windows 10, disable the setting shown in the slide below and try "Copy with Formatting" again:
What appears as text to the naked eye is basically encoded data embedded as a layer on top of the OCR'ed, compressed image layers, and it is treated entirely differently from rendered text. So you won't be able to copy and paste that text using the regular copy-and-paste method.
If you take a closer look at this part of your screenshot: -and-futures instead of -andfutures-
Note that in -and-futures, the hyphen followed by the "f" could have been mistaken for a ligature. Depending on the font and the language the OCR engine used to produce the scanned document, the hyphen "-" followed by the lowercase letter "f" could have come up as colliding characters; a ligature was then formed as a "failsafe" because those characters could not be mapped (encoded/decoded) to Unicode appropriately.
The same issue occurs with the OCR recognition software in Acrobat.
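For what it's worth, Unicode only defines a small set of letter ligatures (for example U+FB00 "ﬀ" and U+FB01 "ﬁ"), and there is no standard hyphen-plus-f ligature. So the sketch below only shows the general case: spotting real ligature code points in extracted text and expanding them with NFKC normalization from Python's standard unicodedata module. The sample text is hypothetical, and keep in mind that NFKC also rewrites other compatibility characters (full-width letters, superscripts and so on), so use it with care on URLs.

```python
import unicodedata

def list_ligatures(text: str) -> list[str]:
    """Return the Unicode name of every character whose name contains 'LIGATURE'."""
    return [unicodedata.name(ch) for ch in text if "LIGATURE" in unicodedata.name(ch, "")]

def expand_ligatures(text: str) -> str:
    """Expand compatibility ligatures such as U+FB01 into plain letters (NFKC)."""
    return unicodedata.normalize("NFKC", text)

# Hypothetical extracted text containing U+FB01 (fi) and U+FB00 (ff) ligatures.
sample = "\ufb01le o\ufb00set"
print(list_ligatures(sample))    # ['LATIN SMALL LIGATURE FI', 'LATIN SMALL LIGATURE FF']
print(expand_ligatures(sample))  # file offset
```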
To test the addresses you posted here, I copied the URL from your first link and pasted it into my web browser.
To my surprise, this is the result from your first link:
- https://jfsdigital.org/articles-and-essays/vol-23-no-4-%20june-2019/introduction-to-the-special-issue-design-and-%20futures-vol-ii/
Note the %20 that was generated. In percent-encoding (applied here by the URI-producing application), "%20" denotes a space.
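To see the %20 decoded, Python's standard urllib.parse.unquote reverses the percent-encoding of the first link as it pasted. The space-stripping step at the end is a hypothetical cleanup, and it is only safe because this particular URL should not contain any spaces.

```python
from urllib.parse import unquote

# The first link as it pasted, with two percent-encoded spaces (%20).
pasted = ("https://jfsdigital.org/articles-and-essays/vol-23-no-4-%20june-2019/"
          "introduction-to-the-special-issue-design-and-%20futures-vol-ii/")

decoded = unquote(pasted)
print(decoded)             # each %20 is now a literal space
print(decoded.count(" "))  # 2

# Hypothetical cleanup: this URL should contain no spaces, so drop them.
cleaned = decoded.replace(" ", "")
print(cleaned)
```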
When in doubt, you can test text-to-Unicode conversions yourself online here: https://www.branah.com/unicode-converter
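If you'd rather not paste text into a website, a few lines of Python give a similar offline view: each character's code point and Unicode name. This is just a local alternative, not how that converter works internally; the sample is the "no-4- june" fragment from the first URL in the original post.

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every character in the text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}" for ch in text]

# Fragment taken from the first URL in the original post.
for line in code_points("no-4- june"):
    print(line)
```

A stray U+0020 SPACE or an unexpected U+00AD SOFT HYPHEN stands out immediately in this listing, which is hard to spot by eye in Notepad.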
You can also press F12 on your keyboard to open the web browser's developer tools and inspect the elements on this page.
See the slide below:
When I copied the first link you provided, that is how it pasted into my Notepad.
On the flip side, I was also surprised to see that your second link for the same address was not URI-encoded in the same way.
Without getting too technical, I just hovered the mouse pointer over the second hyperlink in your post, and it was encoded appropriately. See the next slide:
So, as you can see, the problem seems to lie in how the URI-producing applications (for example Acrobat, the operating system, and web browsers) interact with the Unicode encoding in use.
Sometimes this works, and on other occasions it doesn't.
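To illustrate how differently look-alike characters come out once an application percent-encodes them, the sketch below runs a few through Python's urllib.parse.quote (UTF-8 by default). The plain hyphen-minus is an unreserved URL character and passes through untouched, while the look-alikes turn into multi-byte escapes. Which character a given PDF actually contains is an assumption you would need to verify, for example with a code point listing.

```python
from urllib.parse import quote

# Characters that can look like (or sit next to) a hyphen in a URL.
CANDIDATES = {
    "HYPHEN-MINUS": "-",      # U+002D, the only one that belongs in a URL as-is
    "SOFT HYPHEN": "\u00ad",  # U+00AD
    "HYPHEN": "\u2010",       # U+2010
    "SPACE": " ",             # U+0020
}

def encoded_forms() -> dict[str, str]:
    """Percent-encode each candidate character (UTF-8, as quote() does by default)."""
    return {name: quote(ch) for name, ch in CANDIDATES.items()}

for name, encoded in encoded_forms().items():
    print(f"{name:<13} -> {encoded}")
```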
One quick fix, which a lot of users here in the forums don't seem to agree with, is to export the PDF to an MS Word (.docx) file. Open it in Word and run the Accessibility Checker to spot additional issues before exporting to PDF. When you're done analyzing and fixing the .docx file, open it directly in Acrobat and you should not experience this issue.
If this was helpful and you found a solution with any of the guidance provided here, please don't forget to mark the answer as the solution.
Thank you.
Hi,
Thanks for your detailed analysis, but this is a normal editable PDF, not a scanned-page PDF. Can you please let me know what the issue is here?
Thanks,
Santhosh
Can you share the actual file with us?
You can attach it to the original message using the tiny paperclip icon at the bottom when you edit it, or upload it to a file-sharing website (like Dropbox, Google Drive, Adobe Cloud, etc.), generate a share link and then post it here.
Well, that was not an analysis.
Those were my findings based on what you posted, and you haven't answered the questions that are important for helping you.
I assumed it was a scanned document because it does look like one in your screenshot, and you didn't specify otherwise until I asked.
Now that you have clarified, did you at least try any of the suggestions?
Are you still getting the same problem if you use "Copy with Formatting"?
That is really not hard to try and report back on.
You also didn't confirm which operating system this is occurring on.
That is why I also took the time to show, in my first slide, a setting in MS Windows 10 that interferes with how Unicode text is handled when it is enabled for UTF-8.
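As a general illustration of how an encoding mismatch garbles text, the sketch below encodes an en dash (U+2013) as UTF-8 and then decodes the bytes as the legacy Windows-1252 code page. It does not reproduce the Windows setting from the slide; it only shows the kind of mix-up such settings can affect. Note that a plain ASCII hyphen-minus (U+002D) is the same single byte in both encodings, so a mismatch like this would only matter for non-ASCII hyphens and dashes.

```python
# An en dash (U+2013) stored as UTF-8 is three bytes: E2 80 93.
utf8_bytes = "\u2013".encode("utf-8")

# Reading those bytes as Windows-1252 yields three unrelated characters
# ("mojibake"): U+00E2, U+20AC, U+201C. ascii() escapes them for any console.
garbled = utf8_bytes.decode("cp1252")
print(ascii(garbled))

# Reversing the mistake recovers the original character.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == "\u2013")  # True

# The ASCII hyphen-minus is unaffected: one identical byte in both encodings.
print("-".encode("utf-8") == "-".encode("cp1252"))  # True
```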
You haven't even been able to explain why, in your first hyperlink above, the URL was incorrectly percent-encoded and contains a space.
The second hyperlink is not percent-encoded; it is just a normal, human-readable URL.
So that was on your end, wherever you copied those URLs from.
It seems like it is more than just Acrobat involved in the mix.
I forgot to add that if your Acrobat version is earlier than 20.009.20067, you need to update.
There was a bug that affected the WebLink creation tool (accessed via Edit PDF) in Acrobat Pro DC when working with editable PDFs.
The release notes for the update that addressed this issue on MS Windows (not macOS) are here, 20.009.20067 Optional update, June 02, 2020:
4302871: Win: ‘#’ Character in the web link was changing to “%23” when user clicks on the link to open it in web browser
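As a small illustration of why that release note matters (the URL below is hypothetical), Python's standard urllib.parse shows that '#' separates the fragment from the path. Once it is encoded as %23, it becomes an ordinary character in the path, so the browser no longer jumps to the intended anchor and may request a page that doesn't exist.

```python
from urllib.parse import quote, urlsplit

link = "https://example.com/docs/page.html#section-2"  # hypothetical URL

ok = urlsplit(link)
print(ok.path, ok.fragment)          # /docs/page.html section-2

# Encoding '#' as %23 turns the fragment delimiter into an ordinary path character.
bad_link = link.replace("#", quote("#"))
bad = urlsplit(bad_link)
print(bad_link)                      # https://example.com/docs/page.html%23section-2
print(bad.path, repr(bad.fragment))  # /docs/page.html%23section-2 ''
```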
Here's another tip from a recent discussion just now: https://community.adobe.com/t5/acrobat/how-to-reset-acrobat-2017-pro-without-having-to-re-install-it...

