
Missing hyphen when copying URL address from a PDF

Contributor, Aug 08, 2020

Hi,

We are facing an issue with a missing hyphen when copying a URL from a PDF.

For example, the PDF contains a URL like https://jfsdigital.org/articles-and-essays/vol-23-no-4- june-2019/introduction-to-the-special-issue-... (the hyphen is displayed correctly).

When we open this PDF in Adobe Acrobat Reader or Acrobat Pro, copy the URL, and paste it into Notepad, the hyphen is missing. For example: https://jfsdigital.org/articles-and-essays/vol-23-no-4-june-2019/introduction-to-the-special-issue-d... (the hyphen gets lost).

The source file was created in InDesign. Is this a known issue? How can we fix it?

Screenshot for reference:

[Image: Hyphen2.PNG]

Best regards

Santhosh

TOPICS
Create PDFs, Edit and convert PDFs
Community Expert, Aug 08, 2020

The application that created this file set it up incorrectly, so the hyphen is not copied (it is treated like a hyphen that is added when a word is split across two lines).
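
To illustrate the idea, here is a minimal Python sketch of how a text extractor might drop such a hyphen. It is not Acrobat's actual logic: the function name `join_wrapped_lines`, its `keep_hyphen` parameter, and the example.org URL are all hypothetical. The sketch assumes the extractor treats any hyphen at the end of a line as a word-break hyphen and removes it when joining the lines.

```python
def join_wrapped_lines(lines, keep_hyphen=False):
    """Join wrapped lines of text into one string.

    Hypothetical illustration only: a trailing hyphen is assumed to be a
    line-break hyphen and is dropped unless keep_hyphen is True.
    """
    joined = ""
    for line in lines:
        line = line.strip()
        if joined.endswith("-"):
            # Previous line ended with a hyphen: glue the lines together,
            # removing the hyphen unless told to keep it.
            joined = (joined if keep_hyphen else joined[:-1]) + line
        elif joined:
            # Ordinary line break: join with a single space.
            joined += " " + line
        else:
            joined = line
    return joined


# A URL that wraps right after a real hyphen (example.org is a placeholder).
wrapped = ["https://example.org/vol-23-no-4-", "june-2019/"]

print(join_wrapped_lines(wrapped))                    # hyphen is lost
print(join_wrapped_lines(wrapped, keep_hyphen=True))  # hyphen survives
```

The first call mirrors the symptom described in the question; the second shows what a copy would look like if the hyphen were treated as a real character.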

Community Expert, Aug 08, 2020

I don't think this has anything to do with how the file was created in InDesign.

From the screenshot, I assume this is a scanned document?

If it is a scanned document, in my opinion the problem is more likely related to how the producing software applied OCR. That is not necessarily something that needs to be fixed; to be honest, it has never been fixed.

The short answer to this long thread: instead of right-clicking a link in that PDF and selecting "Copy" from the context menu, use "Copy with Formatting" and you should be good to go. Give it a try.

If this method doesn't work and you're using MS Windows 10, disable the setting shown in the screenshot below and try "Copy with Formatting" again:

[Image: unicode-locale.png]

What appears to the naked eye as text is actually a hidden text layer embedded on top of the OCR'ed, compressed image layers, and it is treated entirely differently from the rendered text. So you won't be able to copy and paste the text using the regular copy-and-paste method.

If you take a closer look at this part of your screenshot: -and-futures instead of -andfutures-

Note that in -and-futures, the hyphen and the "f" that follows it could have been read as a ligature. Depending on the font and the language the OCR engine used to produce the scanned document, the hyphen "-" followed by the lowercase letter "f" could have come up as colliding characters; thus, a ligature was formed as a "failsafe" mechanism because those characters could not be mapped (encoded/decoded) to Unicode appropriately.

The same issue occurs with the OCR feature in Acrobat.

To test the addresses you posted here, I copied the URL from your link and pasted it into my web browser.

To my surprise, this is the result from your first link:

  •  https://jfsdigital.org/articles-and-essays/vol-23-no-4-%20june-2019/introduction-to-the-special-issue-design-and-%20futures-vol-ii/

Note the %20 that was generated. In percent-encoding, "%20" (generated here by the URI-producing application) denotes a space.
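
As a quick check, here is a small Python sketch that decodes the path segment from the pasted link above using the standard library's `urllib.parse`. The variable names are only illustrative; the point is that `%20` decodes to a space sitting right after the hyphen.

```python
from urllib.parse import quote, unquote

# Path segment as it appeared in the pasted link above.
encoded = "vol-23-no-4-%20june-2019"

# Decoding turns %20 back into a literal space after the hyphen.
decoded = unquote(encoded)
print(repr(decoded))

# Encoding goes the other way: the space becomes %20 again.
print(quote(decoded))
```

So the pasted text contained a space after the hyphen (most likely where the line wrapped), and the application that built the link encoded that space as `%20`.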

 

When in doubt, you can test text-to-Unicode conversions yourself online here: https://www.branah.com/unicode-converter

You can also press F12 on your keyboard to open the web browser's developer tools and inspect the elements on this page.

See the screenshot below:

[Image: URI problem.png]

That is how the first link you provided pasted into my Notepad.

On the flip side, I was also surprised to see that your second link, for the same address, was not encoded in the same way.

Without getting too technical this time, I just hovered the mouse pointer over the second hyperlink in your post, and it was encoded appropriately. See the next screenshot:

[Image: URI problem2.png]

So, as you can see, the problem seems to be in how the URI-producing applications (Acrobat, the operating system, and the web browsers, for example) interact with the Unicode encoding in use.

Sometimes this works, and on other occasions it doesn't.

One quick fix, which a lot of users here in the forums don't seem to agree with, is to export the PDF document to an MS Word (.docx) file. Open it in Word and run the Accessibility Checker to spot additional issues before exporting to PDF. When you're done analyzing and fixing the .docx file, open it directly in Acrobat and you should not experience this issue.

If this was helpful and you found a solution with any of the guidance provided here, please don't forget to mark the answer as the solution.

Thank you.

Contributor, Aug 10, 2020

Hi,

Thanks for your detailed analysis. However, this is a normal, editable PDF, not a scanned one. Could you let me know what the issue is here?

Thanks,

Santhosh

Community Expert, Aug 10, 2020

Can you share the actual file with us?

You can attach it to the original message using the tiny paperclip icon at the bottom when you edit it, or upload it to a file-sharing website (like Dropbox, Google Drive, Adobe Cloud, etc.), generate a share link and then post it here.

Community Expert, Aug 10, 2020

Well, that was not an analysis.

Those were my findings based on what you posted, and you haven't answered the important questions I need answered in order to help you.

I assumed it was a scanned document because it looks like one in your screenshot, and you didn't specify otherwise until I asked.

Now that you have clarified, did you at least try any of the suggestions?

Are you still getting the same problem if you use "Copy with Formatting"?

That is really not that hard to try and report back.

You also didn't confirm which operating system this is occurring on.

That is why I also took the time to show, in my first screenshot, a setting in MS Windows 10 that interferes with Unicode handling when it is enabled for UTF-8.

You haven't even been able to explain why, in your first hyperlink above, the URL was percent-encoded incorrectly and contains a space.

The second hyperlink is not percent-encoded; it is just a normal, human-readable URL.

So that was on your end, wherever you copied those URLs from.

It seems like there is more than just Acrobat involved in the mix.

Community Expert, Aug 10, 2020

I forgot to add that if your Acrobat version is earlier than 20.009.20067, you need to update.

There was a bug that affected the web link creation tool (accessed via Edit PDF) in Acrobat Pro DC when working with editable PDFs.

The release notes for the update that addressed this issue on MS Windows (not macOS) are found here: 20.009.20067 Optional update, June 02, 2020:

4302871: Win: ‘#’ Character in the web link was changing to “%23” when user clicks on the link to open it in web browser
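
To see why that substitution breaks a link, here is a short Python sketch using the standard library's `urllib.parse`. The example.org URL is a placeholder, not one from the thread, and the sketch only shows the general effect of escaping "#"; it does not reproduce Acrobat's behavior.

```python
from urllib.parse import quote, urlsplit

# Placeholder URL with a fragment; not taken from the thread.
url = "https://example.org/page#section"

# Escaping '#' (while leaving ':' and '/' alone) yields %23.
escaped = quote(url, safe=":/")
print(escaped)

# In the original URL, '#' starts the fragment.
print(urlsplit(url).fragment)

# After escaping, %23 is part of the path and the fragment is empty.
print(urlsplit(escaped).path)
print(urlsplit(escaped).fragment)
```

Once "#" becomes "%23", the browser no longer sees a fragment and requests a different path instead.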

 

 

 
