SebastiaoV
Known Participant
May 4, 2009
Question

Copying text from a PDF to Word gives only symbols

  • May 4, 2009
  • 9 replies
  • 469213 views

Hello,

I have a public PDF with no copying restrictions. When I copy highlighted text from the PDF into Word, I only get unreadable garbage.
I can select the desired text and copy it, but when I paste it into Word it comes out as symbols and lines.

I tried Paste Special and it does not work. Word says the font is a Gill Sans something (with numbers and so on), which does not seem to be a real font, but when I change it to Arial I still get symbols.

Any help or ideas?

Cheers,
Sebastian

This topic has been closed for replies.

9 replies

Participant
November 21, 2020

Print the PDF to a new PDF, then try converting that to Word. It worked for me.

Participating Frequently
October 4, 2021

In Acrobat Pro delete all font information: Edit > Preflight > pdf fixups > Convert fonts to outlines.

Next, re-recognise all text: Scan and OCR > Recognise text.

Then export as Plain text.

You may get the odd scan error.
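
One way to check whether the exported plain text came out readable is a rough character-class heuristic. The sketch below is only an illustration in Python: `garbage_ratio`, `looks_readable`, and the 0.1 threshold are hypothetical names and values, not anything from Acrobat. It counts control, private-use, and unassigned characters, which is what badly mapped glyph codes often turn into. It will not catch garbage made of ordinary printable symbols.

```python
import unicodedata


def garbage_ratio(text):
    """Return the fraction of characters that look like extraction debris.

    Hypothetical heuristic: a character is suspicious if it is the Unicode
    replacement character or falls in a "C" category (control, private use,
    unassigned, surrogate). Newlines, carriage returns, and tabs are allowed.
    """
    if not text:
        return 0.0
    suspicious = 0
    for ch in text:
        if ch in "\n\r\t":
            continue
        if ch == "\ufffd" or unicodedata.category(ch).startswith("C"):
            suspicious += 1
    return suspicious / len(text)


def looks_readable(text, threshold=0.1):
    """Treat text as readable when at most `threshold` of it is suspicious.

    The default threshold is an assumption chosen for illustration only.
    """
    return garbage_ratio(text) <= threshold
```

Running `looks_readable` on a paragraph pasted from the original PDF and on the same paragraph from the OCR export gives a quick before/after comparison.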

Participant
November 21, 2020

Print the PDF to a new PDF, then try converting. It worked for me.

Participant
June 14, 2020

That happens to me too. I don't get it 😞

Participant
August 21, 2011

The CutePDF fix looks like it completely works for me!!!

I printed the document through CutePDF, selecting settings that minimized optimization and requested native TrueType fonts (sorry, I didn't document them exactly; I didn't think it would work right away!).

Then did Acrobat OCR, and now I can copy and paste from Acrobat to other programs without garbage here and there.

Participant
April 15, 2011

Here's the solution. When you open the PDF document in "Preview", do not save it before you have copied the data you want to paste into Word. Before you do anything else, do a "Save As" on the document and keep one fresh copy, just in case you forget and save the original. Once you save a PDF document, its text turns to garbage when pasted into Word. It works.

Participant
April 5, 2011

A PDF file does not store enough information to enable you to re-create it as a Word document (that's why it was invented, to prevent people copying files).

try67
Community Expert
April 5, 2011

That's hardly why the PDF format was invented...

Participant
February 10, 2011

I had the same issue. What I did was print the PDF file to the CutePDF printer. I was then able to copy and paste the text from the new document.

February 16, 2011

Thanks, Techjf25. Your solution to print to CutePDF worked perfectly and solved an issue that I encounter quite frequently! Super easy and quick as can be.

Inspiring
May 6, 2009

One approach is a bit of work, but might meet your need. I saved the file to TIFF (600 dpi). I then went through each TIFF and converted it to B&W. Then I copied the text to Word. The result was not perfect, but it was in English and could be clipped. The resolution and the B&W conversion were important to completing the project.

Inspiring
May 4, 2009

You should simply have to change the font used for display in Word. Why not try saving to a DOC file?

SebastiaoV
Author
Known Participant
May 4, 2009

Thanks for the idea, Bill, but it does not work. Even if I change the font in Word, the text is still strange symbols. By the way, if I save the PDF as a Word file I just get a lot of pages full of symbols, as you can see in the attachment.

Is there any way to replace the fonts in the original PDF with different ones?

Cheers,
S

SebastiaoV
Author
Known Participant
May 5, 2009

If the fonts don't have Unicode tables and they do not use a standard encoding for mapping the glyph indices to characters, then you get garbage characters during copy/paste. You can try using the PDF Fixup Profile "Embed Fonts" in the Preflight tool to embed the font (if you are unable to reauthor the document). However, the font does need to be installed on your system and licensed to allow embedding in order to do this.
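
The mapping problem described above can be illustrated with a small Python sketch. This is a simplified model, not how Acrobat or any PDF library is implemented: `extract_text`, `glyph_ids`, and the idea of numbering glyphs in order of first use are assumptions made for the example. It shows why a missing glyph-to-character table produces garbage, and why changing the font in Word afterwards cannot help: the pasted character codes are already wrong.

```python
def extract_text(glyph_codes, to_unicode=None):
    """Turn glyph codes into text the way a copy operation might.

    Hypothetical helper. With a mapping table, each code resolves to the
    intended character. Without one, the raw code is treated as a character
    code, which is only right if the font uses a standard encoding.
    """
    if to_unicode is not None:
        return "".join(to_unicode.get(code, "\ufffd") for code in glyph_codes)
    return "".join(chr(code) for code in glyph_codes)


# Assumption for illustration: the font is a subset whose glyphs are
# numbered in order of first use, so the codes are not standard.
word = "health"
glyph_ids = {}
for ch in word:
    glyph_ids.setdefault(ch, len(glyph_ids) + 1)

codes = [glyph_ids[ch] for ch in word]              # what the page stores
to_unicode = {code: ch for ch, code in glyph_ids.items()}

with_table = extract_text(codes, to_unicode)        # intended text
without_table = extract_text(codes)                 # control characters
```

On screen both cases look identical, because the glyph shapes are drawn from the embedded font either way. Only the copied character codes differ.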


Hi Lori, thanks for the good info! I opened the Preflight window, but there is no "Embed Fonts" among the fixups. I guess I do not have that font, or the PDF encoding is just bizarre. The fonts are Gill Sans and Futura, both of which I have installed on my system, but maybe the PDF uses a weird version of them.

It is very frustrating to see how somebody at the Ministry of Health could be so dumb as to upload a public PDF to the Internet, one that is supposed to be copied and pasted by researchers everywhere, without knowing this very basic stuff.

By the way, I suppose there is nothing else I can try? I thought I could somehow "force" the actual font by highlighting the text and setting it to Arial, for example, as if it were a Word document.

Cheers,

Sebastian