SebastiaoV
Known Participant
May 4, 2009
Question

Copy text from a PDF to word. Just get Symbols

  • May 4, 2009
  • 9 replies
  • 469213 views

Hello,

I have a public PDF with no copying restrictions. When I try to copy highlighted text from the PDF into Word, I only get unreadable garbage.
I can select the desired text and copy it, but when I paste it into Word it comes out as symbols and lines.

I tried Paste Special and it does not work. Word says the font is Gill Sans something (with numbers and so on), which does not seem to be a real font, but when I change it to Arial I still get symbols.

Any help or ideas,

Cheers,
Sebastian

This topic has been closed for replies.

9 replies

Participant
November 21, 2020

Print the PDF to PDF, then try converting it to Word. Worked for me.

Participating Frequently
October 4, 2021

In Acrobat Pro delete all font information: Edit > Preflight > pdf fixups > Convert fonts to outlines.

Next, re-recognise all text: Scan and OCR > Recognise text.

Then export as Plain text.

You may get the odd scan error.

Participant
November 21, 2020

Print the PDF to PDF, then try converting. It worked for me.

Participant
June 14, 2020

That happens to me too. I don't get it 😞

Participant
August 21, 2011

The CutePDF fix looks like it completely works for me!!!

Printed the document through CutePDF, selecting settings that minimized optimization and requested native TrueType fonts (sorry, I didn't document them exactly... I didn't think it would work right away!).

Then did Acrobat OCR, and now I can copy and paste from Acrobat to other programs without garbage here and there.

Participant
April 15, 2011

Here's the solution. When you open the PDF document in "Preview", do not do any "saves" on it before you have copied the data you want to paste into Word. Before you do anything, do a "Save As" on the document and keep one fresh copy, just in case you forget and do a save on it. Once you do a "save" on a PDF document, the text will turn to garbage when pasted into Word. It works.

Participant
April 5, 2011

A PDF file does not store enough information to enable you to re-create it as a Word document (that's why it was invented, to prevent people copying files).

try67
Community Expert
April 5, 2011

That's hardly why the PDF format was invented...

Participant
February 10, 2011

I had that same issue. What I did was print the PDF file to the CutePDF printer. I was then able to copy and paste the text from the new document.

February 16, 2011

Thanks Techjf25....Your solution to print to CutePDF worked perfectly and solved an issue that I encounter quite frequently!  Super easy and quick as can be.

Inspiring
May 6, 2009

One approach is a bit of work, but might meet your need. I saved the file to TIFF (600 dpi). I then went through each TIFF and converted it to B&W. I then copied the text to Word. The result was not perfect, but it was in English and could be clipped. The resolution and the B&W conversion were important to getting the project done.

Inspiring
May 4, 2009

You should simply have to change the font used for display in Word. Why not try saving to a DOC file?

SebastiaoV (author)
Known Participant
May 4, 2009

Thanks for the idea, Bill, but it does not work. Even if I change the font in Word, I still get strange symbols. By the way, if I save the PDF as a Word file I just get a lot of pages full of symbols, as you can see in the attachment.

Is there any way to replace the fonts in the original PDF with different ones?

Cheers,
S

Participant
March 15, 2010

After looking inside the PDF it turns out that no usable encoding information is present (neither in the PDF nor in the embedded font data) to derive the meaning of the characters/glyphs that are displayed on the pages in the document.

The fonts actually are all embedded, but in a way that all encoding information has been removed. This is a typical example of a PDF that is syntactically fully compliant with the PDF spec but where important information about the meaning of the text in it has been thrown away during the process of making the PDF. As far as I can tell it would be very difficult to recover the encoding info. Strange as it may sound, the best option may be to convert the pages to pixels and then run OCR on them....
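A rough way to look for the kind of encoding information described above, as a minimal sketch only: the function name `font_encoding_hints` and the sample bytes are made up for illustration. It simply counts a few font-related keys in the raw bytes of a file, so it only sees objects that are stored uncompressed. PDFs that keep their objects in compressed object streams would need a real PDF library, and a missing /ToUnicode entry is a hint, not proof, that copied text will come out wrong.

```python
import re


def font_encoding_hints(pdf_bytes: bytes) -> dict:
    """Count font-related keys visible in the raw (uncompressed) bytes of a PDF.

    This is a crude heuristic, not a PDF parser.
    """
    return {
        # Font dictionaries; the \b keeps /FontDescriptor from matching.
        "fonts": len(re.findall(rb"/Type\s*/Font\b", pdf_bytes)),
        # Maps from character codes to Unicode, used for copy and search.
        "to_unicode": len(re.findall(rb"/ToUnicode\b", pdf_bytes)),
        # Explicit encoding entries on fonts.
        "encoding": len(re.findall(rb"/Encoding\b", pdf_bytes)),
    }


# Hypothetical fragment of a PDF whose font carries no encoding information.
sample = (
    b"1 0 obj << /Type /Font /Subtype /Type1 "
    b"/BaseFont /ABCDEF+GillSans /FontDescriptor 2 0 R >> endobj "
    b"2 0 obj << /Type /FontDescriptor >> endobj"
)
print(font_encoding_hints(sample))
```

To try it on a real file, read the file in binary mode and pass the bytes to the function. Fonts present with zero `to_unicode` and zero `encoding` hits would be consistent with the situation described here.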

According to the document info, PageMaker 7 and Distiller 5 were used - not sure whether that combination wasn't quite up to the task, but I am on Acrobat 9 now and haven't seen PageMaker for years...

HTH.

Olaf Drümmer

callas software


I have this exact same problem.  It is very frustrating.  How is it not possible to "grab" onto the text in the pdf ??!!

  • I am looking at it.  I can see it.  I can read it.
  • I can highlight the individual letters and words with the mouse pointer. (So it's not just a "picture")
  • With a pdf editor, I can even make the text bold, italic, or increase the font size.

SO WHY CAN'T I COPY THE TEXT!   AAARGH!

No, the file is not protected.

Yes, I have tried saving as different formats. (The "save as TIFF file workaround" idea is very time consuming and greatly degrades the quality.)

The font is shown as being: "Arial083.313"

Something in the pdf program is recognizing the text, translating the 1's and 0's (that make up all computer files) into the letters that display on the screen that I can read and select with the mouse. So why can't that same "something" allow me to copy it?  So frustrating.
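One way to picture why drawing and copying can behave differently is the toy model below. It is a simplified sketch, not how any particular viewer is implemented: the names `glyphs`, `page_codes`, `render` and `copy_text` are invented for illustration, and each glyph shape is represented by the letter it happens to look like. The page stores only numeric codes. Drawing needs just the code-to-shape table inside the embedded font, while copying needs a separate code-to-character table, and that second table is what can be missing.

```python
# Toy subset font: arbitrary codes mapped to glyph shapes.
# Each "shape" is stood in for by the letter it looks like.
glyphs = {1: "H", 2: "e", 3: "l", 4: "o"}

# What the page content actually stores: codes, not characters.
page_codes = [1, 2, 3, 3, 4]


def render(codes, glyph_table):
    """Drawing only needs code -> shape, so the page looks right."""
    return "".join(glyph_table[c] for c in codes)


def copy_text(codes, to_unicode=None):
    """Copying needs code -> character; without it, raw codes leak out."""
    if to_unicode is None:
        # Assumed fallback: treat each code as if it were a character code.
        return "".join(chr(c) for c in codes)
    return "".join(to_unicode[c] for c in codes)


print(render(page_codes, glyphs))            # readable on screen
print(repr(copy_text(page_codes)))           # control characters, i.e. garbage
print(copy_text(page_codes, dict(glyphs)))   # readable once a mapping exists
```

In this model the text can be seen, selected and restyled because all of that works on the shapes, while the clipboard receives the bare codes. That is also why rebuilding the mapping (for example with OCR, which reads the shapes) is what makes copying work again.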

Somebody please help.  If you can solve this problem you are awesome.