Copy/Pasting from Hebrew font in PDF

February 15, 2010

Hi -

I'm trying to copy and paste text from a PDF so I can edit and analyze the contents. The document is in Hebrew. It is a set of Israel's election results and is available on the government's website: http://www.moin.gov.il/Apps/PubWebSite/mainmenu.nsf/4DF815EA4AC4E503C2256BA6002EE732/8E408A044EE1D3EDC2257520002817B8/$FILE/News.pdf.

Under document properties, the fonts listed are Helvetica (standard) and two unknown, embedded subsets (TTE1C42600t00 and TTE1DA2290t00).

I have tried:

- Copying and pasting text from Reader 9 --> opening in Word and Excel, changing around fonts

- Copying and pasting text from Acrobat 8 Professional --> opening in Word and Excel, changing around fonts

- Right-click, open table as spreadsheet

- Exporting as .doc, .TIFF, PostScript, .txt, .html

- Exporting as an image and running OCR (the trialware Hebrew OCR program I used did not pick up all characters correctly)

- The Adobe website mentions an Adobe Reader 7 Middle Eastern Edition, but when I try to download it, I'm taken to the regular Reader 9 page

Can anyone think of a way to extract the data from this document so that it is editable?

Any help would be appreciated!



aslanbash

Participant

In reply to MP12345:

I recommend trying another OCR program.

Harbs.

Legend

When a PDF has custom-encoded fonts (as this one does), there's not much you can do to get the text out using standard methods. One thing you can try (assuming all the custom encoding is the same!) is a search/replace for each letter to fix the gobbledegook once you get it into a word processing program. Unfortunately, a lot of documents mix and match custom encodings, so for those it's pretty much hopeless.
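A minimal Python sketch of that per-letter search/replace, assuming the whole document uses one consistent custom encoding and that you have typed a short sample of the correct Hebrew yourself to pair with the pasted gibberish. The function names and the sample glyph codes are hypothetical. It learns a character map from the aligned sample, raises an error if one garbled code turns up for two different letters (a sign of mixed encodings), and then applies every substitution in one pass.

```python
"""Sketch: undo a single, consistent custom font encoding with a character map.

Assumption: each garbled character stands for exactly one Hebrew letter, and
you have a hand-typed sample of the correct text aligned with the garbled paste.
"""


def learn_mapping(garbled_sample: str, correct_sample: str) -> dict[str, str]:
    """Pair up characters from aligned samples; raise if the encoding is inconsistent."""
    if len(garbled_sample) != len(correct_sample):
        raise ValueError("samples must be aligned character by character")
    mapping: dict[str, str] = {}
    for bad, good in zip(garbled_sample, correct_sample):
        previous = mapping.setdefault(bad, good)
        if previous != good:
            # Same garbled code read as two different letters: the document
            # probably mixes several custom encodings, so one map won't work.
            raise ValueError(f"{bad!r} maps to both {previous!r} and {good!r}")
    return mapping


def fix_text(garbled: str, mapping: dict[str, str]) -> str:
    """Apply all substitutions at once; characters not in the map are left unchanged."""
    return garbled.translate(str.maketrans(mapping))


if __name__ == "__main__":
    # Hypothetical glyph codes; real ones depend on the PDF's embedded fonts.
    mapping = learn_mapping("Jkv", "אבג")
    print(fix_text("vkJ", mapping))  # prints גבא
```

Applying every substitution in a single `str.translate` call, rather than one replace at a time as in a word processor, avoids a later replacement re-matching a character produced by an earlier one.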

In my experience, the most reliable OCR software for Hebrew is FineReader.

HTH,

Harbs


youthful_athlete16B8

Adobe Employee

So none of the approaches you've listed has actually worked for you?
