Copy/Pasting from Hebrew font in PDF

Guest
Feb 15, 2010

Hi -

I'm trying to copy and paste text from a PDF so I can edit and analyze the contents. The file was created in Hebrew. It is a set of Israel's election results and is available on their government website: http://www.moin.gov.il/Apps/PubWebSite/mainmenu.nsf/4DF815EA4AC4E503C2256BA6002EE732/8E408A044EE1D3E...

Under document properties, the fonts listed are Helvetica (standard) and two unknown, embedded subsets (TTE1C42600t00 and TTE1DA2290t00).

I have tried:

- Copying and pasting text from Reader 9 --> opening in Word and Excel, changing around fonts

- Copying and pasting text from Acrobat 8 Professional --> opening in Word and Excel, changing around fonts

- Right-click, open table as spreadsheet

- Exporting as .doc, .TIFF, PostScript, .txt, .html

- Export as image, running OCR (trialware Hebrew OCR program I used did not pick up all characters correctly)

- Adobe website mentions an Adobe Reader Middle Eastern Edition 7, but when I go to download it, it takes me to the regular Reader v9 page

Can anyone think of a way to extract the data from this document so that it is editable?

Any help would be appreciated!

Adobe Employee
Apr 06, 2010

So none of the experiments you listed actually worked for you?

LEGEND
Apr 10, 2010

When a PDF has custom-encoded fonts (such as this one), there's not much you can do to get the text out using standard methods. One thing you can do (assuming the custom encoding is the same throughout!) is a search/replace for each letter to fix the gobbledygook once you get it into a word processing program. Unfortunately, a lot of documents mix and match custom encodings, in which case it's pretty much hopeless.
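As a rough sketch of the per-letter search/replace idea, assuming the same custom encoding is used throughout the document: the mapping below is made up purely for illustration (the names `GARBLED_TO_HEBREW`, `fix_text`, and `unmapped_chars` are hypothetical), and the real pairs would have to be worked out by comparing the pasted text with what the PDF displays.

```python
# Hypothetical mapping from the garbled characters that come out of the PDF
# to the Hebrew letters they are displayed as. These pairs are placeholders,
# not the actual encoding of any particular document.
GARBLED_TO_HEBREW = {
    "a": "\u05d0",  # alef
    "b": "\u05d1",  # bet
    "c": "\u05d2",  # gimel
}


def fix_text(garbled, mapping=GARBLED_TO_HEBREW):
    """Replace every mapped character in one pass; leave the rest untouched."""
    table = str.maketrans(mapping)
    return garbled.translate(table)


def unmapped_chars(garbled, mapping=GARBLED_TO_HEBREW):
    """List non-space characters that have no replacement yet."""
    return sorted({ch for ch in garbled if ch not in mapping and not ch.isspace()})


if __name__ == "__main__":
    pasted = "abc x"
    print(fix_text(pasted))
    print(unmapped_chars(pasted))
```

Doing all the replacements in a single pass avoids the problem of one search/replace clobbering the result of an earlier one, and the list of unmapped characters shows which letters still need to be identified.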

In my experience, the most reliable OCR software for Hebrew is FineReader.

HTH,

Harbs

New Here
Aug 18, 2010

I recommend trying a different OCR program.
