Copy Hindi text from pdf documents

New Here, Apr 02, 2014

Dear all,

I work in news and journalism, so I use a lot of PDF documents. One bottleneck I run into regularly is copying Hindi text from many PDF documents, especially PDF newspapers. Can anyone guide me to a solution? Am I missing an additional installation or support file for the regular Adobe Reader package?

Your views would be much appreciated.

LEGEND, Apr 02, 2014

Are these actual text documents, or scanned images?  Can you link to such a PDF?

P.S. what is your operating system?  Reader version?

New Here, Apr 02, 2014

These are external PDF files made in Hindi, for example http://digitalimages.bhaskar.com/cph/epaperpdf/02042014/1DELHI%20CITY-PG1-0.PDF - from this one, say, I want to copy and paste any text into Word, but I can't do so!

Reader is XI and I run Windows 7.

LEGEND, Apr 02, 2014

Thank you for the link.  I see that the font used throughout the document is 'DB Bhaskar'.  This is not a font I know, and it does not seem to correspond to any Unicode font I have.  Therefore when pasting copied text into Word it ends up as garbage.

I have found another Hindi PDF: http://rajbhasha.gov.in/ittools.pdf - this uses a font called 'Mangal'.  When I paste text from that doc into Word as Arial Unicode MS, I seem to get the correct Hindi text.

Do you have that DB Bhaskar font installed on your computer?

New Here, Apr 02, 2014

Well, yes. When we do it with the DB Bhaskar font, the results are much better, with most text recognised correctly. Unfortunately, in India we end up with many different fonts being used to create these PDFs, and most of them are not provided in the Adobe repository. That is where my problem starts, as I need to edit PDFs that use these varied fonts.

Regards.

LEGEND, Apr 02, 2014

I cannot really offer you a solution.  If you want to copy/paste it into Word (or convert such a PDF to Word), you will need to have these fonts installed.

In Japan the situation is quite different: all fonts adhere to the Unicode standard.  If you have any say in the creation of these documents, you should only use Unicode fonts.  I see that the Mangal font I have on my system has Devanagari glyphs in the range U+0901 ~ U+0970.  That is the standard.
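As a rough check of the point about Unicode ranges, text pasted from a PDF can be tested to see whether its characters actually land in the Devanagari block. The sketch below is only an illustration: the function names are made up for this example, and it uses the full Devanagari block (U+0900 to U+097F), which contains the range mentioned above.

```python
# Devanagari block in the Unicode standard: U+0900 to U+097F.
DEVANAGARI_START = 0x0900
DEVANAGARI_END = 0x097F


def devanagari_ratio(text):
    """Return the share of non-whitespace characters in the Devanagari block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if DEVANAGARI_START <= ord(c) <= DEVANAGARI_END)
    return hits / len(chars)


def code_points(text):
    """List the code point of every character, e.g. 'U+0939'."""
    return [f"U+{ord(c):04X}" for c in text]


if __name__ == "__main__":
    # Paste text copied from the PDF between the quotes (placeholder sample here).
    pasted = "fgUnh"
    print(code_points(pasted))
    print(devanagari_ratio(pasted))
```

Text copied from a PDF that uses a Unicode font such as Mangal should give a ratio close to 1, while text that pastes as Latin letters and symbols gives a ratio close to 0 and shows which code points the font is really using.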

Where does the DB Bhaskar font have the Hindi glyphs?

Anyway, I hope you can also get some input from Adobe's Indian support staff.

LEGEND, Apr 02, 2014

It is theoretically possible to have PDF documents containing Hindi fonts which copy correctly. But these PDF files are not suitable. To be technical, the Hindi fonts are created as Latin 1 (European) fonts that contain Hindi glyphs, so the text stored in the file is really a sequence of Latin character codes. Copying these as Hindi is impossible.

It may be that the file creators are not using suitable software or fonts; you could complain to them.

It may be that such software is not available, I do not know.

It may be that this is a deliberate choice to protect copyright in a limited way and make people work harder to copy text!

Anyway, it isn't a Reader limitation or anything you can fix yourself.
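To illustrate the Latin 1 point, here is a minimal sketch of why such text pastes as garbage. Everything specific in it is an assumption: the byte values are arbitrary, and the glyph table is invented for the example and is not the real layout of DB Bhaskar or any other font.

```python
# What the PDF stores for a run of text set in a Latin 1 style font:
# ordinary single-byte character codes (arbitrary sample values).
stored_codes = bytes([0x66, 0x67, 0x55])

# Copying hands these codes over as Latin 1 characters.
copied_text = stored_codes.decode("latin-1")

# HYPOTHETICAL glyph table: which Devanagari shape the font might draw
# in each Latin slot. Invented for illustration only.
HYPOTHETICAL_GLYPHS = {
    "f": "\u093f",  # DEVANAGARI VOWEL SIGN I
    "g": "\u0939",  # DEVANAGARI LETTER HA
    "U": "\u0928",  # DEVANAGARI LETTER NA
}


def what_the_font_draws(text, table):
    """Map each Latin character to the shape the font would display for it."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(copied_text)  # the Latin letters that end up in Word
    print(what_the_font_draws(copied_text, HYPOTHETICAL_GLYPHS))
```

The screen shows the Devanagari shapes, but the clipboard only ever receives the Latin codes. Getting real Hindi back would need an accurate table for every such font, which the PDF itself does not provide.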
