Dear all,
I work in news and journalism, so I use a lot of PDF documents. One bottleneck I face regularly is copying Hindi text from PDF documents, especially PDF newspapers. Can anyone guide me to a solution? Am I missing an additional installation or support file for the regular Adobe Reader package?
Your views would be much appreciated.
Are these actual text documents, or scanned images? Can you link to such a PDF?
P.S. what is your operating system? Reader version?
These are external PDF files made in Hindi, for example http://digitalimages.bhaskar.com/cph/epaperpdf/02042014/1DELHI%20CITY-PG1-0.PDF - say I wish to copy and paste any text from this into Word, but I can't do so!
Reader is XI and I run Windows 7.
Thank you for the link. I see that the font used throughout the document is 'DB Bhaskar'. This is not a font I know, and it does not seem to correspond to any Unicode font I have. Therefore when pasting copied text into Word it ends up as garbage.
I have found another Hindi PDF: http://rajbhasha.gov.in/ittools.pdf - this uses a font called 'Mangal'. When I paste text from that doc into Word as Arial Unicode MS, I seem to get the correct Hindi text.
Do you have that DB Bhaskar font installed on your computer?
Well, yes. When we do it with the DB Bhaskar font, the results are much better, with most text recognised correctly. Unfortunately, in India we end up with many different fonts used for creating these PDFs, and most of them would not be provided by the Adobe repository. That is where my problem starts, as I need to edit PDFs in these varied fonts.
Regards.
I cannot really offer you a solution. If you want to copy/paste it into Word (or convert such a PDF to Word), you will need to have these fonts installed.
In Japan the situation is quite different: all fonts adhere to the Unicode standard. If you have any say in the creation of these documents, you should only use Unicode fonts. I see that the Mangal font I have on my system has Devanagari glyphs in the range U+0901 ~ U+0970. That is the standard.
Where does the DB Bhaskar font have the Hindi glyphs?
Anyway, I hope you can also get some input from Adobe's Indian support staff.
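As a rough way to check whether text pasted from a PDF really is Unicode Hindi, the sketch below measures how much of it falls inside the Unicode Devanagari block (U+0900 to U+097F, which includes the range mentioned above). The function name `devanagari_ratio` is made up for this illustration and is not part of any Adobe or Microsoft tool.

```python
# Minimal sketch: how much of a pasted string is Unicode Devanagari?
# DEVANAGARI_START/END are the bounds of the Unicode Devanagari block.
DEVANAGARI_START = 0x0900
DEVANAGARI_END = 0x097F


def devanagari_ratio(text: str) -> float:
    """Return the share of non-whitespace characters in the Devanagari block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if DEVANAGARI_START <= ord(c) <= DEVANAGARI_END)
    return hits / len(chars)


if __name__ == "__main__":
    # Paste text copied from the PDF between the quotes.
    pasted = "\u0939\u093f\u0928\u094d\u0926\u0940"  # the word "Hindi" in Devanagari
    print(f"{devanagari_ratio(pasted):.0%} of the characters are Devanagari")
```

A result near 100% suggests the PDF uses a Unicode font such as Mangal. A result near 0% for text that looks like Hindi on screen suggests the font stores its glyphs at non-Devanagari code points.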
It is theoretically possible to have PDF documents containing Hindi fonts which copy correctly. But these PDF files are not suitable. To be technical, the Hindi language fonts are created as Latin 1 (European) fonts, but containing Hindi characters. The copied text therefore consists of Latin 1 character codes, so copying it as correct Hindi text is impossible.
It may be that the file creators are not using suitable software or fonts; you could complain to them.
It may be that such software is not available, I do not know.
It may be that this is a deliberate choice to protect copyright in a limited way and make people work harder to copy text!
Anyway, it isn't a Reader limitation or anything you can fix yourself.
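To illustrate why a Latin 1 font that draws Hindi shapes cannot be copied correctly, here is a toy sketch. The table `TOY_FONT_GLYPHS` is entirely hypothetical and is not the real layout of DB Bhaskar or any other font; it only shows the principle that the file stores Latin character codes while the font decides what shape is drawn.

```python
# Toy model of a "Latin 1 font containing Hindi characters".
# ASSUMPTION: this glyph table is invented for illustration only.
TOY_FONT_GLYPHS = {
    "d": "\u0915",  # the font draws the Devanagari letter KA for code "d"
    "e": "\u092e",  # ... MA for code "e"
    "y": "\u0932",  # ... LA for code "y"
}


def what_reader_shows(stored: str) -> str:
    """Shapes drawn on screen: each stored code is rendered with the font's glyph."""
    return "".join(TOY_FONT_GLYPHS.get(ch, ch) for ch in stored)


def what_clipboard_gets(stored: str) -> str:
    """Copying yields the stored character codes, not the shapes drawn for them."""
    return stored


if __name__ == "__main__":
    stored_in_pdf = "dey"
    print("On screen:   ", what_reader_shows(stored_in_pdf))
    print("In clipboard:", what_clipboard_gets(stored_in_pdf))
```

In this model the pasted text only looks right again if it is formatted with the same font, which matches the earlier observation that results improve when DB Bhaskar is installed.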