Participant
April 2, 2014
Question

Copy Hindi text from pdf documents

  • April 2, 2014
  • 1 reply
  • 30182 views

Dear ALL,

I work in news and journalism and use a lot of PDF documents. One bottleneck I hit regularly is copying Hindi text from many PDF documents, especially PDF newspapers. Can anyone suggest a solution? Am I missing an additional installation or support file for the regular Adobe Reader package?

Your views would be much appreciated.


    1 reply

    pwillener
    Legend
    April 2, 2014

    Are these actual text documents, or scanned images?  Can you link to such a PDF?

    P.S. what is your operating system?  Reader version?

    ShashankPAuthor
    Participant
    April 2, 2014

    These are external PDF files made in Hindi, for example http://digitalimages.bhaskar.com/cph/epaperpdf/02042014/1DELHI%20CITY-PG1-0.PDF - from this I want to copy and paste any text into Word, but I can't.

    Reader is XI and I run Windows 7.

    pwillener
    Legend
    April 2, 2014

    Thank you for the link.  I see that the font used throughout the document is 'DB Bhaskar'.  This is not a font I know, and it does not seem to correspond to any Unicode font I have.  Therefore when pasting copied text into Word it ends up as garbage.

    I have found another Hindi PDF: http://rajbhasha.gov.in/ittools.pdf - this uses a font called 'Mangal'.  When I paste text from that doc into Word as Arial Unicode MS, I seem to get the correct Hindi text.

    Do you have that DB Bhaskar font installed on your computer?
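
    A rough way to tell the two cases above apart is to check whether the text that lands on the clipboard is actually Unicode Hindi. The sketch below is only an illustration: the function name `devanagari_ratio` and both sample strings are made up for this example (the Latin sample is not the real DB Bhaskar encoding). It counts how many characters fall in the Unicode Devanagari block (U+0900 to U+097F). Text copied from a PDF with a Unicode-mapped font such as Mangal should score close to 1.0, while text from a font with no usable Unicode mapping will typically score close to 0.0.

```python
def devanagari_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that lie in the
    Unicode Devanagari block (U+0900 to U+097F)."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for ch in chars if 0x0900 <= ord(ch) <= 0x097F)
    return hits / len(chars)


# Hypothetical samples, for illustration only.
unicode_sample = "हिन्दी समाचार"   # what a Unicode-mapped font would yield
legacy_sample = "fgUnh lekpkj"     # made-up Latin codes standing in for "garbage"

print(devanagari_ratio(unicode_sample))
print(devanagari_ratio(legacy_sample))
```

    If pasted text scores near zero, the problem lies in how the PDF's font encodes its glyphs rather than in the Reader installation, and changing the font in Word will not recover the Hindi text.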