October 25, 2019
Question

Encoding text

I don't understand how to use the font's Encoding to extract the text from a PDF document. A TJ operator contains something such as:

[(/)64.7017(.)1.59525(.)0.553333(.)-16.0526(\n)-10.4603(.)]TJ

but I don't understand how to translate it into the correct text.

Any suggestions?

Thanks.

    Legend
    October 25, 2019

    This information is all in the PDF Reference, in detail. The encodings are defined outside the page content stream, and you need to use ToUnicode if it is available. You can't just jump in; read all of the relevant chapters carefully.

     

    If you have trouble following the PDF Reference, please quote the exact lines that are not clear, with chapter and section numbers.

    October 25, 2019

    In my PDF, I have the following:

    11 0 obj
    <</BaseFont/UBTAOI+TTE1EA7070t00/FontDescriptor 10 0 R/Type/Font
    /FirstChar 1/LastChar 56/Widths[ 278 667 556 556 667 667 222 222 556 500 556 722 611 556 556
    722 500 278 333 556 556 833 556 556 333 556 500 556 278 333 667
    500 556 556 667 278 333 611 556 278 667 722 556 556 556 556 584
    778 722 556 278 500 722 722 833 222]
    /Encoding 30 0 R/Subtype/TrueType>>
    endobj
    30 0 obj
    <</Type/Encoding/BaseEncoding/WinAnsiEncoding/Differences[
    1/space/P/two/four/A/B/l/i/n/k/one/D/F/d/o/w/s/t/r/e/a/m/p/u/hyphen/g/v/b/period/parenleft/E/x/h/underscore/S/f/parenright/T/q/colon/V/C/three/nine/zero/five/asciitilde/G/N/seven/comma/c/U/R/M/j]>>
    endobj

     

    I am reading pdf_reference_1-7, chapter 5.5.2, but I can't find how to combine the data above.

    Legend
    October 25, 2019

    There is no ToUnicode here.

    So build the effective encoding array by applying the Differences to the base encoding (WinAnsiEncoding).

    Then you can use the byte values in the character strings to look up a glyph name for each character.

    Now you translate each name to the encoding you need, such as Unicode.

    Use the Adobe Glyph List to convert the names.
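
The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a complete text extractor: the names `parse_differences`, `glyph_to_unicode`, `decode_string` and `AGL_EXCERPT` are hypothetical, the glyph-name table is a hand-copied excerpt of the Adobe Glyph List covering only the names in the Differences array quoted in this thread, and Python's `cp1252` codec stands in as an approximation of WinAnsiEncoding for codes the Differences array does not override.

```python
import re

# Differences array copied from object "30 0 obj" quoted in this thread.
DIFFERENCES = (
    "1/space/P/two/four/A/B/l/i/n/k/one/D/F/d/o/w/s/t/r/e/a/m/p/u/hyphen/g/v/b"
    "/period/parenleft/E/x/h/underscore/S/f/parenright/T/q/colon/V/C/three/nine"
    "/zero/five/asciitilde/G/N/seven/comma/c/U/R/M/j"
)

# Assumption: a tiny excerpt of the Adobe Glyph List, only the multi-letter
# names that occur in the array above. A real tool would load the full list.
AGL_EXCERPT = {
    "space": " ", "hyphen": "-", "period": ".", "comma": ",", "colon": ":",
    "parenleft": "(", "parenright": ")", "underscore": "_", "asciitilde": "~",
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "seven": "7", "nine": "9",
}


def parse_differences(text):
    """Return {character code: glyph name} for a Differences array.

    A number sets the current code; each following /name is assigned to the
    current code, which then advances by one.
    """
    table = {}
    code = 0
    for number, name in re.findall(r"(\d+)|/([^\s/\[\]]+)", text):
        if number:
            code = int(number)
        else:
            table[code] = name
            code += 1
    return table


def glyph_to_unicode(name):
    """Map a glyph name to a Unicode string using the excerpt above."""
    if name in AGL_EXCERPT:
        return AGL_EXCERPT[name]
    # In the Adobe Glyph List, single ASCII letters name themselves.
    if len(name) == 1 and name.isascii() and name.isalpha():
        return name
    return "\ufffd"  # unknown name: Unicode replacement character


def decode_string(raw, code_to_name):
    """Decode the bytes of one PDF string operand into text."""
    out = []
    for byte in raw:
        name = code_to_name.get(byte)
        if name is not None:
            out.append(glyph_to_unicode(name))
        else:
            # Code not overridden by Differences: fall back to the base
            # encoding (cp1252 used here as an approximation of WinAnsi).
            out.append(bytes([byte]).decode("cp1252", errors="replace"))
    return "".join(out)


table = parse_differences(DIFFERENCES)
print(decode_string(b"\x02\x0c\x0d", table))  # codes 2, 12, 13
```

With this table, codes 2, 12 and 13 map to the glyph names P, D and F. In a TJ array only the parenthesized strings carry character codes; the numbers between them are positioning adjustments and contribute no text. Escape sequences inside a string, such as `\n`, must be resolved to their byte values before the lookup.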