I don't understand how to use the font's Encoding to extract the text from a PDF document. A TJ operator contains something like
[(/)64.7017(.)1.59525(.)0.553333(.)-16.0526(\n)-10.4603(.)]TJ
but I don't understand how to translate it into the correct text.
Any suggestions?
Thanks.
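Before any encoding lookup, the character codes have to be pulled out of the TJ operand. In a TJ array each string holds character codes (bytes, not text) and each number is a positioning adjustment in thousandths of a unit of text space, so only the strings carry content. Below is a minimal Python sketch, assuming the array uses only literal (...) strings (hex <...> strings are not handled) and has already been cut out of the content stream; parse_tj_array and read_literal are hypothetical names, not a library API.

```python
import re

# Escapes allowed in PDF literal strings (besides \ddd octal and line continuation).
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b",
           b"f": b"\f", b"(": b"(", b")": b")", b"\\": b"\\"}
NUMBER = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")
OCTAL = b"01234567"


def read_literal(src: bytes, i: int) -> tuple[bytes, int]:
    """Read a literal string starting at src[i] == b'('; return (bytes, next index)."""
    out, depth = bytearray(), 0
    while True:
        ch = src[i:i + 1]
        if not ch:
            raise ValueError("unterminated string")
        if ch == b"\\":
            nxt = src[i + 1:i + 2]
            if nxt in ESCAPES:
                out += ESCAPES[nxt]
                i += 2
            elif nxt and nxt in OCTAL:  # \d, \dd or \ddd octal code
                j = i + 1
                while j < i + 4 and src[j:j + 1] and src[j:j + 1] in OCTAL:
                    j += 1
                out.append(int(src[i + 1:j], 8) & 0xFF)
                i = j
            elif nxt in (b"\r", b"\n"):  # backslash + end of line: continuation
                i += 3 if src[i + 1:i + 3] == b"\r\n" else 2
            else:  # unknown escape: the backslash is ignored
                i += 1
            continue
        if ch == b"(":
            depth += 1
            if depth == 1:  # opening delimiter, not part of the string
                i += 1
                continue
        elif ch == b")":
            depth -= 1
            if depth == 0:  # closing delimiter
                return bytes(out), i + 1
        out += ch  # ordinary byte or balanced inner parenthesis
        i += 1


def parse_tj_array(src: bytes) -> list:
    """Split a TJ operand such as [(A)-120(B)] into byte strings and numbers."""
    src = src.strip()
    if not src.startswith(b"["):
        raise ValueError("TJ operand must be an array")
    items, i = [], 1
    while i < len(src):
        ch = src[i:i + 1]
        if ch == b"]":
            return items
        if ch.isspace():
            i += 1
        elif ch == b"(":
            s, i = read_literal(src, i)
            items.append(s)
        else:
            m = NUMBER.match(src, i)
            if not m:
                raise ValueError(f"unsupported token at offset {i}")
            items.append(float(m.group().decode("ascii")))
            i = m.end()
    raise ValueError("unterminated array")


items = parse_tj_array(rb"[(/)64.7017(.)1.59525(.)0.553333(.)-16.0526(\n)-10.4603(.)]")
# Keep only the strings: these are the character codes to look up in the font.
text_bytes = b"".join(x for x in items if isinstance(x, bytes))
```

Note that (\n) here is the escape for the single byte 10, which is then a character code like any other.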
This is all explained in detail in the PDF Reference. The encodings are defined outside the page content stream (in the font dictionaries), and you need to use ToUnicode if it is available. You can't just jump in; you need to read all of the relevant chapters carefully.
If you have trouble following the PDF Reference, please quote the exact lines that are unclear, with chapter and section numbers.
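For fonts that do have a /ToUnicode entry, it points to a CMap stream that maps character codes straight to Unicode, so no glyph-name lookup is needed. A rough Python sketch of reading one, assuming the stream is already decompressed to text, uses only bfchar entries and the simple (non-array) form of bfrange, and belongs to a simple font where each byte is one code. parse_tounicode is a hypothetical helper, and the CMap fragment below is made up for illustration.

```python
import re

BFCHAR = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
BFRANGE = re.compile(r"beginbfrange(.*?)endbfrange", re.S)
HEX = re.compile(r"<([0-9A-Fa-f]*)>")


def utf16_hex(h: str) -> str:
    """Destination values in a ToUnicode CMap are UTF-16BE, written in hex."""
    return bytes.fromhex(h).decode("utf-16-be")


def parse_tounicode(cmap: str) -> dict[int, str]:
    mapping: dict[int, str] = {}
    for block in BFCHAR.findall(cmap):
        tokens = HEX.findall(block)
        # Entries come in pairs: <source code> <destination string>
        for src, dst in zip(tokens[0::2], tokens[1::2]):
            mapping[int(src, 16)] = utf16_hex(dst)
    for block in BFRANGE.findall(cmap):
        for line in block.strip().splitlines():
            parts = HEX.findall(line)
            # Skip the array form and multi-unit destinations in this sketch.
            if "[" in line or len(parts) != 3 or len(parts[2]) != 4:
                continue
            lo, hi, start = (int(p, 16) for p in parts)
            for offset in range(hi - lo + 1):
                mapping[lo + offset] = chr(start + offset)
    return mapping


sample = """2 beginbfchar
<01> <0020>
<02> <0050>
endbfchar
1 beginbfrange
<03> <05> <0041>
endbfrange"""
to_unicode = parse_tounicode(sample)
# Simple font: one byte per code, so decode byte by byte.
print("".join(to_unicode.get(b, "\ufffd") for b in b"\x02\x03\x04"))  # prints PAB
```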
In my PDF, I have the following:
11 0 obj
<</BaseFont/UBTAOI+TTE1EA7070t00/FontDescriptor 10 0 R/Type/Font
/FirstChar 1/LastChar 56/Widths[ 278 667 556 556 667 667 222 222 556 500 556 722 611 556 556
722 500 278 333 556 556 833 556 556 333 556 500 556 278 333 667
500 556 556 667 278 333 611 556 278 667 722 556 556 556 556 584
778 722 556 278 500 722 722 833 222]
/Encoding 30 0 R/Subtype/TrueType>>
endobj
30 0 obj
<</Type/Encoding/BaseEncoding/WinAnsiEncoding/Differences[
1/space/P/two/four/A/B/l/i/n/k/one/D/F/d/o/w/s/t/r/e/a/m/p/u/hyphen/g/v/b/period/parenleft/E/x/h/underscore/S/f/parenright/T/q/colon/V/C/three/nine/zero/five/asciitilde/G/N/seven/comma/c/U/R/M/j]>>
endobj
I am reading pdf_reference_1-7, section 5.5.2, but I can't find how to combine the data above.
There is no ToUnicode entry.
So build the effective encoding array by applying the Differences to the base encoding (WinAnsiEncoding).
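A minimal Python sketch of this step. In a Differences array, an integer gives the code of the name right after it and each further name takes the next code, so 1/space/P/two means code 1 is space, 2 is P and 3 is two. WINANSI_EXCERPT is an assumed three-entry stand-in for the full WinAnsiEncoding table in the PDF Reference. In this particular font every code from FirstChar 1 to LastChar 56 is covered by the Differences (56 names starting at code 1), so the base encoding never actually shows through.

```python
import re

# Stand-in only: the real WinAnsiEncoding assigns names to most codes 32..255.
WINANSI_EXCERPT = {0x20: "space", 0x41: "A", 0x61: "a"}

# Names are "/" followed by non-delimiter characters; codes are integers.
TOKEN = re.compile(r"/[^\s/\[\]()<>{}%]*|\d+")


def parse_differences(diff: str) -> dict[int, str]:
    """Map codes to glyph names from the contents of a /Differences array."""
    table: dict[int, str] = {}
    code = None
    for tok in TOKEN.findall(diff):
        if tok.startswith("/"):
            if code is None:
                raise ValueError("Differences must start with a code")
            table[code] = tok[1:]  # strip the leading "/"
            code += 1              # next name gets the next code
        else:
            code = int(tok)        # an integer restarts the numbering
    return table


def effective_encoding(base: dict[int, str], diff: str) -> dict[int, str]:
    """Start from the base encoding and overwrite it with the Differences."""
    encoding = dict(base)
    encoding.update(parse_differences(diff))
    return encoding


DIFFERENCES = ("1/space/P/two/four/A/B/l/i/n/k/one/D/F/d/o/w/s/t/r/e/a/m/p/u/hyphen/g/v/b"
               "/period/parenleft/E/x/h/underscore/S/f/parenright/T/q/colon/V/C/three/nine"
               "/zero/five/asciitilde/G/N/seven/comma/c/U/R/M/j")
encoding = effective_encoding(WINANSI_EXCERPT, DIFFERENCES)
print(encoding[1], encoding[10], encoding[56])  # space k j
```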
Then you can use the byte values in the character strings to look up a glyph name for each character.
Now translate each name into the output encoding you need (for example Unicode).
Use the Adobe Glyph List to convert the names.
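The last three steps (byte, glyph name, Unicode) can be sketched in Python as below. AGL_EXCERPT is an assumption: a hand-built subset of the Adobe Glyph List that covers only the names in this font's Differences, plus a uniXXXX fallback. The full list (glyphlist.txt) has thousands of entries, and the AGL specification has further rules (underscore-joined ligature names, uXXXX names) that are left out here. ENCODING is the Differences array from the post, which starts at code 1.

```python
import string

# Hand-picked subset of the Adobe Glyph List: just the names this font uses.
DIGITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
AGL_EXCERPT = {c: c for c in string.ascii_letters}  # "A" -> U+0041, etc.
AGL_EXCERPT.update({name: str(d) for d, name in enumerate(DIGITS)})
AGL_EXCERPT.update({
    "space": " ", "comma": ",", "hyphen": "-", "period": ".", "colon": ":",
    "parenleft": "(", "parenright": ")", "underscore": "_", "asciitilde": "~",
})

# Code -> glyph name for this font: the Differences array starts at code 1.
NAMES = ("space P two four A B l i n k one D F d o w s t r e a m p u hyphen g v b "
         "period parenleft E x h underscore S f parenright T q colon V C three nine "
         "zero five asciitilde G N seven comma c U R M j").split()
ENCODING = dict(enumerate(NAMES, start=1))


def glyph_name_to_unicode(name: str) -> str:
    """Map a glyph name to text; U+FFFD when the name cannot be resolved."""
    base = name.split(".", 1)[0]  # drop suffixes such as "A.sc"
    if base in AGL_EXCERPT:
        return AGL_EXCERPT[base]
    if base.startswith("uni") and len(base) == 7:  # e.g. "uni00E9"
        try:
            return chr(int(base[3:], 16))
        except ValueError:
            pass
    return "\ufffd"


def decode_string(data: bytes, encoding: dict[int, str]) -> str:
    """Simple font: each byte is one code -> glyph name -> Unicode."""
    out = []
    for code in data:
        name = encoding.get(code)
        out.append(glyph_name_to_unicode(name) if name else "\ufffd")
    return "".join(out)


print(decode_string(b"\x02\x15\x13\x0a", ENCODING))  # Park
```

If the TJ array from the question is set in this font, its (\n) string, byte 10, would decode to "k".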