
Encoding text

New Here, Oct 25, 2019

I don't understand how to use the font's Encoding to extract the text of a PDF document. The TJ operator has operands such as:

[(/)64.7017(.)1.59525(.)0.553333(.)-16.0526(\n)-10.4603(.)]TJ

but I don't understand how to translate it into the correct text.

Any suggestion?

Thanks.
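To see what the page stream actually provides, it helps to split the TJ operand into its parts. Below is a minimal Python sketch, assuming the array is available as a Python `str` with one character per byte; the names `parse_tj_array` and `_read_literal` are hypothetical. String operands come back as raw bytes (character codes, not yet text) and numbers as kerning adjustments in thousandths of a text space unit. Hex strings (`<...>`) are not handled.

```python
import re

# Escape sequences defined for PDF literal strings (mapped to byte values).
_ESCAPES = {"n": 10, "r": 13, "t": 9, "b": 8, "f": 12,
            "(": 40, ")": 41, "\\": 92}
# A PDF number: optional sign, digits with optional fraction, or ".5" style.
_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")


def _read_literal(src, i):
    """Read a literal string body starting just after '('.

    Returns (bytes, index just past the closing ')').
    """
    buf = bytearray()
    depth = 1  # balanced, unescaped parentheses may appear inside the string
    while True:
        c = src[i]
        if c == "\\":
            nxt = src[i + 1]
            if nxt in _ESCAPES:
                buf.append(_ESCAPES[nxt])
                i += 2
            elif nxt in "01234567":
                # \d, \dd or \ddd octal character code
                j = i + 1
                while j < i + 4 and src[j] in "01234567":
                    j += 1
                buf.append(int(src[i + 1:j], 8) & 0xFF)
                i = j
            elif nxt in "\r\n":
                # Backslash before an end-of-line: line continuation, no byte
                i += 2
                if nxt == "\r" and src[i:i + 1] == "\n":
                    i += 1
            else:
                i += 1  # unknown escape: the backslash is ignored
        elif c == "(":
            depth += 1
            buf.append(40)
            i += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return bytes(buf), i + 1
            buf.append(41)
            i += 1
        else:
            buf.append(ord(c))  # assumes one character per byte (Latin-1 range)
            i += 1


def parse_tj_array(src):
    """Split a TJ operand such as '[(A)-120(B)]TJ' into bytes and floats."""
    items = []
    i = src.index("[") + 1
    while src[i] != "]":
        if src[i] == "(":
            data, i = _read_literal(src, i + 1)
            items.append(data)
        elif src[i].isspace():
            i += 1
        else:
            m = _NUMBER.match(src, i)
            if m is None:
                raise ValueError(f"unexpected {src[i]!r} at position {i}")
            items.append(float(m.group()))
            i = m.end()
    return items
```

Taking the array exactly as posted, the string operands are `b"/"`, `b"."`, `b"."`, `b"."`, `b"\n"`, `b"."`, i.e. the character codes 47, 46, 46, 46, 10, 46. These codes only become text once they are looked up through the font's encoding.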

LEGEND, Oct 25, 2019

This is all described in detail in the PDF Reference. The encodings are defined outside the page content stream (in the font dictionary), and you need to use the ToUnicode CMap if one is available. You can't just jump in; read all of the relevant chapters carefully.

 

If you have trouble following the PDF Reference, please quote the exact lines that are unclear, with chapter and section numbers.
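When a font does carry a ToUnicode CMap, that is the preferred route. A minimal Python sketch of the lookup, assuming a simple font (one byte per code) and a CMap that uses only `bfchar` entries; `bfrange` sections, which many CMaps also contain, are not handled. The function names and the `SAMPLE_CMAP` fragment are hypothetical, for illustration only.

```python
import re
from typing import Dict

# One "beginbfchar ... endbfchar" section of a ToUnicode CMap.
_BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
# One "<srcCode> <dstString>" pair; the destination is UTF-16BE hex.
_BFCHAR_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]*)>")


def parse_bfchar(cmap_text: str) -> Dict[int, str]:
    """Collect code -> Unicode text from the bfchar sections only."""
    mapping = {}
    for block in _BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in _BFCHAR_PAIR.findall(block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_with_tounicode(codes: bytes, tounicode: Dict[int, str]) -> str:
    """Decode a string operand of a simple font; unmapped codes become U+FFFD."""
    return "".join(tounicode.get(code, "\ufffd") for code in codes)


# Hypothetical CMap fragment, for illustration only.
SAMPLE_CMAP = """
2 beginbfchar
<01> <0020>
<02> <0050>
endbfchar
"""
```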

New Here, Oct 25, 2019

In my PDF, I have the following:

11 0 obj
<</BaseFont/UBTAOI+TTE1EA7070t00/FontDescriptor 10 0 R/Type/Font
/FirstChar 1/LastChar 56/Widths[ 278 667 556 556 667 667 222 222 556 500 556 722 611 556 556
722 500 278 333 556 556 833 556 556 333 556 500 556 278 333 667
500 556 556 667 278 333 611 556 278 667 722 556 556 556 556 584
778 722 556 278 500 722 722 833 222]
/Encoding 30 0 R/Subtype/TrueType>>
endobj
30 0 obj
<</Type/Encoding/BaseEncoding/WinAnsiEncoding/Differences[
1/space/P/two/four/A/B/l/i/n/k/one/D/F/d/o/w/s/t/r/e/a/m/p/u/hyphen/g/v/b/period/parenleft/E/x/h/underscore/S/f/parenright/T/q/colon/V/C/three/nine/zero/five/asciitilde/G/N/seven/comma/c/U/R/M/j]>>
endobj

 

I am reading pdf_reference_1-7, section 5.5.2, but I can't find how to combine the data above.

LEGEND, Oct 25, 2019

This font has no ToUnicode entry.

So build the effective encoding array by starting from the base encoding (here WinAnsiEncoding) and applying the Differences.
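A minimal Python sketch of that step. In a Differences array, an integer sets the current code and each following name is assigned to that code, after which the code increases by one. `WINANSI_EXCERPT` is only a handful of real WinAnsiEncoding entries (the full table is in the PDF Reference, Appendix D), and the function name is hypothetical.

```python
import re

# Differences array from "30 0 obj" in the question, copied verbatim.
DIFFERENCES = (
    "[1/space/P/two/four/A/B/l/i/n/k/one/D/F/d/o/w/s/t/r/e/a/m/p/u/hyphen"
    "/g/v/b/period/parenleft/E/x/h/underscore/S/f/parenright/T/q/colon/V/C"
    "/three/nine/zero/five/asciitilde/G/N/seven/comma/c/U/R/M/j]"
)

# A small excerpt of WinAnsiEncoding (code -> glyph name), not the full table.
WINANSI_EXCERPT = {32: "space", 46: "period", 47: "slash", 48: "zero",
                   65: "A", 97: "a"}


def apply_differences(base, differences):
    """Return a copy of `base` (code -> glyph name) with Differences applied."""
    encoding = dict(base)
    code = None
    # Tokens are either /Names or integer codes; brackets are skipped.
    for token in re.findall(r"/[^\s/\[\]()<>{}%]+|\d+", differences):
        if token.startswith("/"):
            if code is None:
                raise ValueError("a Differences array must start with a code")
            encoding[code] = token[1:]
            code += 1
        else:
            code = int(token)
    return encoding
```

With the arrays exactly as posted, the Differences cover codes 1 to 56 (matching FirstChar and LastChar), so code 47 becomes `asciitilde` and code 46 becomes `five`, replacing WinAnsi's `slash` and `period`; codes above 56 keep their WinAnsiEncoding names.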

Then you can use the byte values in character strings to look up a name for each character.
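For a simple font like this TrueType font, each byte of a string operand is one character code, so the lookup is a direct index into the effective encoding. A minimal sketch; `EFFECTIVE_ENCODING` is a hand-copied excerpt of the result of applying the Differences above, and mapping absent codes to `.notdef` is an assumption of this sketch.

```python
# Excerpt of the effective encoding of font 11 0 obj: its /Differences
# applied on top of WinAnsiEncoding (only a few of codes 1-56 shown).
EFFECTIVE_ENCODING = {
    1: "space", 2: "P", 3: "two", 10: "k", 29: "period",
    46: "five", 47: "asciitilde",
}


def codes_to_glyph_names(data, encoding):
    """Map each byte of a string operand to its glyph name.

    Codes absent from the table map to ".notdef" (an assumption here).
    """
    return [encoding.get(code, ".notdef") for code in data]
```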

Next, translate each name into the encoding you need.

Use the Adobe Glyph List (AGL) to convert glyph names to Unicode.
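A minimal Python sketch of the name-to-Unicode step, following the AGL specification's rules: drop everything from the first period, split ligature names on underscores, then look up each component or decode `uniXXXX` / `uXXXX` forms. `AGL_SUBSET` is a hand-copied excerpt that covers the non-letter names used by the font in the question; the full list is published by Adobe, and fontTools ships it as `fontTools.agl.AGL2UV`. Single ASCII letters use a shortcut, since the AGL maps `A`-`Z` and `a`-`z` to themselves.

```python
import re

# Hand-copied excerpt of the Adobe Glyph List (glyph name -> code point).
AGL_SUBSET = {
    "space": 0x0020, "period": 0x002E, "comma": 0x002C, "hyphen": 0x002D,
    "parenleft": 0x0028, "parenright": 0x0029, "colon": 0x003A,
    "underscore": 0x005F, "asciitilde": 0x007E,
    "zero": 0x0030, "one": 0x0031, "two": 0x0032, "three": 0x0033,
    "four": 0x0034, "five": 0x0035, "seven": 0x0037, "nine": 0x0039,
}


def _component_to_text(comp):
    if comp in AGL_SUBSET:
        return chr(AGL_SUBSET[comp])
    if len(comp) == 1 and comp.isascii() and comp.isalpha():
        return comp  # AGL maps single letters A-Z, a-z to themselves
    m = re.fullmatch(r"uni((?:[0-9A-F]{4})+)", comp)
    if m:
        hexes = m.group(1)
        cps = [int(hexes[i:i + 4], 16) for i in range(0, len(hexes), 4)]
        if all(not 0xD800 <= cp <= 0xDFFF for cp in cps):
            return "".join(chr(cp) for cp in cps)
    m = re.fullmatch(r"u([0-9A-F]{4,6})", comp)
    if m:
        cp = int(m.group(1), 16)
        if cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF:
            return chr(cp)
    return ""  # unknown name: no Unicode value


def glyph_name_to_unicode(name):
    base = name.split(".", 1)[0]  # drop suffixes such as ".alt"
    return "".join(_component_to_text(c) for c in base.split("_"))
```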
