UTF-8 encoding and "show"

Report · Apr 13, 2014

Hi,

How can I show UTF-8 characters? I'd like to show Polish characters.

I know there is something called CID fonts, but I don't know what they are about.

Please help...

Report · Apr 13, 2014

There is no way to use UTF-8.

There is no escape from getting a full understanding of font encoding, generating a font with a suitable encoding, and generating a show string suitably encoded.

Certain Latin2 symbols are found in the default fonts; otherwise you need to obtain and embed a suitable font.

Report · Apr 13, 2014

Test Screen Name wrote:
There is no escape from getting a full understanding of font encoding, generating a font with a suitable encoding, and generating a show string suitably encoded.

So... How may one do that? Where may one find suitable documentation?

Aren't (La)TeX files converted to PS before printing? If so, it should be possible to print UTF-8 characters.

Report · Apr 14, 2014

Where may one find suitable documentation?

PostScript Language Reference Manual, a constant friend to those writing PostScript.

Aren't (La)TeX files converted to PS before printing? If so, it should be possible to print UTF-8 characters.

Your point is perfectly true, but based on a misunderstanding.

PostScript was written with an understanding of the needs of typesetting in the many languages of the world, certainly including Polish. But PostScript was written long before Unicode was invented. So, while you can use the symbols needed to typeset the Polish language, you cannot simply hand PostScript the Unicode representation of Polish text. The two are entirely different things.
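To make the difference concrete, here is a small sketch of what a string holding UTF-8 bytes looks like to the interpreter. It assumes the UTF-8 encoding of Ą (U+0104), which is the two bytes C4 84; the name utf8Aogonek is made up for this illustration.

```postscript
%!PS
% The UTF-8 encoding of A-ogonek (U+0104) is the two bytes C4 84,
% written here as the octal escapes \304 and \204.
/utf8Aogonek (\304\204) def

% A PostScript string is just a sequence of bytes: this one has two.
utf8Aogonek length ==        % prints 2

% forall hands over each byte as an integer character code.
utf8Aogonek { == } forall    % prints 196, then 132

% With a base font, show would look up codes 196 and 132 separately in
% the font's Encoding array and paint whatever glyphs are named there
% (possibly .notdef). It never decodes the pair as one character.
```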

It is a fact that you can learn how to typeset Latin1 in an afternoon of looking at samples. It is also a fact that typesetting other character sets often requires long, deep study: frustrating, but rewarding and interesting. If you don't want to go through all that study, I recommend you use LaTeX. We can help, but don't expect to find the solution in a few lines of code.

Report · Apr 14, 2014

(Unless your needed characters are all in the Adobe standard font character set, of course, in which case it's just a case of re-encoding).

Report · Apr 14, 2014

I think you are in luck. In Appendix E of the 3rd edition, you find all the necessary characters for Polish, Ą Ć Ę Ł Ń Ó Ś Ź Ż


These are available by their Encoding name in many fonts, and with luck in the fonts included with a level 3 interpreter. However, many printers still in use are level 2. It is probably safest to embed a font as I suggested if you want general support. Or, more simply, use the Ł character, which is available in StandardEncoding, and build the other characters yourself from a base letter plus an accent. Ogonek, acute and dot accent are all standard.

Report · Apr 14, 2014

Test Screen Name wrote:
I think you are in luck. In Appendix E of the 3rd edition, you find all the necessary characters for Polish, Ą Ć Ę Ł Ń Ó Ś Ź Ż

I've found those characters (in Appendix E of the 3rd edition). There's a footnote attached to them, however:

These characters are present in the extended (315-character) Latin character set, but not in the original (229-character) set.

So... Do you know how I can use these characters? Could you give me a simple example?

Report · Apr 14, 2014

As I have already said, you re-encode the font to use these names, a fundamental thing. Really, since you cannot know a font's encoding in advance, every use of a font should re-encode it. 5.9.1 shows an example of this with ISOLatin1Encoding. Just replace this built-in name with your own derivation as a 256-element array of names. See 5.3 also.
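A minimal sketch of that re-encoding, in the style of the 5.9.1 example. The names PolishEncoding and Helvetica-PL, and the code positions chosen for the Polish glyphs, are assumptions made for illustration only. Glyphs that the font does not actually contain will still not print.

```postscript
%!PS
% Build a 256-element encoding derived from ISOLatin1Encoding.
% The copy is needed because the built-in array cannot be modified.
/PolishEncoding ISOLatin1Encoding 256 array copy def

% Put Polish glyph names at some codes. These positions are arbitrary
% choices for this sketch, not a standard code page.
PolishEncoding 8#201 /Aogonek put
PolishEncoding 8#202 /Cacute  put
PolishEncoding 8#203 /Eogonek put
PolishEncoding 8#204 /Lslash  put

% Copy the font dictionary without FID, install the new encoding,
% and register the result under a new (hypothetical) name.
/Helvetica findfont
dup length dict begin
  { 1 index /FID ne { def } { pop pop } ifelse } forall
  /Encoding PolishEncoding def
  currentdict
end
/Helvetica-PL exch definefont pop

% The show string uses the codes chosen above.
/Helvetica-PL findfont 100 scalefont setfont
10 600 moveto
(\201\202\203\204) show
showpage
```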

Report · Apr 14, 2014

Also see the PostScript Green Book, which is a good primer for PostScript programmers, as it describes good practice (like ALWAYS re-encoding fonts) rather than just syntax. Happily, it is available online: http://www-cdf.fnal.gov/offline/PostScript/GREENBK.PDF

Report · Apr 14, 2014

I tried this:

true setglobal
/f /Helvetica findfont 100 scalefont def
/enc f /Encoding get def
enc 0 /Aogonek put % error here
f /Encoding enc put
f setfont
0 setgray
/txt 2 string def
10 600 moveto
/A glyphshow
/Aogonek glyphshow
txt 0 0 put % put Aogonek code
txt show % try to show Aogonek
showpage

but the error is:

Error: /invalidaccess in --put--
Operand stack:
   --nostringval-- 0 Aogonek
Execution stack:
   %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1950 1 3 %oparray_pop 1949 1 3 %oparray_pop 1933 1 3 %oparray_pop 1819 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
Dictionary stack:
   --dict:1182/1684(ro)(G)-- --dict:0/20(G)-- --dict:81/200(L)--
Current allocation mode is global
Last OS error: No such file or directory
Current file position is 443
GPL Ghostscript 9.14: Unrecoverable error, exit code 1

Report · Apr 15, 2014

Your code is going in the right direction, but it fails because you are trying to write to read-only objects. Both the Helvetica font dictionary and its Encoding array belong to the system (in a printer they exist in ROM) and cannot be modified. You have to duplicate the objects and redefine the font. It's also vital to leave FID out of the duplicated font dictionary. Refer to the Green Book for a possible technique.
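A small sketch that makes the access problem visible with wcheck. It assumes, as the invalidaccess error above suggests, that the interpreter hands out a read-only Encoding array for Helvetica.

```postscript
%!PS
/Helvetica findfont /Encoding get   % the interpreter's own Encoding array
dup wcheck ==                       % false wherever that array is read-only

256 array copy                      % copy the 256 names into a fresh array
dup wcheck ==                       % a newly created array is writable: true
dup 0 /Aogonek put                  % so this put does not raise invalidaccess
0 get ==                            % prints /Aogonek
```

The copy is only the encoding half of the job: it still has to be installed in a duplicated font dictionary and registered with definefont.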

Report · Apr 15, 2014

Thanks. I have now:

/F { %def
  findfont exch scalefont setfont
} bind def

/RE { %def
  findfont begin
  currentdict dup length dict begin
    { %forall
      1 index /FID ne {def} {pop pop} ifelse
    } forall
    /FontName exch def dup length 0 ne { %if
      /Encoding Encoding 256 array copy def
      0 exch { %forall
        dup type /nametype eq { %ifelse
          Encoding 2 index 2 index put
          pop 1 add
        }{ %else
          exch pop
        } ifelse
      } forall
    } if pop
  currentdict dup end end
  /FontName get exch definefont pop
} bind def

/myencoding [ 0 /Lslash 1 /Aogonek 2 /Cacute 3 /Eogonek 4 /ogonek 5 /dagger ] def
myencoding /myfont /Helvetica RE
100 /myfont F
10 620 moveto (\000\001\002\003\004\005) show
showpage

But only Lslash, ogonek and dagger are shown on the page. That's probably because of the footnote I posted earlier:

These characters are present in the extended (315-character) Latin character set, but not in the original (229-character) set.

How can I use the extended Latin character set?

Report · Apr 15, 2014

You cannot guarantee that the built-in fonts contain these characters. Unless you are lucky with your printer and know you will always use that printer, you must therefore

- obtain a suitable font (preferably in type 1 format)

- convert to PFA

- include it in your PostScript

- find, reencode, use that font
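Before going to that trouble, it may be worth checking what the interpreter's own font actually contains. The sketch below is only a rough probe: hasglyph is a made-up helper name, and it assumes the font returned by findfont has a readable CharStrings dictionary keyed by glyph name, as Type 1 fonts do. Note that findfont may silently substitute another font for Helvetica, in which case the answer describes the substitute.

```postscript
%!PS
% hasglyph: fontname glyphname  hasglyph  bool
% (hypothetical helper) true if the font's CharStrings has that glyph name.
/hasglyph {
  exch findfont /CharStrings get exch known
} def

% Print each glyph name followed by true or false.
[ /Lslash /Aogonek /Cacute /Eogonek /ogonek /dagger ] {
  dup 32 string cvs print ( ) print
  /Helvetica exch hasglyph ==
} forall
```

A result of false for Aogonek, Cacute and Eogonek would match a page on which only Lslash, ogonek and dagger appear.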

Report · Apr 15, 2014

Hmm... It all seems very complicated to me. I don't know where to find a suitable font (it must be free), etc.

Couldn't it be easier? Maybe create some composite characters?

Report · Apr 15, 2014

Yes, it's certainly complicated to embed fonts.

All the accents you need are available in standard fonts, so you can combine them. Making a composite character isn't really viable, but you can set two characters, with the necessary adjustment to cause overlaying. Unfortunately, if you make a PDF, your text will not be properly extractable, but that may not be a concern.
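A rough sketch of that overlaying. The procedure name accentshow and the offsets are invented for illustration, and the offsets will need tuning by eye for each letter, font and size. It assumes the font returned by findfont uses StandardEncoding, where ogonek is octal 316, dotaccent is 307 and Lslash is 350; re-encode first if you cannot rely on that.

```postscript
%!PS
% accentshow: basestring accentstring dx dy  accentshow  -
% Shows the base, paints the accent at (dx, dy) from the point where the
% base started, then restores the current point to the end of the base.
/accentshow {
  6 dict begin
    /dy exch def  /dx exch def
    /accent exch def  /base exch def
    currentpoint /y0 exch def /x0 exch def
    base show
    currentpoint                   % remember where the base letter ended
    x0 dx add y0 dy add moveto
    accent show
    moveto                         % continue after the base letter
  end
} def

/Helvetica findfont 100 scalefont setfont   % assumed StandardEncoding
10 600 moveto
(A) (\316) 40 0 accentshow     % A plus ogonek; offsets are guesses
(\350) show                    % Lslash needs no overlay
(Z) (\307) 14 20 accentshow    % Z plus dotaccent, raised for a capital
showpage
```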