I own and maintain the Perl library PDF::Builder, used for generating PDF files. I have encountered a possible problem with Adobe Acrobat Reader (64-bit) on Windows 10. The resulting PDF behaves as expected for "Tw" operators in the stream for core fonts and T1 fonts. However, it seems to be ignored for all the TrueType (.ttf) fonts I tried. Attached is a sample PDF, giving results for Times-Roman core font, English Towne Medium TrueType font, and URW Palladio-L Type 1 font. Each has a line with -5 Tw, 0 Tw, 3 Tw; and then a line with 10 Tc. The TrueType example also has a closed-up line where I individually place each word, emulating a 40% wide space. As far as I can see, the PDF is created with the expected Tw (or Tc) operators. Am I doing something wrong, or are TrueType fonts really a problem? The sample Perl program (as .pl.txt) is also attached, if you can read Perl.
In the documentation for Tw, it mentions that it looks specifically for x20 ASCII spaces. For TrueType font outputs, PDF::Builder is using glyph IDs (4 hex digits each). I'm wondering if the renderer is being fooled by those hex digits (no x20 in there). Perhaps the renderer could look for any glyph that has no ink, and treat it as a space, but that might be a problem with different-width spaces.
More discussion on this: https://github.com/PhilterPaper/Perl-PDF-Builder/issues/193
[MOVED TO THE ACROBAT SDK DISCUSSIONS]
This has nothing to do with any Adobe product SDK, if that's why you moved it. The PDF is created from scratch without any Adobe code involved. The problem being reported is most likely independent of the software that created it. That is, I can see a "n Tw" operator in the text stream, and it has the expected effect on core and T1 fonts, but not TrueType.
Did you test this PDF with other, non-Adobe viewers? What did THEY do?
Yes (XpdfReader, GIMP, Firefox browser, Thunderbird email). All behaved the same way (no effect for Tw operator for TrueType using glyph ID list). I guess they're all "broken" consistently!
You essentially answered your question yourself: It is clearly specified how word spacing is to be applied:
"Word spacing shall be applied to every occurrence of the single-byte character code 32 in a string when using a simple font (including Type 3) or a composite font that defines code 32 as a single-byte code. It shall not apply to occurrences of the byte value 32 in multiple-byte codes." (ISO 32000-2:2020, section 9.3.3, Word spacing)
Thus:
"For TrueType font outputs, PDF::Builder is using glyph IDs (4 hex digits each). I'm wondering if the renderer is being fooled by those hex digits (no x20 in there)."
As PDF::Builder is using a double byte encoding for TrueType fonts, word spacing can never apply to them. And that is not a matter of being "fooled", it is a matter of working according to spec.
"Perhaps the renderer could look for any glyph that has no ink, and treat it as a space"
No. Doing so would simply be wrong.
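A minimal Python sketch of the rule quoted above may help. The function name is made up, 0x0003 is a hypothetical space glyph ID, and the sketch assumes every code in the encoding has the same length, which real CMaps do not guarantee. It splits a string operand into character codes and counts only the codes that are one byte long and equal to 32:

```python
def tw_eligible_codes(data: bytes, code_length: int) -> int:
    """Count the character codes that word spacing (Tw) applies to.

    Assumes a fixed code length in bytes: 1 for a simple font,
    2 for Identity-H. Real CMaps may mix code lengths.
    """
    if code_length < 1 or len(data) % code_length != 0:
        raise ValueError("string is not a whole number of character codes")
    codes = [data[i:i + code_length] for i in range(0, len(data), code_length)]
    # Only a single-byte code with value 32 qualifies.
    return sum(1 for code in codes if code == b"\x20")


# Simple font: "a b c" contains two single-byte code-32 entries.
simple = tw_eligible_codes(b"a b c", 1)

# Identity-H: two-byte codes. 0x0003 stands in for the space glyph ID;
# 0x0020 and 0x2041 contain the byte 0x20 but are not single-byte codes.
identity_h = tw_eligible_codes(bytes.fromhex("0044 0003 0020 2041"), 2)
```

With these inputs, `simple` is 2 and `identity_h` is 0, which matches what the viewers in this thread do with the glyph ID strings.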
These aren't multibyte characters. The entries are glyph IDs (note the <> brackets), and are independent of whether the original text was single or multibyte encoded. So if it is looking only for x20's in the stream, I guess it won't ever find any (the glyph ID for a space could be anything). This should probably be noted as a limitation of Tw (but Tc works OK) in the PDF documentation, if it isn't already there in a later edition than I have.
My suggestion for looking for ink-less glyphs for special Tw handling realizes that not all will be ASCII spaces, and the width adjustment should be proportional to that applied to a space (rather than a fixed number of points given in the operator). Something to be considered and not dismissed out of hand. Also, this would work independently of whatever encoding was used for a text string or if it came from a glyph ID list.
Unless someone has different information, it looks like I am simply going to have to accept that Tw is never going to work with TrueType fonts (for which we output glyph IDs), and I will have to use the separate-word hack to emulate Tw.
"These aren't multibyte characters."
No one talked about multibyte characters. The talk was about multibyte character codes, a term with a specific meaning in the context at hand. And as you're using "glyph IDs (4 hex digits each)" (in other words, you use the encoding Identity-H), you're using double-byte character codes, i.e. multibyte character codes.
"The entries are glyph IDs (note the <> brackets),"
The entries are character codes. Because you use the encoding Identity-H, character codes here equal glyph IDs. Nonetheless, in the content stream your string parameters are strings of character codes. And in your case they are double-byte character codes.
Also it doesn't matter whether you use <> brackets or () brackets, i.e. whether you use literal strings or hexadecimal strings, that's just your personal preference. You can also use hexadecimal strings for WinAnsiEncoding and literal strings for Identity-H.
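As a rough illustration of that last point, here is a Python sketch (both helper names are invented, and the literal-string decoder deliberately ignores escapes and nested parentheses) showing that a hexadecimal string and a literal string can carry exactly the same bytes:

```python
import re


def decode_hex_string(token: str) -> bytes:
    """Decode a PDF hexadecimal string such as <0041 0042>."""
    digits = re.sub(r"\s+", "", token[1:-1])
    if len(digits) % 2:
        # An odd number of digits: the final digit is assumed to be 0.
        digits += "0"
    return bytes.fromhex(digits)


def decode_literal_string(token: str) -> bytes:
    """Decode a PDF literal string.

    Simplification: assumes no backslash escapes and no nested parentheses.
    """
    return token[1:-1].encode("latin-1")


# Both operands carry the bytes 0x41 0x20 0x4D.
same = decode_hex_string("<41204D>") == decode_literal_string("(A M)")
```

Here `same` is True. What the bytes mean is decided by the font's encoding, not by the string syntax.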
"So if it is looking only for x20's in the stream, I guess it won't ever find any (the glyph ID for a space could be anything)."
According to spec, a PDF viewer must look only for 0x20 (32) bytes, and only for those that are single-byte character codes, as word dividers.
If you use an encoding that does not have a single-byte 0x20 character code, then Tw won't do anything for you. And if you use an encoding that has a single-byte 0x20 character code representing something other than a space character, then Tw will cause funny results. All according to spec...
"This should probably be noted as a limitation of Tw (but Tc works OK) in the PDF documentation, if it isn't already there in a later edition than I have."
Word spacing is specified to work like that. If you consider that a limitation, it is an obvious one.
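To make the Tw versus Tc difference concrete, here is a rough Python sketch of the horizontal displacement formula from the spec's text-positioning rules, tx = ((w0 - Tj/1000) × Tfs + Tc + Tw) × Th, with Tj left at 0. The function name, the sample widths, and the glyph IDs are made up for illustration:

```python
def string_advance(codes, widths, font_size, tc=0.0, tw=0.0, th=1.0):
    """Horizontal advance of a shown string, in text space units.

    codes  : list of bytes objects, one per character code
    widths : dict mapping code -> glyph width in thousandths of a unit
    """
    total = 0.0
    for code in codes:
        w0 = widths[code] / 1000.0
        # Tw is added only for the single-byte character code 32.
        word_space = tw if code == b"\x20" else 0.0
        # Tc is added after every glyph, whatever the code length.
        total += (w0 * font_size + tc + word_space) * th
    return total


# The same three glyphs ("a", space, "b") addressed two ways.
simple_codes = [b"a", b" ", b"b"]
simple_widths = {b"a": 500, b" ": 250, b"b": 500}

cid_codes = [b"\x00\x44", b"\x00\x03", b"\x00\x45"]  # hypothetical glyph IDs
cid_widths = {b"\x00\x44": 500, b"\x00\x03": 250, b"\x00\x45": 500}

simple_tw = string_advance(simple_codes, simple_widths, 12, tw=3)  # 18.0
cid_tw = string_advance(cid_codes, cid_widths, 12, tw=3)           # 15.0
cid_tc = string_advance(cid_codes, cid_widths, 12, tc=10)          # 45.0
```

With a 12 pt font, 3 Tw widens the simple-font string from 15 to 18 units but leaves the two-byte string at 15, while 10 Tc widens both to 45.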
Phil28073338r0c0: "This has nothing to do with any Adobe product SDK, if that's why you moved it."
I moved it because I know this is where you will get the most and the best answers to your question.
Hi,
After reading this super interesting discussion, I realized that this is not my lane.
But to add my ten cents, I totally agree with @MikelKlink in the context of specifications.
In which case, after reading this article:
You may want to stick to the OpenType initiative and see if that aids in producing the desired results.
I ended up doing a hack to split the text on original ASCII spaces (x20), only if the wordspace value is non-zero, and outputting the spaces as glyph IDs with the TJ operator and a "kerning" amount to adjust the size of the space. It bloats the stream a bit, but it seems to behave exactly the same way as other font types (Type1, core) do with the Tw operator.
Amazing!
I learned something new today.
Yes, using TJ and adjusting spaces (or replacing spaces by larger adjustments) is how word spacing often is done when Tw cannot be used.
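For readers who want to try this, here is a minimal Python sketch of the TJ approach. It is not PDF::Builder's actual code: the function names and the char-to-glyph-ID map are hypothetical. TJ numbers are in thousandths of a text space unit and are subtracted from the position, so emulating `Tw` at font size `Tfs` takes an adjustment of -1000 × Tw / Tfs after each space glyph:

```python
def format_number(value: float) -> str:
    """Format a number for a content stream without exponent notation."""
    return f"{value:.2f}".rstrip("0").rstrip(".")


def tj_with_word_spacing(text, glyph_ids, font_size, word_space):
    """Build a TJ operation that emulates `word_space Tw` for two-byte codes.

    glyph_ids is a hypothetical char -> glyph ID map taken from the font.
    """
    # Negative adjustment moves the next glyph further right (wider gap).
    adjust = -1000.0 * word_space / font_size
    parts, run = [], ""
    for ch in text:
        run += f"{glyph_ids[ch]:04X}"
        if ch == " " and word_space != 0:
            # Close the run after the space glyph and adjust the gap.
            parts.append(f"<{run}>")
            parts.append(format_number(adjust))
            run = ""
    if run:
        parts.append(f"<{run}>")
    return "[" + " ".join(parts) + "] TJ"


glyph_ids = {"a": 0x44, " ": 0x03, "b": 0x45}  # made-up glyph IDs
spaced = tj_with_word_spacing("a b", glyph_ids, 12, 3)
plain = tj_with_word_spacing("a b", glyph_ids, 12, 0)
```

Here `spaced` is `[<00440003> -250 <0045>] TJ` and `plain` is `[<004400030045>] TJ`, so the string is only split when the word spacing is non-zero, as in the hack described above.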