Does Tw work with TrueType fonts?

Explorer, Jan 23, 2023

I own and maintain the Perl library PDF::Builder, which generates PDF files. I have run into a possible problem with Adobe Acrobat Reader (64-bit) on Windows 10. The generated PDF behaves as expected for "Tw" (word spacing) operators in the stream with core fonts and Type 1 fonts. However, Tw seems to be ignored for every TrueType (.ttf) font I tried.

Attached is a sample PDF with results for the Times-Roman core font, the English Towne Medium TrueType font, and the URW Palladio-L Type 1 font. For each font there is a line with -5 Tw, 0 Tw, and 3 Tw, followed by a line with 10 Tc. The TrueType example also has a closed-up line where I place each word individually, emulating a 40%-wide space. As far as I can see, the PDF contains the expected Tw (or Tc) operators. Am I doing something wrong, or are TrueType fonts really a problem? The sample Perl program (as .pl.txt) is also attached, if you can read Perl.

The documentation for Tw says it applies specifically to x20 (ASCII space) characters. For TrueType fonts, PDF::Builder outputs glyph IDs (4 hex digits each). I'm wondering if the renderer is being fooled by those hex digits (there is no x20 in there). Perhaps the renderer could look for any glyph that has no ink and treat it as a space, but that might be a problem with spaces of different widths.

 

More discussion on this: https://github.com/PhilterPaper/Perl-PDF-Builder/issues/193

Community Expert, Jan 23, 2023

[MOVED TO THE ACROBAT SDK DISCUSSIONS]

Explorer, Jan 23, 2023

This has nothing to do with any Adobe product SDK, if that's why you moved it. The PDF is created from scratch without any Adobe code involved, and the problem is most likely independent of the software that created it: I can see an "n Tw" operator in the text stream, and it has the expected effect on core and Type 1 fonts, but not on TrueType fonts.

LEGEND, Jan 23, 2023

Did you test this PDF with other, non-Adobe viewers? What did THEY do?

 

Explorer, Jan 24, 2023

Yes (XpdfReader, GIMP, the Firefox browser, and the Thunderbird email client). All behaved the same way: Tw had no effect on TrueType text output as a glyph ID list. I guess they're all "broken" consistently!

Enthusiast, Jan 23, 2023

You essentially answered your question yourself. The specification clearly states how word spacing is to be applied:

 

Word spacing shall be applied to every occurrence of the single-byte character code 32 in a string when using a simple font (including Type 3) or a composite font that defines code 32 as a single-byte code. It shall not apply to occurrences of the byte value 32 in multiple-byte codes.
(ISO 32000-2:2020 section 9.3.3 Word spacing)

 

Thus:

quote

For TrueType font outputs, PDF::Builder is using glyph IDs (4 hex digits each). I'm wondering if the renderer is being fooled by those hex digits (no x20 in there).

 

Because PDF::Builder uses a double-byte encoding for TrueType fonts, word spacing can never apply to them. That is not a matter of being "fooled"; it is a matter of working according to spec.
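The spec rule can be made concrete with a short Python sketch of how a conforming viewer advances the text position for a string shown with Tj, using the glyph displacement formula from ISO 32000 (tx = ((w0 - Tj/1000) × Tfs + Tc + Tw) × Th, where the Tj adjustment is 0 for a plain Tj string). The function name, the width tables, and the glyph IDs are hypothetical, and only fixed-length codes are modelled (1 byte for a simple font, 2 bytes for Identity-H); CMaps that mix code lengths are not handled. The point is only that Tw is added for the single-byte code 32 and for nothing else.

```python
def string_advance(data: bytes, code_length: int, widths: dict[int, int],
                   font_size: float, tc: float = 0.0, tw: float = 0.0,
                   th: float = 1.0) -> float:
    """Horizontal advance of one Tj string operand, in text space units.

    data        -- raw bytes of the string operand (already decoded)
    code_length -- 1 for a simple font, 2 for a composite font using Identity-H
    widths      -- hypothetical glyph widths in 1/1000 em, keyed by character code
    tc, tw, th  -- character spacing, word spacing, horizontal scaling (1.0 = 100%)
    """
    if code_length not in (1, 2) or len(data) % code_length:
        raise ValueError("unsupported code length or truncated string")
    total = 0.0
    for i in range(0, len(data), code_length):
        code = int.from_bytes(data[i:i + code_length], "big")
        w0 = widths.get(code, 0) / 1000.0
        # Tw only applies to the single-byte code 32; a two-byte code whose
        # value happens to be 0x0020 does not qualify.
        word_spacing = tw if code_length == 1 and code == 32 else 0.0
        total += (w0 * font_size + tc + word_spacing) * th
    return total


# Simple font: "a b" is three one-byte codes, and 0x20 picks up Tw.
simple = string_advance(b"a b", 1, {0x61: 500, 0x20: 250, 0x62: 500}, 10, tw=3)
# Identity-H: the same text as hypothetical GIDs 0x0044, 0x0003, 0x0045.
ident = string_advance(bytes.fromhex("004400030045"), 2,
                       {0x44: 500, 0x03: 250, 0x45: 500}, 10, tw=3)
print(simple, ident)  # 15.5 12.5
```

Tc, by contrast, is added for every code in both cases, which matches the observation in this thread that Tc works with the TrueType fonts while Tw does not.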

 

quote

Perhaps the renderer could look for any glyph that has no ink, and treat it as a space

 

No. Doing so would simply be wrong.

Explorer, Jan 24, 2023

These aren't multibyte characters. The entries are glyph IDs (note the <> brackets), and are independent of whether the original text was single or multibyte encoded. So if it is looking only for x20's in the stream, I guess it won't ever find any (the glyph ID for a space could be anything). This should probably be noted as a limitation of Tw (but Tc works OK) in the PDF documentation, if it isn't already there in a later edition than I have.

 

My suggestion of looking for ink-less glyphs for special Tw handling recognizes that not all of them will be ASCII spaces, so the width adjustment should be proportional to that applied to a space (rather than a fixed number of points given in the operator). It is something to be considered, not dismissed out of hand. It would also work regardless of the encoding used for a text string, or whether the text came from a glyph ID list.

 

Unless someone has different information, it looks like I am simply going to have to accept that Tw is never going to work with TrueType fonts (for which we output glyph IDs), and I will have to use the separate-word hack to emulate Tw.

Enthusiast, Jan 24, 2023

quote

These aren't multibyte characters.

No one talked about multibyte characters. The talk was about multibyte character codes, a term with a specific meaning in this context. Since you're using "glyph IDs (4 hex digits each)" (in other words, the encoding Identity-H), you're using double-byte character codes, i.e., multibyte character codes.

quote

The entries are glyph IDs (note the <> brackets),

The entries are character codes. Because you use the encoding Identity-H, character codes here equal glyph IDs. Nonetheless, in the content stream your string parameters are strings of character codes. And in your case they are double-byte character codes.

Also, it doesn't matter whether you use <> brackets or () brackets, i.e., whether you use hexadecimal strings or literal strings; that's just your personal preference. You can also use hexadecimal strings for WinAnsiEncoding and literal strings for Identity-H.
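The equivalence of the two string syntaxes is easy to check. This Python sketch decodes a hexadecimal string and a literal string to the same bytes. It implements only part of the literal-string rules (single-character escapes and \ddd octal escapes; backslash line continuations and end-of-line normalization are not handled), and the glyph IDs in the Identity-H example are hypothetical.

```python
import re

# Single-character escapes allowed in PDF literal strings.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
            "(": "(", ")": ")", "\\": "\\"}


def decode_hex_string(s: str) -> bytes:
    """Decode a PDF hexadecimal string such as '<0044 0003>'."""
    if not (s.startswith("<") and s.endswith(">")):
        raise ValueError("not a hexadecimal string")
    digits = re.sub(r"\s+", "", s[1:-1])  # whitespace inside <> is ignored
    if len(digits) % 2:
        digits += "0"  # a missing final digit is taken to be 0
    return bytes.fromhex(digits)


def decode_literal_string(s: str) -> bytes:
    """Decode a single-line PDF literal string (simplified subset)."""
    if not (s.startswith("(") and s.endswith(")")):
        raise ValueError("not a literal string")
    body, out, i = s[1:-1], bytearray(), 0
    while i < len(body):
        if body[i] != "\\":
            out += body[i].encode("latin-1")
            i += 1
            continue
        octal = re.match(r"[0-7]{1,3}", body[i + 1:])
        if octal:  # \ddd octal escape; high-order overflow is ignored
            out.append(int(octal.group(), 8) & 0xFF)
            i += 1 + len(octal.group())
        elif i + 1 < len(body):
            nxt = body[i + 1]
            # Unknown escapes drop the backslash and keep the character.
            out += _ESCAPES.get(nxt, nxt).encode("latin-1")
            i += 2
        else:
            i += 1  # a trailing lone backslash is ignored
    return bytes(out)


# A WinAnsiEncoding string written both ways: the bytes are identical.
print(decode_hex_string("<612062>") == decode_literal_string("(a b)"))
# Identity-H codes (hypothetical GIDs 0x0044, 0x0003, 0x0045) written both ways.
print(decode_hex_string("<0044 0003 0045>")
      == decode_literal_string(r"(\000D\000\003\000E)"))
```

Both comparisons print True: the viewer sees the same character codes either way, so the choice of brackets cannot affect whether Tw applies.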

quote

So if it is looking only for x20's in the stream, I guess it won't ever find any (the glyph ID for a space could be anything).

According to the spec, a PDF viewer must treat only 0x20 (32) bytes as word dividers, and only those that are single-byte character codes.
If you use an encoding that has no single-byte 0x20 character code, Tw won't do anything for you. And if you use an encoding whose single-byte 0x20 character code represents something other than a space character, Tw will cause funny results. All according to spec...

quote

This should probably be noted as a limitation of Tw (but Tc works OK) in the PDF documentation, if it isn't already there in a later edition than I have.

Word spacing is specified to work like that. If you consider that a limitation, it is an obvious one.

 

 

Community Expert, Jan 25, 2023

Phil28073338r0c0: "This has nothing to do with any Adobe product SDK, if that's why you moved it."

I moved it because I know this is where you will get the most and the best answers to your question.

Community Expert, Jan 26, 2023

Hi,

 

After reading this super interesting discussion, I realized that this is not my lane.

 

But to add my ten cents, I totally agree with @MikelKlink in the context of specifications.

 

In which case, after reading this article:

 

 

You may want to stick to the OpenType initiative and see if that aids in producing the desired results.

Explorer, Jan 31, 2023 (marked as the correct answer)

I ended up doing a hack to split the text on original ASCII spaces (x20), only if the wordspace value is non-zero, and outputting the spaces as glyph IDs with the TJ operator and a "kerning" amount to adjust the size of the space. It bloats the stream a bit, but it seems to behave exactly the same way as other font types (Type1, core) do with the Tw operator.
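A minimal Python sketch of that workaround, assuming an Identity-H font where each character code is a two-byte glyph ID. gid_for is a hypothetical character-to-GID map (a real implementation would build it from the font's cmap table), and tj_with_word_spacing is an illustrative name, not PDF::Builder's API. Each space glyph is kept and followed by a TJ number of -Tw × 1000 / Tfs. TJ numbers are in thousandths of a unit of text space, scaled by the font size, and are subtracted from the horizontal position, so this widens each space by exactly Tw, as Tw itself would with a simple font.

```python
def tj_with_word_spacing(text: str, gid_for: dict[str, int],
                         font_size: float, word_spacing: float) -> str:
    """Emulate 'word_spacing Tw' for an Identity-H font with a TJ array."""
    def hex_run(chunk: str) -> str:
        # One two-byte (four hex digit) code per character: its glyph ID.
        return "<" + "".join(f"{gid_for[ch]:04X}" for ch in chunk) + ">"

    if word_spacing == 0:
        return f"[{hex_run(text)}] TJ"  # nothing to emulate: one run

    # A TJ number n moves the pen by -n/1000 * Tfs (times Th), so this value
    # adds word_spacing after each space glyph.
    adjust = -word_spacing * 1000.0 / font_size
    words = text.split(" ")  # split on ASCII spaces only
    parts = []
    for i, word in enumerate(words):
        last = i == len(words) - 1
        chunk = word if last else word + " "  # keep the space glyph itself
        if chunk:
            parts.append(hex_run(chunk))
        if not last:
            parts.append(f"{adjust:g}")
    return "[" + " ".join(parts) + "] TJ"


gids = {"a": 0x44, " ": 0x03, "b": 0x45}  # hypothetical glyph IDs
print(tj_with_word_spacing("a b", gids, 10, 3))  # [<00440003> -300 <0045>] TJ
```

Splitting only when the word-spacing value is non-zero, as the post describes, leaves the stream unchanged in the common case; the cost is one extra array element or two per space when Tw is in use.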

Community Expert, Jan 31, 2023

Amazing!

 

I learned something new today.

Enthusiast, Jan 31, 2023

Yes, using TJ and adjusting spaces (or replacing spaces with larger adjustments) is how word spacing is often done when Tw cannot be used.
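For the other variant mentioned here, dropping the space glyph and putting one larger TJ number in its place, the number has to reproduce the space's whole advance, not just the extra spacing. A Python sketch, assuming the space glyph's width is known in the font's 1/1000-em glyph units (the 250 below is a made-up value, and 0x0044 and 0x0045 are hypothetical glyph IDs). Because a removed glyph no longer receives Tc or Tw, both are folded into the number.

```python
def space_replacement(space_width: float, font_size: float,
                      char_spacing: float = 0.0,
                      word_spacing: float = 0.0) -> float:
    """TJ number to put in place of a dropped space glyph.

    space_width -- advance width of the space glyph in 1/1000 em
    The dropped glyph would have advanced (w/1000 * Tfs + Tc + Tw) * Th;
    a TJ number n advances -(n/1000) * Tfs * Th, so solve for n (Th cancels).
    """
    return -(space_width + (char_spacing + word_spacing) * 1000.0 / font_size)


n = space_replacement(250, 10, word_spacing=3)
print(f"[<0044> {n:g} <0045>] TJ")  # [<0044> -550 <0045>] TJ
```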
