To list all the glyphs and their counts used in a document from a specific font

Explorer, Sep 15, 2024

I am trying to write a script that lists all the glyphs used in a document, together with their counts. The glyphs should be listed by name or ID (not by Unicode value), because each Unicode character has several alternate shapes and I want a separate result for each alternate shape. I have tried, but in vain. Here is the script I am working on.

 

// Get the active document
var doc = app.activeDocument;

// Define an object to store the glyph name counts
var glyphCounts = {};

// Define the output file path
var outputFile = File("~/Desktop/GlyphCount.txt");
outputFile.open("w");

// Write the initial line to the file
outputFile.writeln("Glyph Counts for the font: Arabic Font 1");

// Function to process characters and count glyphs
function countGlyphs(character) {
    try {
        // Check if the character has glyphs
        if (character.hasOwnProperty('glyphs') && character.glyphs.length > 0) {
            var glyph = character.glyphs[0]; // Get the first glyph object
            var glyphName = glyph.glyphName; // Get the glyph name
            
            // Use a placeholder if glyphName is undefined
            if (!glyphName) {
                glyphName = "UnknownGlyph";
            }

            // Update the glyph count based on the glyph name
            if (glyphCounts[glyphName] === undefined) {
                glyphCounts[glyphName] = 1;
            } else {
                glyphCounts[glyphName]++;
            }

            // Debug output using alerts
            alert("Glyph: " + glyphName + ", Count: " + glyphCounts[glyphName]);
        } else {
            alert("No glyphs found for character.");
        }
    } catch (e) {
        // Log errors if glyph retrieval fails
        alert("Error processing character. Error: " + e.toString());
    }
}

// Loop through all stories (text frames) in the document
for (var i = 0; i < doc.stories.length; i++) {
    var story = doc.stories[i];

    // Loop through all characters in the story
    for (var j = 0; j < story.characters.length; j++) {
        var character = story.characters[j];
        var font = character.appliedFont.name;

        // Check if the character is from the specific font you're interested in
        if (font === "Arabic Font 1") {
            countGlyphs(character);
        }
    }
}

// Write the glyph counts to the file
for (var glyphName in glyphCounts) {
    outputFile.writeln("Glyph: " + glyphName + ", Count: " + glyphCounts[glyphName]);
}

// Close the file
outputFile.close();

// Notify the user
alert("Glyph count saved to " + outputFile.fsName);


Any help is greatly appreciated.

TOPICS
Scripting

Views

2.2K

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Sep 15, 2024

I'm not good enough at scripting to help you, but I would try the Facebook scripting group: https://www.facebook.com/groups/indesignscripting/

Community Expert, Sep 15, 2024

@zuhair777

 

A Character doesn't have a glyphs collection, only a glyphForm property, which takes the AlternateGlyphForms values documented here:

 

https://www.indesignjs.de/extendscriptAPI/indesign-latest/#AlternateGlyphForms.html#d1e85227
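For reference, here is a minimal sketch of reading glyphForm per character and tallying the values for one font family. It assumes that each character exposes glyphForm and appliedFont.fontFamily as in the scripting DOM, and it reuses "Arabic Font 1" from the original script as a placeholder family name. The helper name tallyGlyphForms is made up for this illustration. It only reports the CJK-style glyph form, so it is not expected to distinguish alternate glyphs.

```javascript
// Tally the glyphForm value of every character set in one font family.
// `stories` is any array-like of objects that have a `characters` array-like:
// in InDesign pass doc.stories; for a dry run pass plain objects.
// tallyGlyphForms is a hypothetical helper name, not part of the InDesign DOM.
function tallyGlyphForms(stories, fontFamily) {
    var counts = {};
    for (var i = 0; i < stories.length; i++) {
        var chars = stories[i].characters;
        for (var j = 0; j < chars.length; j++) {
            var ch = chars[j];
            var font = ch.appliedFont;
            // appliedFont may be a Font object or, for a missing font, a plain string.
            var family = (font && font.fontFamily) ? font.fontFamily : String(font);
            if (family !== fontFamily) {
                continue;
            }
            var key = String(ch.glyphForm);
            counts[key] = (counts[key] || 0) + 1;
        }
    }
    return counts;
}

// Runs only inside InDesign, where `app` exists.
if (typeof app !== "undefined" && app.documents.length > 0) {
    var formCounts = tallyGlyphForms(app.activeDocument.stories, "Arabic Font 1");
    for (var form in formCounts) {
        $.writeln(form + ": " + formCounts[form]);
    }
}
```

The sketch compares fontFamily rather than the font's name, on the assumption that the name also carries the style.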

 

 

Community Expert, Sep 15, 2024

@zuhair777 

 

Same text - with Swash Alternates ON on the left and OFF on the right:

 

RobertatIDTasker_0-1726408853070.png

 

returns the same value for the Glyph Form:

 

RobertatIDTasker_0-1726408529465.png

 

And as per link provided earlier:

 

AlternateGlyphForms.NONE (value 1852796517): Does not use an alternate form.

 

Could you please provide a sample INDD / IDML file for testing?

 

Explorer, Sep 15, 2024

I appreciate your information. If we create a PDF from InDesign, can we get the list from it?

Community Expert, Sep 15, 2024

quote

I appreciate your information. If we create a PDF from InDesign, can we get the list from it?


By @zuhair777

 

No idea - can you provide a sample INDD / IDML file for testing?

 

Explorer, Sep 15, 2024

Where can I send you the IDML file?

Community Expert, Sep 15, 2024

Please ZIP it, upload it somewhere - WeTransfer or something - and send me the link.

 

Community Expert, Sep 15, 2024

RobertatIDTasker_0-1726419229677.png

 

Looks like this Glyph Form property is read/write, so it should be settable in the UI, but I can't find it anywhere.

 

@James Gifford—NitroPress , @Joel Cherney , @Peter Spier @Peter Kahrel - can you pitch in?

 

Community Expert, Sep 15, 2024

Afraid this is well outside my areas of expertise. Sorry.

Community Expert, Sep 15, 2024

Don't know how much I can help, really. All I can spot here are incompatibilities in terminology, to be honest. 

 

"Glyph forms" and "Alternate Glyph Forms" aren't general typographical terms. I'm, um, 85% certain that they apply only to Chinese and Japanese characters. It's related to Han Unification, and so that is most likely why there's nothing in your UI about it, and nothing in my Middle East/North Africa UI, either. 

 

I think, @zuhair777, that what you actually want is a list of every glyph you can see in the Glyphs panel, right? Where you can see three different glyphs that all have the same Unicode value but have distinct GIDs, right? That's what I understand when I read this:

 

The glyphs should be listed by their names or IDs (not by Unicode) as each Unicode character has several alternate shapes and I want to get result for each and every alternate shape.

 

Here's a thread that may help you out - it's one where Peter Kahrel and Jongware and Dave Saunders (!!!) are all chatting about this issue. Dave's solution is, as he says, ugly, but it might work in your case. 

 

 

Explorer, Sep 15, 2024

Thank you.

Can we get the list from PDF created by InDesign?

Community Expert, Sep 15, 2024

@zuhair777 

 

Left is text copied from Acrobat - right - from InDesign:

RobertatIDTasker_0-1726420197403.png

 

And this is text copied from InDesign to WORD - saved as DOCX:

RobertatIDTasker_1-1726420445168.png

 

saved as RTF:

RobertatIDTasker_2-1726420648839.png

 

RTF exported from Acrobat is rubbish.

 

If there are significant differences between what is displayed in InDesign and what is copied to Word, then a macro in Word could probably do it.

 

Guide, Sep 15, 2024

The composer produces a sequence of character codes and associated opentype features.

Those OpenType features can be specified by dedicated attributes (e.g. stylistic set, numerator/denominator, slashed zero and umpteen others), or they can be derived from context, e.g. the positional form for the first/middle/last character of a word. There is also an attribute to directly invoke any OpenType feature.

I think that the "form" in question is here referring to the positional form.

https://learn.microsoft.com/en-us/typography/opentype/spec/features_fj#init

https://learn.microsoft.com/en-us/typography/opentype/spec/features_fj#tag-fina

The term "form" is also used for other meanings, e.g. "capital form".

Joel might be thinking of leading hangul forms?

https://learn.microsoft.com/en-us/typography/opentype/spec/features_ko#tag-ljmo

 

The composer then passes those characters and OT features to the font program. This translates them into a sequence of glyph IDs, plus transformations (as in positioning and scaling matrix).

There is no direct mapping - a character can produce multiple glyphs, or multiple characters may result in a single glyph e.g. "st" ligatures, fractions "1/2". Superscript may produce a different glyph or may get fauxed by scaling.

 

The output of the composer is stored in a family of data structures called wax. These are not accessible to scripting, you'd need a C++ plug-in to report on them. This wax is what keeps e.g. old CS2 documents in their original state until you start editing causing a partial or full recompose.

 

Sharing examples in DM rather than posting them to the forum limits their use. It's fine if Robert can have a look at a transient share, but that shuts out others who may come along far in the future. I'll take an example from a different thread instead - the recent one about Kashidas.

 

The screenshot below is from a plug-in; it shows the different glyphs used for the positional form OpenType features, for the first word. It still lays them out left to right; I haven't yet added right-to-left for Arabic, for lack of examples when I wrote the plug-in. Labels are localized to German but should still be understandable.

 

DirkBecker_0-1726460183082.png

 

Explorer, Sep 16, 2024

In Arabic, most of the letters have four shapes or four forms (1) isolated, (2) initial, (3) medial and (4) final as shown in the image. Arabic script is written from right to left.

Arabic 01.jpg

The letter Baa (with a dot below) ب has a Unicode value but its other forms don’t have any Unicode values. These shapes change automatically depending on the preceding and succeeding joining letters. In more complex Arabic fonts, a single Unicode character may even have a large number of initial shapes, depending on the joining letters that follow it, as shown in the image below. These are all alternate initial shapes for the single character Baa (with a dot below).

Arabic 02.jpg

Below are different alternate medial shapes for the same letter Baa (with a dot below).

Arabic 03.jpg

By "glyph id" I mean the numerical index of these particular glyphs in a particular font. So I am trying to create a script to list all the glyphs, by their IDs, and their counts in a particular document.
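To make the four positional forms concrete, here is a simplified sketch that derives isolated/initial/medial/final from the joining behaviour of neighbouring letters. It is an illustration only, not what InDesign or the font does internally: the JOINING table covers just a handful of letters, combining marks are not handled, and the many contextual alternates within one form (as in the images above) are chosen by the font and are out of reach here. The function name positionalForms and the table are made up for this example; the returned labels mirror the OpenType feature tags isol, init, medi and fina.

```javascript
// Joining types for a few letters only (illustrative, far from complete).
// "D" = dual-joining: connects to both neighbours.
// "R" = right-joining: connects only to the preceding letter.
var JOINING = {
    "\u0628": "D", // beh
    "\u062A": "D", // teh
    "\u064A": "D", // yeh
    "\u0645": "D", // meem
    "\u0627": "R", // alef
    "\u062F": "R", // dal
    "\u0631": "R", // reh
    "\u0648": "R"  // waw
};

// Returns one label per character of `word` (logical order, no marks).
// Characters missing from JOINING are treated as non-joining.
function positionalForms(word) {
    var forms = [];
    for (var i = 0; i < word.length; i++) {
        var cur = JOINING[word.charAt(i)];
        var prev = i > 0 ? JOINING[word.charAt(i - 1)] : undefined;
        var next = i < word.length - 1 ? JOINING[word.charAt(i + 1)] : undefined;
        // Connects backwards only if the previous letter can connect forwards.
        var joinsPrev = !!cur && prev === "D";
        // Connects forwards only if this letter is dual-joining and a joining letter follows.
        var joinsNext = cur === "D" && !!next;
        if (joinsPrev && joinsNext) {
            forms.push("medi");
        } else if (joinsPrev) {
            forms.push("fina");
        } else if (joinsNext) {
            forms.push("init");
        } else {
            forms.push("isol");
        }
    }
    return forms;
}
```

For example, positionalForms("\u0628\u064A\u062A") (beh, yeh, teh) gives init, medi, fina, while in beh, alef, beh the last beh comes out as isol because alef does not connect forwards.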

Community Expert, Sep 16, 2024

How about export to IDML - or even snippet? 

 

There should be info which glyph / form should be used / displayed - unless, it's done when text is rendered?

 

Community Expert, Sep 16, 2024

Unfortunately, there is no glyph info in either the IDML or the IDMS:

 

RobertatIDTasker_0-1726481969595.png

 

Community Expert, Sep 16, 2024

RTF doesn't have this info either:

 

RobertatIDTasker_0-1726483923876.png

 

Checked RTF spec - nothing about special glyphs or alternative forms...

 

Explorer, Sep 16, 2024

I want to create the required list for any particular document (a PDF, an MS Word doc or an InDesign doc), whatever method is used. No success so far.

Community Expert, Sep 16, 2024

quote

I want to create the required list for any particular document (a PDF, an MS Word doc or an InDesign doc), whatever method is used. No success so far.


By @zuhair777

 

Looks like only what @Dirk Becker suggested would work - dedicated, real plug-in.

 

Guide, Sep 16, 2024

Spending some more time, there is actually an attribute to force a positional form.

This attribute is also exposed as text property. Quick look at the OMV:

 

DirkBecker_0-1726481390618.png

 

I guess / did not verify, that "calculate" would translate to "Automatische Form" above, meaning the composer makes the choice, and the font contributes with some additional substitution magic along those alternatives for the specified form as described by @zuhair777 for the initial shapes of Baa.

 

When a glyph is chosen from the glyphs panel, InDesign has an order of ways to reach the glyph. Preferred is by unicode character, then with various concrete feature related attributes, then the more abstract SALT or AALT lookup, eventually other general opentype features not backed by attributes. Plus likely some other way that I forgot. Ah yes, fonts can also specify different glyphs per language and/or writing script. Bulgarian cyrillic might look different than Russian.

Final fallback is the numeric glyph ID. The direct glyph ID may change across fonts and even font versions, this makes it the final resort.

The actual choice of the glyphs panel is shown in the tip when you mouse-over the glyph.

I have not yet tried how FindChange Glyph works, likely similar. So iterating thru all glyph IDs with findGlyph() might already give you your count. Note that in theory fonts need not have consecutive glyph IDs, so better you try them all.
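A rough, untested sketch of that idea: loop over candidate glyph IDs, run Find Glyph for each, and keep the IDs that produce hits. The use of app.findGlyphPreferences (with glyphID, appliedFont, fontStyle) and doc.findGlyph() is my reading of the scripting DOM rather than something verified in this thread, and it is an open question whether each hit corresponds to exactly one glyph occurrence (ligatures and marks may behave differently). The function name countGlyphsById, the family "Adobe Arabic" and the upper bound 2000 are placeholders; a script has no obvious way to ask the font for its glyph count, so the bound has to be supplied.

```javascript
// Count Find Glyph hits for every glyph ID from 0 to maxGlyphId.
// `host` stands in for InDesign's `app` and `doc` for a Document, so the
// function can also be exercised with plain stand-in objects.
// countGlyphsById is a hypothetical helper name.
function countGlyphsById(host, doc, fontFamily, fontStyle, maxGlyphId) {
    var counts = {};
    for (var gid = 0; gid <= maxGlyphId; gid++) {
        var prefs = host.findGlyphPreferences;
        prefs.appliedFont = fontFamily;
        prefs.fontStyle = fontStyle;
        prefs.glyphID = gid;
        var hits = doc.findGlyph();
        if (hits.length > 0) {
            counts[gid] = hits.length;
        }
    }
    return counts;
}

// Runs only inside InDesign, where `app` and `NothingEnum` exist.
if (typeof app !== "undefined" && app.documents.length > 0) {
    app.findGlyphPreferences = NothingEnum.NOTHING; // clear earlier Find Glyph settings
    // "Adobe Arabic" / "Regular" / 2000 are assumptions; adjust to the font in use.
    var found = countGlyphsById(app, app.activeDocument, "Adobe Arabic", "Regular", 2000);
    for (var id in found) {
        $.writeln(id + "\t" + found[id]);
    }
    app.findGlyphPreferences = NothingEnum.NOTHING;
}
```

One search per glyph ID is slow on long documents, so it may be worth starting with a small bound on a short test story before trusting the counts.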

 

Anyway, I understand that, e.g. for a glyph coverage overview when trying a font with a test text, one would want to see all used glyphs. Eventually also the unused ones, the glyph name, the preferred choice, etc. I'll try to add something like that to my plug-in while you try to figure out how to upload an example document 😉 .

 

 

Explorer, Sep 16, 2024

Here is a sample IDML attached. The list should not only include base glyphs but it should also include the Mark glyphs present in the text i.e. each and every glyph present in the text should be included and counted. I think a PDF created with a subset font included in it would be a better source for extracting this data list than InDesign document. What would experts suggest, experimenting with an InDesign document or a PDF?

Guide, Sep 16, 2024

IDML is better in this forum. One could also place a PDF and try to make sense out of that – at least font usage should still list the fonts – but it would be more effort to get down to the glyphs if possible at all.

 

You could open another thread as challenge in an Acrobat forum and also mention this thread.

 

"Traditional Arabic Regular" is missing on my Mac. When I replace it with "Adobe Arabic Regular", I get the list below (shortened, see attached file). There are also some .notdef entries that I omitted …

(columns: glyph ID, glyph name, count)
1 'space' 1285
2 'uni0621' 8
...
125 'uniFB51' 12
574 'uni0640.narrow' 1
576 'uni064B' 6
577 'uni064C' 7
...
619 'uniFC6A' 61
620 'uni06280645.init' 2
624 'uniFC70' 7
625 'uni062A0645.init' 3
629 'uni062B0631.fina' 17
...

Explorer, Sep 16, 2024

Yes, it appears to be the same result that I was looking for. Can I get your script?

Community Expert, Sep 16, 2024

1000x thanks for your input, Dirk. I especially appreciate your description of things accessible to plugin developers that we don't see exposed in ExtendScript. Where in the world does one learn of "wax"? In C++ developer documentation, I imagine? 

 

At any rate, when I was talking about "forms" I was posting mostly for Robert's benefit, because he was looking in the JS documentation for ways to get at this question via scripting, and the link he posted to indesignjs.de was I imagine found by searching the JS docs for "glyph" which finds things specifically related only to Unihan forms: 

 

alt.png

 

Right? Your commentary on the definitions of the word "form" is right on the money, but here, where we only see halfwidth and fullwidth "forms" and JIS standards? That can only mean CJK typesetting; there's nothing in this particular corner of the JS documentation about initial, medial, final, or isolated forms. Clearly the positionalForm is what we would have been looking for - we'd never find that in the JS docs by searching for "glyphs".

 

All that being said, I think that the script that @Marc Autret posted a link to might be the fastest way to actually create this list that the OP is looking for. It already generates a complete glyph list from glyph ID, shouldn't be hard to get it to add each glyph ID to its generated list.
