I am trying to make a script to list all the glyphs by their names or IDs and their counts in a document. The glyphs should be listed by their names or IDs (not by Unicode) as each Unicode character has several alternate shapes and I want to get result for each and every alternate shape. I have tried, but in vain. Here is the script I am working on.
// Get the active document
var doc = app.activeDocument;
// Define an object to store the glyph name counts
var glyphCounts = {};
// Define the output file path
var outputFile = File("~/Desktop/GlyphCount.txt");
outputFile.open("w");
// Write the initial line to the file
outputFile.writeln("Glyph Counts for the font: Arabic Font 1");
// Function to process characters and count glyphs
function countGlyphs(character) {
    try {
        // Check if the character has glyphs
        if (character.hasOwnProperty('glyphs') && character.glyphs.length > 0) {
            var glyph = character.glyphs[0]; // Get the first glyph object
            var glyphName = glyph.glyphName; // Get the glyph name
            // Use a placeholder if glyphName is undefined
            if (!glyphName) {
                glyphName = "UnknownGlyph";
            }
            // Update the glyph count based on the glyph name
            if (glyphCounts[glyphName] === undefined) {
                glyphCounts[glyphName] = 1;
            } else {
                glyphCounts[glyphName]++;
            }
            // Debug output using alerts
            alert("Glyph: " + glyphName + ", Count: " + glyphCounts[glyphName]);
        } else {
            alert("No glyphs found for character.");
        }
    } catch (e) {
        // Log errors if glyph retrieval fails
        alert("Error processing character. Error: " + e.toString());
    }
}
// Loop through all stories (text frames) in the document
for (var i = 0; i < doc.stories.length; i++) {
    var story = doc.stories[i];
    // Loop through all characters in the story
    for (var j = 0; j < story.characters.length; j++) {
        var character = story.characters[j];
        var font = character.appliedFont.name;
        // Check if the character is from the specific font you're interested in
        if (font === "Arabic Font 1") {
            countGlyphs(character);
        }
    }
}
// Write the glyph counts to the file
for (var glyphName in glyphCounts) {
    outputFile.writeln("Glyph: " + glyphName + ", Count: " + glyphCounts[glyphName]);
}
// Close the file
outputFile.close();
// Notify the user
alert("Glyph count saved to " + outputFile.fsName);
Any help is greatly appreciated.
I'm not good enough at scripting to be able to help you, but I would try the Facebook scripting group https://www.facebook.com/groups/indesignscripting/
Character doesn't have a glyphs collection - only glyphForm, whose possible values are listed here:
https://www.indesignjs.de/extendscriptAPI/indesign-latest/#AlternateGlyphForms.html#d1e85227
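A rough illustration of why the script in the question never counts anything: if the object has no `glyphs` property, the `hasOwnProperty('glyphs')` guard is false for every character, so only the "No glyphs found" branch ever runs. The sketch below uses plain mock objects in place of InDesign's Character (an assumption, so that it runs outside InDesign), and `tallyBy` and `glyphNameOrNull` are hypothetical helper names, not part of any API.

```javascript
// Hypothetical helper: count items by whatever key keyFn returns.
// Items for which keyFn returns null/undefined are skipped and tallied as "missed".
function tallyBy(items, keyFn) {
    var counts = {};
    var missed = 0;
    for (var i = 0; i < items.length; i++) {
        var key = keyFn(items[i]);
        if (key === null || key === undefined) {
            missed++;
            continue;
        }
        if (Object.prototype.hasOwnProperty.call(counts, key)) {
            counts[key]++;
        } else {
            counts[key] = 1;
        }
    }
    return { counts: counts, missed: missed };
}

// Same guard as in the question's script.
function glyphNameOrNull(character) {
    if (character.hasOwnProperty("glyphs") && character.glyphs.length > 0) {
        return character.glyphs[0].glyphName;
    }
    return null;
}

// Mock characters (assumption): like the Character object described above,
// they carry no `glyphs` collection.
var mockCharacters = [{ contents: "\u0628" }, { contents: "\u0627" }];
var result = tallyBy(mockCharacters, glyphNameOrNull);
// Every character is "missed"; result.counts stays empty.
```

The tally itself is independent of where the key comes from, so the same helper could be reused once a working source of glyph names or IDs is found.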
Same text - with Swash Alternates ON on the left and OFF on the right:
returns the same value for the Glyph Form:
And as per link provided earlier:
AlternateGlyphForms.NONE | Does not use an alternate form. | 1852796517
Could you please provide a sample INDD / IDML file for testing?
I appreciate your information. If we create a PDF from InDesign, can we get the list from it?
Quoting @zuhair777: "I appreciate your information. If we create a PDF from InDesign, can we get the list from it?"
No idea - can you provide a sample INDD / IDML file for testing?
Where can I send you the IDML file?
Please ZIP it and upload it somewhere - WeTransfer or something - and send me the link.
Looks like this Glyph Form is read/write - so it should be settable in the UI - but I can't find it anywhere.
@James Gifford—NitroPress , @Joel Cherney , @Peter Spier @Peter Kahrel - can you pitch in?
Afraid this is well outside my areas of expertise. Sorry.
Don't know how much I can help, really. All I can spot here are incompatibilities in terminology, to be honest.
"Glyph forms" and "Alternate Glyph Forms" aren't general typographical terms. I'm, um, 85% certain that they apply only to Chinese and Japanese characters. It's related to Han Unification, and so that is most likely why there's nothing in your UI about it, and nothing in my Middle East/North Africa UI, either.
I think, @zuhair777 , that what you actually want is a list of every glyph you can see in the Glyphs panel, right? Where you can see three different glyphs that all have the same Unicode value but have distinct GIDs, right? That's what I understand when I read this:
The glyphs should be listed by their names or IDs (not by Unicode) as each Unicode character has several alternate shapes and I want to get result for each and every alternate shape.
Here's a thread that may help you out - it's one where Peter Kahrel and Jongware and Dave Saunders (!!!) are all chatting about this issue. Dave's solution is, as he says, ugly, but it might work in your case.
Left is text copied from Acrobat - right - from InDesign:
And this is text copied from InDesign to WORD - saved as DOCX:
saved as RTF:
RTF exported from Acrobat is rubbish.
If there are significant differences between what is displayed in InDesign and what is copied to WORD, then a macro in WORD could probably do it.
The composer produces a sequence of character codes and associated opentype features.
Those opentype features can get specified by dedicated attributes - e.g. stylistic set, numerator/denominator, slashed zero and umpteen others, or they can be derived from context: positional form, e.g. first/middle/last character of a word. There is also an attribute to directly invoke any opentype feature.
I think that the "form" in question is here referring to the positional form.
https://learn.microsoft.com/en-us/typography/opentype/spec/features_fj#init
https://learn.microsoft.com/en-us/typography/opentype/spec/features_fj#tag-fina
The term "form" is also used for other meanings, e.g. "capital form".
Joel might be thinking of leading hangul forms?
https://learn.microsoft.com/en-us/typography/opentype/spec/features_ko#tag-ljmo
The composer then passes those characters and OT features to the font program. This translates them into a sequence of glyph IDs, plus transformations (as in positioning and scaling matrix).
There is no direct mapping - a character can produce multiple glyphs, or multiple characters may result in a single glyph e.g. "st" ligatures, fractions "1/2". Superscript may produce a different glyph or may get fauxed by scaling.
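To make the "no direct mapping" point concrete, here is a toy sketch of the many-characters-to-one-glyph case. It is only an illustration: the substitution table and the glyph names `s_t` and `onehalf` are made up, and a real font's substitution lookups are contextual and far more involved than a greedy string match.

```javascript
// Toy substitution table with hypothetical glyph names, longest sequence first.
var TOY_LIGATURES = [
    { chars: "1/2", glyph: "onehalf" },
    { chars: "st", glyph: "s_t" }
];

// Greedy left-to-right match: a listed sequence becomes one glyph,
// anything else maps one character to one glyph.
function toyShape(text) {
    var glyphs = [];
    var i = 0;
    while (i < text.length) {
        var matched = false;
        for (var k = 0; k < TOY_LIGATURES.length; k++) {
            var lig = TOY_LIGATURES[k];
            if (text.substr(i, lig.chars.length) === lig.chars) {
                glyphs.push(lig.glyph);
                i += lig.chars.length;
                matched = true;
                break;
            }
        }
        if (!matched) {
            glyphs.push(text.charAt(i));
            i++;
        }
    }
    return glyphs;
}

var shaped = toyShape("past 1/2");
// 8 characters in, 5 glyphs out: p, a, s_t, space, onehalf
```

This is why counting characters (or Unicode values) cannot stand in for counting glyphs: the two sequences do not even have the same length. The opposite case, one character producing several glyphs, is not covered by this sketch.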
The output of the composer is stored in a family of data structures called wax. These are not accessible to scripting, you'd need a C++ plug-in to report on them. This wax is what keeps e.g. old CS2 documents in their original state until you start editing causing a partial or full recompose.
Sharing examples in DM rather than posting to the forum limits their use. E.g. it's good for him when Robert can have a look at a transient share, but that keeps out others who may come along far in the future. I'll take an example from a different thread instead - the recent one about Kashidas.
The screenshot below is from a plug-in; it shows the different glyphs used for the positional form opentype features, for the first word. It still uses left-to-right; I haven't yet added r2l for Arabic, for lack of examples when I wrote the plug-in. Labels are localized to German, but should still be understandable.
In Arabic, most of the letters have four shapes or four forms (1) isolated, (2) initial, (3) medial and (4) final as shown in the image. Arabic script is written from right to left.
The letter Baa (with a dot below) ب has a Unicode value but its other forms don’t have any Unicode values. These shapes change automatically depending on the preceding and following joining letters. In more complex Arabic fonts, even a single Unicode character may have a large number of initial shapes, depending on the joining letters that follow it, as shown in the image below. These are all alternate initial shapes for the single character Baa (with a dot below).
Below are different alternate medial shapes for the same letter Baa (with a dot below).
By "glyph id" I mean the numerical index of these particular glyphs in a particular font. So I am trying to create a script to list all the glyphs, by their IDs, and their counts in a particular document.
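The four positional forms described above can be sketched in a few lines. This is a simplified model, not what InDesign or a font does internally: the joining table covers only a handful of letters (an assumption made for brevity), marks and non-joining characters are ignored, and the further alternate shapes a font may pick within one form are not modelled. The names `JOINING` and `positionalForms` are hypothetical; the returned labels follow the OpenType feature tags `isol`, `init`, `medi` and `fina`.

```javascript
// Simplified joining classes for a few letters (toy table, not the full Unicode data).
// "D" = dual-joining (connects on both sides),
// "R" = right-joining (connects only to the preceding letter).
var JOINING = {
    "\u0628": "D", // baa
    "\u062A": "D", // taa
    "\u064A": "D", // yaa
    "\u0645": "D", // meem
    "\u0627": "R", // alef
    "\u062F": "R", // dal
    "\u0631": "R", // raa
    "\u0648": "R"  // waw
};

// Returns one form label per character, in logical (typing) order.
function positionalForms(word) {
    var forms = [];
    for (var i = 0; i < word.length; i++) {
        var cur = JOINING[word.charAt(i)];
        var prev = i > 0 ? JOINING[word.charAt(i - 1)] : undefined;
        var next = i < word.length - 1 ? JOINING[word.charAt(i + 1)] : undefined;
        // Connects backwards only if the previous letter can connect forwards ("D").
        var joinsPrev = cur !== undefined && prev === "D";
        // Connects forwards only if this letter is "D" and a joining letter follows.
        var joinsNext = cur === "D" && next !== undefined;
        if (joinsPrev) {
            forms.push(joinsNext ? "medi" : "fina");
        } else {
            forms.push(joinsNext ? "init" : "isol");
        }
    }
    return forms;
}

var formsBayt = positionalForms("\u0628\u064A\u062A"); // baa, yaa, taa
var formsBab = positionalForms("\u0628\u0627\u0628");  // baa, alef, baa
```

In the second word the final Baa comes out isolated because the Alef before it does not connect forwards, which is the kind of context dependence that makes a per-Unicode count unsuitable here.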
How about export to IDML - or even snippet?
There should be info on which glyph / form is used / displayed - unless that's decided when the text is rendered?
Unfortunately, there is no glyph info in the IDML nor IDMS:
RTF doesn't have this info either:
Checked RTF spec - nothing about special glyphs or alternative forms...
I want to create the required list for any particular document (a PDF or MS Word doc or an InDesign doc), whatever method is used. No success so far.
Quoting @zuhair777: "I want to create the required list for any particular document (a PDF or MS Word doc or an InDesign doc), whatever method is used. No success so far."
Looks like only what @Dirk Becker suggested would work - dedicated, real plug-in.
Spending some more time, there is actually an attribute to force a positional form.
This attribute is also exposed as text property. Quick look at the OMV:
I guess / did not verify, that "calculate" would translate to "Automatische Form" above, meaning the composer makes the choice, and the font contributes with some additional substitution magic along those alternatives for the specified form as described by @zuhair777 for the initial shapes of Baa.
When a glyph is chosen from the glyphs panel, InDesign has an order of ways to reach the glyph. Preferred is by Unicode character, then with various concrete feature-related attributes, then the more abstract SALT or AALT lookup, eventually other general opentype features not backed by attributes. Plus likely some other way that I forgot. Ah yes, fonts can also specify different glyphs per language and/or writing script. Bulgarian Cyrillic might look different than Russian.
Final fallback is the numeric glyph ID. The direct glyph ID may change across fonts and even font versions, this makes it the final resort.
The actual choice of the glyphs panel is shown in the tip when you mouse-over the glyph.
I have not yet tried how FindChange Glyph works, likely similar. So iterating through all glyph IDs with findGlyph() might already give you your count. Note that in theory fonts need not have consecutive glyph IDs, so better try them all.
Anyway, I understand that e.g. for a glyph coverage overview when trying a font with a test text, one would want to see all used glyphs. Eventually also the unused ones, glyph name, and the preferred choice, etc. I'll try to add something like that to my plug-in while you try to figure out how to upload an example document 😉 .
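A minimal sketch of that findGlyph() idea, untested against a real document. It relies on `app.findGlyphPreferences` (with `appliedFont`, `fontStyle` and `glyphID`) and `Document.findGlyph()` from the scripting DOM. Whether each found item corresponds to exactly one glyph occurrence, and whether marks and ligature glyphs are matched at all, is an assumption that would need checking. `countByGlyphId` is a hypothetical name, and `maxGlyphId` has to be supplied by the caller because the script does not ask the font how many glyphs it has.

```javascript
// Sketch: count matches per glyph ID via Find/Change Glyph.
// host: InDesign's `app`; doc: a Document.
// maxGlyphId: highest ID to try (assumption: chosen by the caller).
function countByGlyphId(host, doc, fontFamily, fontStyle, maxGlyphId) {
    // Clear previous Find Glyph settings when running inside InDesign.
    if (typeof NothingEnum !== "undefined") {
        host.findGlyphPreferences = NothingEnum.NOTHING;
    }
    var prefs = host.findGlyphPreferences;
    prefs.appliedFont = fontFamily;
    prefs.fontStyle = fontStyle;

    var counts = {};
    for (var gid = 0; gid <= maxGlyphId; gid++) {
        prefs.glyphID = gid;
        var found = doc.findGlyph();
        if (found.length > 0) {
            counts[gid] = found.length; // keep only glyph IDs that occur
        }
    }
    return counts;
}

// Possible use inside InDesign (font names and the upper bound are placeholders):
// var counts = countByGlyphId(app, app.activeDocument, "Adobe Arabic", "Regular", 2000);
```

Passing `app` and the document in as parameters keeps the function free of globals, so it can be exercised with stand-in objects before being run on a large document, where one search per glyph ID may be slow.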
Here is a sample IDML attached. The list should not only include base glyphs but it should also include the Mark glyphs present in the text i.e. each and every glyph present in the text should be included and counted. I think a PDF created with a subset font included in it would be a better source for extracting this data list than InDesign document. What would experts suggest, experimenting with an InDesign document or a PDF?
IDML is better in this forum. One could also place a PDF and try to make sense out of that – at least font usage should still list the fonts – but it would be more effort to get down to the glyphs, if possible at all.
You could open another thread as a challenge in an Acrobat forum and also mention this thread.
"Traditional Arabic Regular" is missing on my Mac. When I replace it with "Adobe Arabic Regular", I get the list below (shortened, see attached file). There are also some .notdef that I omitted …
1 'space' 1285
2 'uni0621' 8
...
125 'uniFB51' 12
574 'uni0640.narrow' 1
576 'uni064B' 6
577 'uni064C' 7
...
619 'uniFC6A' 61
620 'uni06280645.init' 2
624 'uniFC70' 7
625 'uni062A0645.init' 3
629 'uni062B0631.fina' 17
...
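For anyone wanting to post-process such a list, here is a small parsing sketch. It assumes each line reads as glyph ID, quoted glyph name, count, which is a guess from the excerpt above rather than a documented format, and the function names are hypothetical.

```javascript
// Parse one line of the assumed form: <glyph ID> '<glyph name>' <count>
// Returns null for anything else (e.g. the "..." elision lines).
function parseGlyphLine(line) {
    var m = /^\s*(\d+)\s+'([^']*)'\s+(\d+)\s*$/.exec(line);
    if (!m) {
        return null;
    }
    return { id: parseInt(m[1], 10), name: m[2], count: parseInt(m[3], 10) };
}

// Parse a whole listing, skipping lines that do not match.
function parseGlyphList(text) {
    var entries = [];
    var lines = text.split("\n");
    for (var i = 0; i < lines.length; i++) {
        var entry = parseGlyphLine(lines[i]);
        if (entry) {
            entries.push(entry);
        }
    }
    return entries;
}

// A few lines copied from the list above.
var sample = [
    "1 'space' 1285",
    "2 'uni0621' 8",
    "...",
    "620 'uni06280645.init' 2"
].join("\n");

var entries = parseGlyphList(sample);
// Most frequent glyphs first.
entries.sort(function (a, b) { return b.count - a.count; });
```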
Yes, it appears to be the same result that I was looking for. Can I get your script?
1000x thanks for your input, Dirk. I especially appreciate your description of things accessible to plugin developers that we don't see exposed in ExtendScript. Where in the world does one learn of "wax"? In C++ developer documentation, I imagine?
At any rate, when I was talking about "forms" I was posting mostly for Robert's benefit, because he was looking in the JS documentation for ways to get at this question via scripting, and the link he posted to indesignjs.de was I imagine found by searching the JS docs for "glyph" which finds things specifically related only to Unihan forms:
Right? Your commentary on the definitions of the word "form" is right on the money, but here, where we only see halfwidth and fullwidth "forms" and JIS standards? That can only mean CJK typesetting; there's nothing in this particular corner of the JS documentation about initial, medial, final, or isolated forms. Clearly the positionalForm is what we would have been looking for - we'd never find that in the JS docs looking for "glyphs".
All that being said, I think that the script that @Marc Autret posted a link to might be the fastest way to actually create this list that the OP is looking for. It already generates a complete glyph list from glyph ID, shouldn't be hard to get it to add each glyph ID to its generated list.