Bespoke Hyphenation Library

Report · Jan 08, 2024

Just wondered whether you could create your own hyphenation library for Adobe Illustrator.

I have a list of 100 words, all surrounding ingredients for cosmetics, and they are needed (if required) to break at certain points within the words.

Report · Jan 08, 2024

Basically, you can create an exception list in the Hyphenation section of the application preferences (which is probably not what you want).

Apart from that, you can check the Hyphenation settings in the flyout menu of the Paragraph palette.

If that doesn't help, you will have to do it manually.

Report · Jan 09, 2024

Thanks for taking the time to reply Kurt.

Manual is not an option I am afraid,

Steve

Report · Jan 09, 2024

You're going to have a better time if you do this in InDesign:

https://creativepro.com/tip-week-controlling-hyphenation/

Report · Jan 09, 2024

Thanks for taking the time to reply Doug, appreciate it.

I agree with you, but I am afraid the client is wanting to remain in Illustrator,

Steve

Report · Jan 11, 2024

Hi @Steven34668211ijr2, I had a think about this and did some testing, but unfortunately Illustrator is not only "unsophisticated" when it comes to hyphenation, it doesn't even support basic things like honoring a discretionary hyphen or breaking at a zero-width space. This makes it very difficult to come up with a good solution for your case.

I think a script could handle it. You would create a text file containing all the words, with markers for the hyphenation points. The logic is actually quite involved due to Illustrator's somewhat quirky text handling, so the script would be a fair bit of work. And you would have to re-run the script every time the line breaks changed! Let me know if that's something you are still interested in.
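To make the word-list idea concrete, here is a minimal sketch in plain JavaScript (nothing Illustrator-specific) of how one marked-up entry could be turned into break positions. The function name `parseMarkedWord` and the use of a hyphen as the marker are assumptions for illustration only.

```javascript
// Hypothetical helper: split a marked-up word such as "phyto-sphingo-sine"
// into the plain word and the character offsets where a break is allowed.
function parseMarkedWord(marked, marker) {
    marker = marker || '-'; // assumed marker character
    var parts = marked.split(marker),
        breaks = [],
        offset = 0;
    // every part except the last one ends at an allowed break point
    for (var i = 0; i < parts.length - 1; i++) {
        offset += parts[i].length;
        breaks.push(offset);
    }
    return { word: parts.join(''), breaks: breaks };
}

var entry = parseMarkedWord('phyto-sphingo-sine');
// entry.word   is 'phytosphingosine'
// entry.breaks is [5, 12]
```

The offsets are counted in the plain word, so a break at offset 5 falls between "phyto" and "sphingosine".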

- Mark

Report · Jan 11, 2024

Hi Mark,

Thanks for the reply, really appreciated.

You’re right, I had resorted to the mindset that a script would be the only solution.

Luckily, the requirements are on a single text box and only have 80/120 words that need to be captured.

I am using scripting to import and style the text into Illustrator in the first instance, hence my requirement to prevent this being a manual job.

As soon as I can today I think I will start looking into creating this script,

Thank you

Steve

Report · Jan 13, 2024

Hi Steve, I was intrigued by this challenge and decided to write a script to do it. I did not know you are a keen scripter, and you probably have a better approach than mine, but I will post my script here for what it's worth. Only read it if you don't want the challenge of solving it yourself! 🙂 Oh, and of course it hasn't been tested much, so it will definitely break in some cases.

- Mark

If you decide to test it out, here's how to set it up:

1. Put the two script files together into a folder

- "Manually Hyphenate Selected Text.js" (first script below)

- "Hyphenator.js" (second listing below)

2. Write out your word list and save as "wordlist.txt" in the same folder (see example below).

3. Select some text in Illustrator and run "Manually Hyphenate Selected Text.js" script.

Here are screen grabs of my example. Note that running the script again will first remove the previous hyphenation.

Script file 1:

/**
 * Performs "manual" hyphenation on selected text(s),
 * using custom word list.
 * Word list contains one word per line, and includes
 * a hyphen at every viable position, for example:
 * -----------------------------------
 *    methyl-sulfonyl-methane
 *    phyto-sphingo-sine
 *    tri-hydroxy-stearin
 * -----------------------------------
 * @file Manually Hyphenate Selected Text.js
 * @author m1b
 * @discussion https://community.adobe.com/t5/illustrator-discussions/bespoke-hyphenation-library/m-p/14342552
 */
//@include 'Hyphenator.js'
(function () {

    var words = loadLinesFromFile('wordlist.txt');

    var hyphenator = new Hyphenator({ words: words }),
        doc = app.activeDocument,
        sel = doc.selection,
        textsToHyphenate = getTexts(sel);

    // hyphenate all the texts
    for (var i = 0; i < textsToHyphenate.length; i++)
        hyphenator.hyphenateText(textsToHyphenate[i]);

})();


/**
 * Get array of text ranges from the supplied item(s).
 * @author m1b
 * @version 2024-01-09
 * @param {TextRange|TextFrame|Array|GroupItem} item - the item or items to get text from.
 * @returns {?Array<TextRange>}
 */
function getTexts(item) {

    var texts = [];

    if (item == undefined)
        return;

    if (item.hasOwnProperty('baselineShift'))
        // already text
        texts.push(item);

    else if (
        item.constructor.name == 'Story'
        || item.constructor.name == 'TextFrame'
    )
        // almost text
        texts.push(item.textRange);

    else if (
        (
            // to bypass bug in TextFrames object:
            item.typename
            && item.typename == 'TextFrames'
        )
        || item.constructor.name == 'Stories'
        || item.constructor.name == 'Array'
    )
        // handle array of text frames
        for (var j = item.length - 1; j >= 0; j--)
            texts = texts.concat(getTexts(item[j]) || []);

    else if (item.constructor.name == 'GroupItem')
        // all text in group
        for (var j = item.pageItems.length - 1; j >= 0; j--)
            texts = texts.concat(getTexts(item.pageItems[j]) || []);

    else if (item.constructor.name == 'Document')
        // all text in document
        texts = texts.concat(getTexts(item.stories));

    return texts;

};


/**
 * Returns array of strings by reading
 * each line of text file.
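 * @param {String} nameOrPath - the path or file name of the target file.
 * @returns {Array<String>}
 */
function loadLinesFromFile(nameOrPath) {

    // a bare file name (no slash or backslash) is resolved
    // relative to the folder containing this script
    if (!/[\/\\]/.test(nameOrPath))
        nameOrPath = File($.fileName).parent + '/' + nameOrPath;

    var f = File(nameOrPath);

    if (!f.exists)
        return alert('Error: could not find word list at "' + nameOrPath + '"');

    f.open('r');
    var lines = f.read().split(/[\n\r]+/),
        words = [];
    f.close();

    // trim each line and skip blank ones (e.g. a trailing newline),
    // because an empty entry would match everywhere in the text
    for (var i = 0, line; i < lines.length; i++) {
        line = lines[i].replace(/^\s+|\s+$/g, '');
        if (line.length > 0)
            words.push(line);
    }

    return words;

};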
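The line-splitting step can also be tried outside Illustrator. The sketch below works on a string instead of a `File`, so it runs in any JavaScript engine. `parseWordList` is a hypothetical name, and dropping entries that contain no hyphen is an assumption made here, on the grounds that such entries carry no break points.

```javascript
// Hypothetical helper: turn the raw contents of wordlist.txt into clean entries.
function parseWordList(raw) {
    var lines = raw.split(/\r\n|\r|\n/),
        words = [];
    for (var i = 0; i < lines.length; i++) {
        // trim without String.prototype.trim, which older ExtendScript lacks
        var line = lines[i].replace(/^\s+|\s+$/g, '');
        // keep only non-empty entries that contain at least one break marker
        if (line.length > 0 && line.indexOf('-') !== -1)
            words.push(line);
    }
    return words;
}

var sample = 'methyl-sulfonyl-methane\r\nphyto-sphingo-sine\n\n  tri-hydroxy-stearin  \nglycerin\n';
var cleaned = parseWordList(sample);
// cleaned is ['methyl-sulfonyl-methane', 'phyto-sphingo-sine', 'tri-hydroxy-stearin']
```

Mixed line endings, surrounding whitespace, blank lines and the unmarked "glycerin" entry are all handled before any word reaches the hyphenator.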

Script file 2:

/**
 * @file Hyphenator.js
 * @author m1b
 * Helper objects:
 *   - Hyphenator
 *   - HyphenatedWord
 *
 * Public methods of Hyphenator:
 *   - removePreviousHyphenation
 *   - hyphenateText
 */

/**
 * A hyphenation helper object.
 * @author m1b
 * @version 2024-01-13
 * @constructor
 * @param {Object} options
 * @param {Array<String>} options.words - the hyphenated words to use.
 */
function Hyphenator(options) {

    options = options || {};

    var self = this;
    // the hyphen character
    self.hyphen = options.hyphen || '-';
    // a character to break a line
    self.breaker = options.breaker || ' '; // ordinary space for now
    // the word list
    self.words = options.words || [];

    for (var i = 0; i < self.words.length; i++)
        if ('String' === self.words[i].constructor.name)
            self.words[i] = new HyphenatedWord(self.words[i], self);

};


/**
 * Removes manual hyphenations
 * by searching for hyphen/breaker pairs.
 * @author m1b
 * @version 2024-01-13
 * @param {TextRange} text - an Illustrator TextRange.
 */
Hyphenator.prototype.removePreviousHyphenation = function removePreviousHyphenation(text) {

    var self = this,
        contents = text.contents,
        matcher = new RegExp(self.hyphen + self.breaker, 'g'),
        match,
        found = [],
        removeMe;

    matcher.lastIndex = 0;

    while (match = matcher.exec(contents))
        found.push(match.index);

    // process each found, going backwards
    // (test the length, because a match at index 0 is falsy)
    while (found.length > 0) {
        removeMe = found.pop();
        text.characters[removeMe].remove();
        text.characters[removeMe].remove();
    }

};


/**
 * Finds the first word in the given text,
 * and returns an object containing the word
 * and its start index.
 * If a `from` number is supplied, the search
 * will start from there.
 * @author m1b
 * @version 2024-01-13
 * @param {TextRange} text - an Illustrator TextRange.
 * @param {Number} [from] - start searching from this character index (default: 0).
 * @returns {?Object} - {word: HyphenatedWord, start: Number}, or undefined if no word is found.
 */
Hyphenator.prototype.getFirstWord = function getFirstWord(text, from) {

    var self = this,
        contents = text.contents.slice(from || 0),
        firstWord = {
            start: Infinity,
            word: undefined
        };

    for (var i = 0, start; i < self.words.length; i++) {

        start = contents.search(new RegExp('\\b' + self.words[i].word + '\\b', 'i'));

        if (
            -1 !== start
            && start < firstWord.start
        ) {
            firstWord.start = start;
            firstWord.word = self.words[i];
        }

    }

    if (undefined !== firstWord.word) {
        // update offset (from is optional, so default to 0)
        firstWord.start += from || 0;
        return firstWord;
    }

};


/**
 * Perform the hyphenation on the given text.
 * @author m1b
 * @version 2024-01-13
 * @param {TextRange} text - an Illustrator TextRange.
 */
Hyphenator.prototype.hyphenateText = function hyphenateText(text) {

    var self = this;

    if (
        !text.hasOwnProperty('contents')
        || 0 === text.contents.length
    )
        return;

    self.removePreviousHyphenation(text);

    var currentWord,
        advance = 0,
        currentLineID,
        breaks = [];

    wordsLoop:
    while (currentWord = self.getFirstWord(text, advance)) {

        advance = currentWord.start;

        partsLoop:
        for (var i = 1, removeMe; i < currentWord.word.parts.length; i++) {

            // add the length of the preceding part
            advance += currentWord.word.parts[i - 1].contents.length;

            // the lineID is used to test whether linebreaks shifted
            currentLineID = getLineID(text.characters[advance]);

            // add the hyphen/breaker to the contents of the character before
            text.characters[advance - 1].contents += self.hyphen + self.breaker;
            breaks.push(advance);
            advance += 2;

            // track the hyphen in
            text.characters[advance - 1].tracking = -250;

            // sanity check
            if (0 === text.characters[text.characters.length - 1].lines.length)
                return alert('Warning: hyphenating failed due to overset text.');

            // check if the line shifted
            if (getLineID(text.characters[advance - 2]) < currentLineID) {

                // the break has caused the line
                // to jump up to the _previous_ line
                currentLineID = getLineID(text.characters[advance - 2]);

                // see if previous breaks need removing
                breaksLoop:
                for (var j = 0, offset = 0; j < breaks.length; j++) {

                    // offset takes removals into account
                    breaks[j] += offset;

                    if (j >= breaks.length - 1)
                        // we never remove the most recent break
                        break breaksLoop;

                    if (getLineID(text.characters[breaks[j]]) === currentLineID) {

                        // remove this obsolete break
                        text.characters[breaks[j]].remove();
                        text.characters[breaks[j]].remove();
                        breaks.splice(j--, 1);

                        // make adjustments
                        offset -= 2;
                        advance -= 2;

                    }

                }

                continue partsLoop;

            }

            else if (getLineID(text.characters[advance + 2]) > currentLineID) {
                // the break is good, because the
                // breaker is at the end of the line
                continue partsLoop;
            }

            else {
                // break not needed, because there is room for more text after it
                removeMe = breaks.pop();
                text.characters[removeMe].remove();
                text.characters[removeMe].remove();
                advance -= 2;
            }

        }

    }

    function getLineID(text) {
        return text.lines[0].start;
    };

};


/**
 * A word object with hyphenation information.
 * @author m1b
 * @version 2024-01-13
 * @constructor
 * @param {String} hyphenatedWord - the word, maximally hyphenated.
 * @param {Hyphenator} hyphenator - the hyphenator to use.
 */
function HyphenatedWord(hyphenatedWord, hyphenator) {

    var self = this;

    self.hyphenator = hyphenator;
    self.hyphenatedWord = hyphenatedWord;
    self.matchHyphens = new RegExp(hyphenator.hyphen, 'g');
    self.word = hyphenatedWord.replace(self.matchHyphens, '');
    self.length = self.word.length;
    self.parts = [];

    var parts = hyphenatedWord.split(hyphenator.hyphen);

    for (var i = 0, index = 0; i < parts.length; i++, index += parts[i - 1].length)
        self.parts.push({ contents: parts[i], start: index });

};
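For anyone who wants to reason about the break-selection idea without Illustrator, here is a simplified model in plain JavaScript. It assumes every character is one column wide and that lines wrap greedily at spaces, which is not how the script above works (the script asks Illustrator which line each character lands on). The function name `wrapWithHyphenation` and the `maxCols` parameter are hypothetical, and words followed by punctuation are not matched in this sketch.

```javascript
// Simplified model only: fixed-width characters, greedy wrapping at spaces.
function wrapWithHyphenation(text, maxCols, markedWords) {

    // plain lowercase word -> offsets where a break is allowed
    var lookup = {};
    for (var i = 0; i < markedWords.length; i++) {
        var parts = markedWords[i].split('-'),
            offsets = [];
        for (var p = 0, at = 0; p < parts.length - 1; p++) {
            at += parts[p].length;
            offsets.push(at);
        }
        lookup[parts.join('').toLowerCase()] = offsets;
    }

    var tokens = text.split(' '),
        lines = [],
        line = '';

    for (var t = 0; t < tokens.length; t++) {

        var rest = tokens[t],
            key = rest.toLowerCase(),
            breaks = lookup.hasOwnProperty(key) ? lookup[key].slice() : [];

        while (true) {
            var gap = line.length > 0 ? 1 : 0;

            // the (remaining) word fits on the current line
            if (line.length + gap + rest.length <= maxCols) {
                line += (gap ? ' ' : '') + rest;
                break;
            }

            // find the latest break whose head (plus hyphen) still fits
            var chosen = -1;
            for (var b = breaks.length - 1; b >= 0; b--)
                if (line.length + gap + breaks[b] + 1 <= maxCols) {
                    chosen = b;
                    break;
                }

            if (chosen >= 0) {
                var cut = breaks[chosen],
                    remaining = [];
                lines.push(line + (gap ? ' ' : '') + rest.slice(0, cut) + '-');
                line = '';
                rest = rest.slice(cut);
                // re-base the breaks that come after the cut
                for (var k = chosen + 1; k < breaks.length; k++)
                    remaining.push(breaks[k] - cut);
                breaks = remaining;
            }
            else if (line.length > 0) {
                // no break fits: wrap the whole word to a new line
                lines.push(line);
                line = '';
            }
            else {
                // nothing fits even on an empty line: let it overflow
                line = rest;
                break;
            }
        }
    }

    if (line.length > 0)
        lines.push(line);

    return lines;
}

var wrapped = wrapWithHyphenation(
    'Contains methylsulfonylmethane and phytosphingosine extract',
    20,
    ['methyl-sulfonyl-methane', 'phyto-sphingo-sine']
);
// wrapped is:
//   'Contains methyl-'
//   'sulfonylmethane and'
//   'phytosphingosine'
//   'extract'
```

As in the real script, a hyphen is only inserted where it lets part of a listed word stay on the current line. Here "phytosphingosine" is moved to the next line whole, because even "phyto-" would not fit after "sulfonylmethane and".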

Report · Jan 30, 2024

Good Morning Mark,

Hope you're well. I am so sorry, I saw this and forgot to reply. This looks really interesting and I really do value the time you have put into it, thank you.

I had to park this project to finish other elements of work, but I am now going to look at this, and what you have done has really hit the mark (no pun intended).

Thank you ever so much mate,

Steve

Report · Jan 30, 2024

No worries at all, Steve. I hope it works for you, or at least gives you some ideas to improve on. It was fun to write.

- Mark

Report · Feb 06, 2024

I'd be interested in having a go at this as well, but for the sake of clarity I would need to know exactly what "break at certain points within the words" means. A list of words or an example file would help greatly. Otherwise we're completely in the dark about what the requirements actually are, and people like Mark might do a ton of work that isn't on the right track for lack of information.
