
Compare text using script or grep

Enthusiast, Feb 15, 2023

Hi 

I have a large text file containing lines of the Quran separated by soft returns. I want to use it as the base file because it has been spell-checked. I have attached a screenshot.

 

What I want to achieve is that whenever anyone types a line of the Quran in InDesign, it is checked against the corresponding line in the base text file.

 

Can this be done with GREP? Is there a script that can do it?


Thanks

Topics: How to, Scripting

Correct answers

Community Expert, Feb 27, 2023

@Bedazzled532, you've made a terrific start! Next you will need to get the story you want to check, so in my example below I just use the story that you have selected. I compare it paragraph by paragraph with the masterContentFile, and apply the character style "Bad" to any paragraph that doesn't match.

 

function main() {

    var masterContentFile = File("d:/readid.txt");
    var badCharacterStyleName = 'Bad';

    if (!masterContentFile.exists) {
        alert('Could not find master content file "' + mas
...
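
The script above is cut off in this copy of the thread. As a rough illustration only, and not the original code, the paragraph-by-paragraph comparison it describes could be sketched without any InDesign objects as follows. The helper name `findMismatchedParagraphs` and the sample strings are hypothetical, and splitting on `\r\n`, `\r`, or `\n` is an assumption about how the master file's lines end.

```javascript
// Hypothetical helper, not part of the original script: returns the indexes
// of user paragraphs that differ from the master line at the same position.
function findMismatchedParagraphs(masterText, userParagraphs) {
    // Assumption: master lines may end in \r\n, \r, or \n.
    var masterLines = masterText.split(/\r\n|\r|\n/),
        trimEnds = /^\s+|\s+$/g,
        count = Math.min(masterLines.length, userParagraphs.length),
        mismatched = [];

    for (var i = 0; i < count; i++) {
        // Ignore leading/trailing whitespace, including a trailing paragraph return.
        var m = masterLines[i].replace(trimEnds, ''),
            u = userParagraphs[i].replace(trimEnds, '');
        if (u !== m) mismatched.push(i);
    }
    return mismatched;
}

// Made-up sample data: the second user paragraph contains a typo.
var master = 'first line\nsecond line\nthird line';
var user = ['first line\r', 'secnd line\r', 'third line'];
var mismatchedParagraphs = findMismatchedParagraphs(master, user);
// mismatchedParagraphs is [1]
```

In an InDesign script, the returned indexes would be the paragraphs to mark with the "Bad" character style.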
Enthusiast, Feb 27, 2023

@m1b I was thinking of one more thing. Now that the line-comparing version is working, can we check word by word using GREP in the script? New lines and spaces need to be ignored. For example, if there are two spaces between words in InDesign, the current script treats this as an error. Is there a way around this?

Thanks

Community Expert, Feb 27, 2023

Hi @Bedazzled532, here is a version that checks word by word:

 

function main() {

    var masterContentFile = File("d:/readid.txt");
    var badCharacterStyleName = 'Bad';

    if (!masterContentFile.exists) {
        alert('Could not find master content file "' + masterContentFile + '".');
        return;
    }

    var doc = app.activeDocument,
        badCharacterStyle = doc.characterStyles.itemByName(badCharacterStyleName);

    if (!badCharacterStyle.isValid) {
        alert('Could not find character style "' + badCharacterStyleName + '".');
        return;
    }

    if (
        doc.selection[0] == undefined
        || !doc.selection[0].hasOwnProperty('parentStory')
    ) {
        alert('Please put cursor in the story you want to check and try again.');
        return;
    }

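    // read the whole master file, then close it again
    masterContentFile.open('r');
    var masterText = masterContentFile.read();
    masterContentFile.close();

    var masterContent = masterText.split("\n"),
        userParagraphs = doc.selection[0].parentStory.paragraphs,
        userContent = userParagraphs.everyItem().contents,
        // strip all leading and trailing whitespace, not just one character
        leadingTrailingSpace = /^\s+|\s+$/g,
        whitespace = /\s+/g,
        contentCount = Math.min(userContent.length, masterContent.length),
        differenceCount = 0;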

    for (var i = 0; i < contentCount; i++) {

        var m = masterContent[i].replace(leadingTrailingSpace, ''),
            u = userContent[i].replace(leadingTrailingSpace, '');

        if (u == m) continue;

        var mWordContent = m.split(whitespace),
            uWords = userParagraphs[i].words,
            uWordContent = uWords.everyItem().contents,
            wordCount = Math.min(mWordContent.length, uWordContent.length);

        for (var j = 0; j < wordCount; j++) {

            if (uWordContent[j] != mWordContent[j]) {

                uWords[j].applyCharacterStyle(badCharacterStyle);
                differenceCount++
            }

        }

    }

    alert('Found ' + differenceCount + ' different words.');

};

app.doScript(main, ScriptLanguage.JAVASCRIPT, undefined, UndoModes.ENTIRE_SCRIPT, 'Check Story Against Master Content');
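
The whitespace handling in the script above can be tried outside InDesign. This is a minimal sketch rather than the script itself: it splits both lines on runs of whitespace, whereas the script relies on InDesign's `words` collection for the user text, so the word boundaries may not always agree. The helper name `findDifferentWords` and the sample strings are hypothetical.

```javascript
// Hypothetical helper for illustration: returns the indexes of words in
// userLine that differ from the word at the same position in masterLine.
function findDifferentWords(masterLine, userLine) {
    var trimEnds = /^\s+|\s+$/g,
        whitespace = /\s+/,
        // Any run of spaces, tabs, or line breaks counts as one separator.
        mWords = masterLine.replace(trimEnds, '').split(whitespace),
        uWords = userLine.replace(trimEnds, '').split(whitespace),
        count = Math.min(mWords.length, uWords.length),
        different = [];

    for (var j = 0; j < count; j++) {
        if (uWords[j] !== mWords[j]) different.push(j);
    }
    return different;
}

// Made-up sample: doubled space, a line break, and one misspelled word.
var differentWords = findDifferentWords('one two three four', '  one  too three\nfour ');
// differentWords is [1]: only "too" is reported; the extra whitespace is ignored
```

Because the comparison is positional, a missing or extra word shifts every later word and will flag the rest of the line.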

 

Enthusiast, Feb 28, 2023

@m1b Thanks, it worked. It is breaking on new lines (\n), but it is working fine with soft returns. It also takes whitespace into consideration. That's great.

Wonderful modification. Thanks

 

I would like to understand the following step:

contentCount = Math.min(userContent.length, masterContent.length)
It gives the minimum length here, but I do not understand why we are using this minimum.

Thanks.

Community Expert, Feb 28, 2023

I would like to understand the following step:

contentCount = Math.min(userContent.length, masterContent.length)
It gives the minimum length here, but I do not understand why we are using this minimum.

 

This sets the loop count to the smaller of the two counts: the number of user content lines or the number of master content lines. Imagine there are 10 lines in the master content file but only 8 lines in the user content (the story). If we used the master content line count for the loop, we would run out of user content to compare and an error would occur (userContent[8] would be undefined, not a string). If we set the loop to stop when it reaches the minimum of those two counts, then we can't go wrong.
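
As a small illustration of that point (not part of the original script, and with made-up arrays standing in for the real content), the sketch below shows the error that occurs when indexing past the shorter list and how `Math.min` avoids it.

```javascript
var masterContent = ['a', 'b', 'c', 'd'],   // 4 master lines
    userContent = ['a', 'b'];               // only 2 user lines

// Looping up to masterContent.length would reach userContent[2], which is
// undefined, and calling .replace() on undefined throws a TypeError.
var errorWithoutMin = false;
try {
    userContent[2].replace(/\s+/g, '');
} catch (e) {
    errorWithoutMin = true;
}

// Looping up to the smaller count never indexes past either array.
var contentCount = Math.min(userContent.length, masterContent.length),
    matched = 0;
for (var i = 0; i < contentCount; i++) {
    if (userContent[i] === masterContent[i]) matched++;
}
// contentCount is 2, so master lines 3 and 4 are simply not compared.
```

One consequence is that any extra lines in the longer of the two lists are skipped silently rather than reported as differences.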

- Mark

Enthusiast, Feb 28, 2023

Thank you so much.

Community Expert, Feb 28, 2023

You're welcome! 🙂 All the best with your project.
