Bedazzled532
Inspiring
February 15, 2023
Answered

Compare text using script or grep


Hi 

I have a huge txt file containing lines of the Quran separated by soft returns. I want to use this file as the base file because it has been spell-checked. I have attached a screenshot.

 

What I want to achieve is that whenever anyone types a line of the Quran in InDesign, it is matched against the corresponding line in the base txt file.

 

Can this be done with GREP? Is there a script that can do it?


Thanks

Correct answer m1b

@m1b I was thinking of one more thing. Now that the line-comparing version is working, can we check word by word using GREP in the script? New lines and extra spaces need to be ignored. For example, if there are two spaces between words in InDesign, the current script treats this as an error. Is there a way around this?

Thanks


Hi @Bedazzled532, here is a version that checks word by word:

 

function main() {

    var masterContentFile = File("d:/readid.txt");
    var badCharacterStyleName = 'Bad';

    if (!masterContentFile.exists) {
        alert('Could not find master content file "' + masterContentFile + '".');
        return;
    }

    var doc = app.activeDocument,
        badCharacterStyle = doc.characterStyles.itemByName(badCharacterStyleName);

    if (!badCharacterStyle.isValid) {
        alert('Could not find character style "' + badCharacterStyleName + '".');
        return;
    }

    if (
        doc.selection[0] == undefined
        || !doc.selection[0].hasOwnProperty('parentStory')
    ) {
        alert('Please put cursor in the story you want to check and try again.');
        return;
    }

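    // read the whole master file, then close it before processing
    masterContentFile.open('r');
    var masterText = masterContentFile.read();
    masterContentFile.close();

    var masterContent = masterText.split("\n"),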
        userParagraphs = doc.selection[0].parentStory.paragraphs,
        userContent = userParagraphs.everyItem().contents,
        leadingTrailingSpace = /^\s+|\s+$/g, // strip ALL leading/trailing whitespace, not just one character
        whitespace = /\s+/g,
        contentCount = Math.min(userContent.length, masterContent.length),
        differenceCount = 0;

    for (var i = 0; i < contentCount; i++) {

        var m = masterContent[i].replace(leadingTrailingSpace, ''),
            u = userContent[i].replace(leadingTrailingSpace, '');

        if (u == m) continue;

        var mWordContent = m.split(whitespace),
            uWords = userParagraphs[i].words,
            uWordContent = uWords.everyItem().contents,
            wordCount = Math.min(mWordContent.length, uWordContent.length);

        for (var j = 0; j < wordCount; j++) {

            if (uWordContent[j] != mWordContent[j]) {

                uWords[j].applyCharacterStyle(badCharacterStyle);
                differenceCount++;
            }

        }

    }

    alert('Found ' + differenceCount + ' different words.');

};

app.doScript(main, ScriptLanguage.JAVASCRIPT, undefined, UndoModes.ENTIRE_SCRIPT, 'Check Story Against Master Content');
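
To try the word-by-word idea outside InDesign, the comparison at the core of the script above can be sketched as plain JavaScript. This is only an illustration, not part of the posted script: the function names `splitWords` and `findDifferentWords` are made up here, and plain strings stand in for InDesign paragraphs. It is written in ES3 style, so it should also run in ExtendScript.

```javascript
// Sketch only: plain strings instead of InDesign text objects.
// splitWords and findDifferentWords are hypothetical helper names.

// Split a line into words, ignoring leading, trailing and repeated whitespace.
function splitWords(line) {
    var trimmed = line.replace(/^\s+|\s+$/g, '');
    return trimmed === '' ? [] : trimmed.split(/\s+/);
}

// Return the indices of the words in userLine that differ from masterLine.
// Words beyond the end of the shorter line are not reported,
// which matches the behaviour of the script above.
function findDifferentWords(masterLine, userLine) {
    var m = splitWords(masterLine),
        u = splitWords(userLine),
        count = Math.min(m.length, u.length),
        different = [];

    for (var j = 0; j < count; j++) {
        if (u[j] !== m[j]) different.push(j);
    }
    return different;
}
```

Because both lines are split on runs of whitespace, a double space between two words does not count as a difference. In the InDesign script, the returned indices would correspond to the entries of `uWords` that receive the 'Bad' character style.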

 

4 replies

Robert at ID-Tasker
Legend
February 18, 2023

How about exporting the story (or stories) as plain text or RTF, then sorting and comparing in Word?

 

Not fully automated, but if you don't have too many errors it should be quick.

 

James Gifford—NitroPress
Legend
February 18, 2023

I suggested that a ways back, and for a one-shot, labor-intensive effort it still seems to be a viable approach. But I get the idea the OP needs this on a more continuing basis, something a little more integrated into a writing and publishing workflow.

 

And I believe the goal, at pretty much any cost, is *zero* errors.

 

Bedazzled532
Inspiring
February 27, 2023

I am trying the following script, but for some reason it is not working.

The logic is to read one line from the original txt file (database), then read one line from the text frame, and compare the two. If they match, well and good; if they do not, apply a red character style to that line.

 

I am not very good at writing scripts, but this is what I have come up with. Any help would be appreciated.

Thanks.

 

//Read from file
file = File("d:/readid.txt");
file.open("r");
var content = file.read().split("\n");

for (var i = 0; i < content.length ; i++)
{
var orig = content[i];

//alert(content.length);
//Read from text frame
app.findGrepPreferences=app.changeGrepPreferences=null;
app.findGrepPreferences.findWhat=".+";
p = app.activeDocument.findGrep();
//alert(p.length);
for (var i = p.length-1; i >= 0; i--)
{
var newln = p[i].lines[0].contents;

if(orig === newln){
alert("same");
}
else{
alert("not same");
break;
}

}

}
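
One likely reason the script above misbehaves: both loops use the same counter `i`, and `var` is not block-scoped, so the inner loop overwrites the outer loop's position. It also compares each master line against every found line rather than the line at the same position. A minimal sketch of the intended pairing, in plain JavaScript with arrays of strings standing in for the file lines and the text-frame lines (the names `trim` and `findMismatchedLines` are hypothetical):

```javascript
// Sketch only: arrays of strings stand in for the master file lines
// and the lines read from the InDesign story.

// Remove leading and trailing whitespace (including a stray "\r").
function trim(s) {
    return s.replace(/^\s+|\s+$/g, '');
}

// Compare line i of the master with line i of the user text and
// return the indices where they differ. A single loop counter is used,
// so there is no clash between an outer and an inner `i`.
function findMismatchedLines(masterLines, userLines) {
    var count = Math.min(masterLines.length, userLines.length),
        mismatched = [];

    for (var i = 0; i < count; i++) {
        if (trim(masterLines[i]) !== trim(userLines[i])) {
            mismatched.push(i);
        }
    }
    return mismatched;
}
```

In InDesign, the user lines could come from `story.paragraphs.everyItem().contents`, as in m1b's script, and each returned index would identify a paragraph to mark with the red character style.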

 

 

 

 

m1b
Community Expert
February 15, 2023

Hi @Bedazzled532, like Joel, I am wondering what exactly you are trying to achieve. If you are trying to catch errors in typed quotes from the Quran, would it be better to have a script enter the text directly from the master data file? The user would choose the chapter and verse, and the script would look it up and insert it at the insertion point. That should be feasible. Otherwise you are talking about quite a sophisticated system that will require considerable development, I think.

- Mark
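
A rough sketch of that lookup idea, in plain JavaScript. It assumes a master file format that is NOT described anywhere in this thread: one verse per line, written as `chapter|verse|text`. The separator and the names `buildVerseIndex` and `lookupVerse` are hypothetical and would need adapting to the real file.

```javascript
// Sketch only. ASSUMPTION: each master line looks like "chapter|verse|text".
// This format is invented for illustration; the real file may differ.

// Build a lookup table keyed by "chapter:verse".
function buildVerseIndex(lines) {
    var index = {};
    for (var i = 0; i < lines.length; i++) {
        var parts = lines[i].split('|');
        if (parts.length < 3) continue; // skip blank or malformed lines
        var key = parts[0] + ':' + parts[1];
        // Re-join the rest in case the text itself contains the separator,
        // then trim whitespace such as a trailing "\r".
        index[key] = parts.slice(2).join('|').replace(/^\s+|\s+$/g, '');
    }
    return index;
}

// Return the stored text for a chapter and verse, or null if not found.
function lookupVerse(index, chapter, verse) {
    var key = chapter + ':' + verse;
    return index.hasOwnProperty(key) ? index[key] : null;
}
```

In an InDesign script, the returned string could then be placed at the cursor, for example by assigning it to the `contents` of the selected insertion point, so the verse text is never typed by hand.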

Bedazzled532
Inspiring
February 16, 2023

Thanks m1b

I understand that what I want is complicated, but I am just a beginner in scripting, so I wanted to do just a line-by-line comparison. I have to struggle even to write this simple script. I am still in a learning phase.

Thanks

Joel Cherney
Community Expert
February 15, 2023

So, I did set up something like this for a client a few years ago. I don't think it'll work for you, but it's worth asking a few questions. What they wanted was that anytime anyone typed a phrase that appeared in their list of phrases, the person keying it would be informed. Their list had maybe fifteen phrases on it? So I set up fifteen-or-so GREP styles to automatically apply highlighting to any second appearance of that phrase.

 

Is that what you were trying to ask for? Because it wouldn't work for you, as the number of lines in the Quran is rather larger than 15, and even just 15 GREP Styles running on a medium size document was a decent performance hit. 

 

Like James, I see using other tools or environments as the best way to achieve what I think you're trying to do, but I'm still not certain that I understand what it is that you're after.  Are you trying to help users quote the Quran in InDesign? Find line numbers, maybe?

James Gifford—NitroPress
Legend
February 15, 2023

I'm thinking I answered a slightly different version of the OP. The need is not quite for file comparison, but something of a database lookup.

 

I think this would be a very, very complex task to embed/automate, even with some kind of script/SQL interface to that complete reference file. Almost something like a specialized word processing system in itself.

 

Robert at ID-Tasker
Legend
February 18, 2023

Easy to do on a PC in VB 😉 

 

James Gifford—NitroPress
Legend
February 15, 2023

In my experience, Word is excellent at doing file comparisons. If this is truly a 'text' file, not one necessarily formatted in InDesign, you might consider using the better tool rather than trying to adapt ID's capabilities.

 

Word's approach is also full-page and allows instant corrections. I can't quite imagine an ID script handling differences in any way but one at a time, which would be very, very tedious to process.

 

The other standard approach, if both documents are in InDesign, is to export them to PDF and use Acrobat DC to compare them. But that makes no provision for corrections; I still see Word as the right (and pretty good) tool for a job like this.

 

Bedazzled532
Inspiring
February 16, 2023

Hello James

 

Thanks so much for the reply.

Actually, you are right that Word has better options for this, but I work entirely in InDesign and have little or no idea how to do page setups in Word, so I would prefer to do this in InDesign. If nothing works, I will have to do the spell check in Word and then copy and paste or import the text into InDesign again.

Thanks