Hi
I have a huge txt file where there are lines of Quran separated by soft return. I want to use this file as a base file as this is spell checked. I have attached the screenshot.
What I want to achieve is that whenever anyone types a line of the Quran in InDesign, it should be matched against the corresponding line in the base txt file.
Can it be done through GREP? Is there any script which can do this?
Thanks
Hi @Bedazzled532, here is a version that checks word by word:
function main() {
var masterContentFile = File("d:/readid.txt");
var badCharacterStyleName = 'Bad';
if (!masterContentFile.exists) {
alert('Could not find master content file "' + masterContentFile + '".');
return;
}
var doc = app.activeDocument,
badCharacterStyle = doc.characterStyles.itemByName(badCharacterStyleName);
if (!badCharacterStyle.isValid) {
alert('Could not fin...
In my experience, Word is excellent at doing file comparisons. If this is truly a 'text' file, not one necessarily formatted in InDesign, you might consider using the better tool rather than trying to adapt ID's capabilities.
Word's approach is also full-page, and allows instant corrections. I can't quite imagine an ID script handling differences in any way but one at a time, which would be very, very tedious to process.
The other standard approach, if both documents are in InDesign, is to export them to PDF and use Acrobat DC to compare them. But that makes no provision for corrections; I still see Word as the right (and pretty good) tool for a job like this.
Hello James
Thanks so much for the reply.
Actually you are right that Word has better options for this, but I am working entirely in InDesign and have little or no idea of doing page setups in Word. So I would prefer doing this in InDesign. If nothing works then I will have to do the spell check in Word and then copy-paste/import into InDesign again.
Thanks
So, I did set up something like this for a client, a few years ago. I don't think it'll work for you, but it's worth asking a few questions. What they wanted was, anytime anyone typed a phrase that appeared in their list of phrases, they wanted the person keying it to be informed. Their list had maybe fifteen phrases on it? So I set up fifteen-or-so GREP styles, to automatically apply highlighting to any second appearance of that phrase.
Is that what you were trying to ask for? Because it wouldn't work for you, as the number of lines in the Quran is rather larger than 15, and even just 15 GREP Styles running on a medium size document was a decent performance hit.
Like James, I see using other tools or environments as the best way to achieve what I think you're trying to do, but I'm still not certain that I understand what it is that you're after. Are you trying to help users quote the Quran in InDesign? Find line numbers, maybe?
I'm thinking I answered a slightly different version of the OP. The need is not quite for file comparison, but something of a database lookup.
I think this would be a very, very complex task to embed/automate, even with some kind of script/SQL interface to that complete reference file. Almost something like a specialized word processing system in itself.
Easy to do on a PC in VB 😉
Sure. Just "COPY CON QURANCONVERT.EXE"... and type, type, type away.
This very much could be achieved... but "easy" is not a word I'd use in the project definition.
It does occur to me that it (1) could have lasting commercial value, or (2) might already exist (with, obviously, a different database) in the Bible editing world. Creation or conversion might be worth getting funded.
Yes 😉 I was thinking about LIVE comparison - script would monitor what has been just typed and query database - kind of what you can achieve on the phone when you are typing - suggestions.
Hi Joel
Thanks for the reply.
Yes, I was looking for something like this. Actually the whole Quran text is divided into 30 books, each containing approx 20-22 pages. I would like to see what GREP solution you provided so that I can implement the same.
Actually there are many variations in Quran text, like IndoPak, Middle-Eastern and some others. I am working on IndoPak script and I have a spell-checked txt version of IndoPak script which I want to use as a database for comparison.
The need for comparison is that after page setup of the Al-Quran text, I would like to run the comparison one final time before printing. The reason is that there can be human error while working on the page setup. So, I want to compare line by line against the database.
Thanks
Ohhhhhh
Now, that makes sense!
Like I said previously, my solution won't work for you. What I used is called a "GREP Style." It's something you set up inside a paragraph style. It basically is constantly running a GREP Find query against everything marked with that paragraph style, and anytime it finds a hit, it marks it with a character style. StackExchange tells me that there are around seven to eight thousand lines in the Quran, depending on how you count 'em. (Does the surah name count? How about bismillah? But it doesn't matter, because...) This means that your paragraph style would need about eight thousand GREP queries running constantly on your entire document, which would probably be impossible. Even if it didn't crash InDesign, I expect it would slow it down to the point of unusability. It's a cool technique, just not for your use case. I'll add a quick animation at the end of my post showing you how it works.
What you'd want instead, I think, would be a script that you'd run once, after layout, which would compare the lines in your doc against your master document. That way you can have each of your non-matching lines flagged for your review, without spending a vast amount of resources on constantly searching your entire document. I like m1b's idea a lot, but it's hard to adapt to your case, because I very much doubt that you're in there re-keying the Quran when you have the complete raw text of the document in another file. You're trying to automate post-layout QA review, right?
Anyhow, I'm looking at your sample screenshot, and I'm seeing a few things that confuse me. The character drops (the pink rectangles) are there because you're using Adobe Arabic here, and it doesn't have an end of ayah glyph, yes? Also, the first line doesn't have an end of ayah because it's actually the bismillah, right? I'm asking because we'd need some kind of segmentation to chop up your target document, and end of ayah is something I've used myself to that end, in the past.
Lastly, here's the GREP Style technique that I think would crash your computer. It works fine for a few searches, probably not so well for thousands.
First of all, thanks so much for the effort.
Yes, you are right. I want a script to flag the lines which do not match the database.
In the screenshot you will see 2 character drops which are showing in pink. Those are the opening and closing Arabic brackets. In some lines you will see that there is only one char drop. That is a special character.
Currently, I just want the line up to the first opening bracket (excluding the bracket) to be read and then compared with the database. If it mismatches, apply a character style flagging that something is wrong in this line. If the comparison result is OK, it will move on to match the second line.
I may include the bracketed matter for comparison in future, but at present, I would like to do it without them.
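The "read only up to the first opening bracket" step can be sketched in plain JavaScript that should also run under InDesign's ExtendScript. This is only a minimal sketch: it assumes the opening bracket is the ornate parenthesis U+FD3F, which may not be the character actually used in the file, and the name `textBeforeBracket` is made up for illustration.

```javascript
// Assumption: the opening bracket is U+FD3F (ORNATE RIGHT PARENTHESIS), which
// comes first in logical order in right-to-left text. Change it if your file differs.
var OPENING_BRACKET = "\uFD3F";

// Hypothetical helper: returns the part of a line before the first opening
// bracket, with leading and trailing whitespace removed.
function textBeforeBracket(line, bracket) {
    var marker = bracket || OPENING_BRACKET;
    var cut = line.indexOf(marker);
    // No bracket found: keep the whole line.
    var head = cut === -1 ? line : line.substring(0, cut);
    return head.replace(/^\s+|\s+$/g, "");
}
```

Both the master line and the layout line would go through the same function before being compared, so that bracketed matter is ignored on both sides.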
To summarise, I want the script to :
a. Read a line from the database (text file saved in a folder)
b. Read a line from the InDesign file I am working on.
c. Compare the lines.
d. If all is OK, proceed to the next line.
e. If all is not OK, flag the line in InDesign file by a character style.
f. (If it flags a particular word, it would be a wonderful solution)
I tried writing a script to read from the text frame but I could not. 😞
Thanks and regards
Shahid
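Steps a to f above can be sketched as plain JavaScript, leaving out the InDesign and file-reading parts. This is a rough sketch under assumptions, not a finished tool: it assumes both texts are already available as arrays of lines, that line i of the master corresponds to line i of the layout, and that words are separated by whitespace. The function names are hypothetical.

```javascript
// Hypothetical helper: trims a line and splits it into words on whitespace.
function splitWords(line) {
    var trimmed = line.replace(/^\s+|\s+$/g, "");
    return trimmed === "" ? [] : trimmed.split(/\s+/);
}

// Returns -1 when both lines have the same words in the same order,
// otherwise the zero-based index of the first word that differs.
function firstMismatchedWord(masterLine, userLine) {
    var m = splitWords(masterLine);
    var u = splitWords(userLine);
    var longest = Math.max(m.length, u.length);
    for (var i = 0; i < longest; i++) {
        // A missing word (undefined) also counts as a mismatch.
        if (m[i] !== u[i]) {
            return i;
        }
    }
    return -1;
}

// Compares line i of the master with line i of the layout text and
// collects one record per mismatching line.
function compareLines(masterLines, userLines) {
    var problems = [];
    var count = Math.min(masterLines.length, userLines.length);
    for (var i = 0; i < count; i++) {
        var wordIndex = firstMismatchedWord(masterLines[i], userLines[i]);
        if (wordIndex !== -1) {
            problems.push({ line: i, word: wordIndex });
        }
    }
    return problems;
}
```

Each record gives the line and the first word to flag. Whether InDesign's own `words` collection lines up with whitespace-split words in Arabic text is something that would need checking before applying a character style to a single word.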
Okay, so there's lots of possibilities, here. The first that occurs to me is completely free of scripting effort, and feels like a late 20th century workaround. Back then, software generally didn't have a way to mark text as e.g. Dari or Pashto or Burmese or whatever. So, if I did make a custom dictionary for a language, I'd have to pretend that it was named something else. So our Hmong dictionary was actually coded as "English - Scottish." Hmong translator thought that was funny.
Anyhow. We make you a custom dictionary file. Each "word" in the dictionary is a whole line of your Quran file. So we load it in as a custom dictionary, and make sure that it's not using the actual Hunspell dictionary. Then, if a single word is misspelled, you get the red wavy underline indicating a misspelling. Here's how it could work:
1) Take your Master Quran File, and make it into a raw text file.
a) Remove all of the line numbers from your master. This is actually the only place we'll be using GREP.
I don't know if you are using the decorative brackets or the actual end-of-ayah glyph because they've all dropped, but you can just copy them out of your text and paste them into the "Find what" field. I have two separate clauses to find either one-digit or two-digit verse numbers:
﴿\d﴾|﴿\d\d﴾
You might need to play with this query to get it to capture all of your verse numbers...
b) Use the text tool to select the entire Quran, whack Control-A
c) File -> Export, choose "Text Only" as the file format. Give it a unique name (like QuranCustomDict.txt)
2) Add this custom dictionary to InDesign
a) Edit -> Preferences -> Dictionary -> Arabic
b) Add the dictionary
c) Make sure that Spelling is set to User Dictionary Only
3) Turn on Dynamic Spelling (Edit -> Preferences -> Spelling -> check the Dynamic Spelling box)
4) Go to your layout file that you want to check and ensure that all text is set to Arabic language, if it's not already:
a) Open a Find/Change dialog, go to the GREP tab
b) your "Find what" query is
.+
which means "find everything"
c) leave "Change to" blank
d) in the Change Formatting area, go to Advanced Character Formats and specify Arabic
e) whack that Change All button
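Step 1a above could also be approximated in a script rather than in the Find/Change dialog. The sketch below is plain JavaScript and rests on assumptions: that a verse marker is U+FD3F, then one or more digits, then U+FD3E, and that the digits are ASCII, Arabic-Indic, or Extended Arabic-Indic. In a JavaScript regular expression `\d` matches only ASCII 0-9, so the other digit ranges are listed explicitly. The name `stripVerseMarkers` is hypothetical.

```javascript
// Assumption: a verse marker is U+FD3F, one or more digits, then U+FD3E.
// Digits covered: ASCII 0-9, Arabic-Indic U+0660-U+0669,
// Extended Arabic-Indic U+06F0-U+06F9.
var VERSE_MARKER = /\uFD3F[0-9\u0660-\u0669\u06F0-\u06F9]+\uFD3E/g;

// Removes every verse marker and leaves the rest of the text untouched,
// including any spaces that surrounded the marker.
function stripVerseMarkers(text) {
    return text.replace(VERSE_MARKER, "");
}
```

Using `+` instead of separate one-digit and two-digit clauses also covers three-digit verse numbers.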
Now, if that is all set up correctly, then any divergence from the lines as they appear in your custom dictionary get marked as spelling errors. I have a fatha on the clipboard and am simply pasting it into random words, here:
@Joel Cherney Very nice! 🙂
@Joel Cherney Wow... Thanks a lot for so much effort, Joel.
I really appreciate the work you have done.
Actually, this part I have already done. I have already prepared a Quran Hunspell dictionary and written a batch script which installs it automatically.
What matters now is the sequence of words. That's the main reason I wanted to compare it line by line against the database.
Any suggestions on that?
Thanks once again.
Regards
Shahid
@Bedazzled532, if I understand Joel's technique correctly, the idea is to enter the entire line into the dictionary so it doesn't check words, but entire lines. Is that what you have already done? Did it work?
It was supposed to work that way. I am actually halfway between shamefaced and flabbergasted, over here, because my trick actually doesn't work in InDesign. If you swap the order of two words, the spellcheck doesn't catch it. When I export the word list, the dictionary is segmenting on spaces, not on lines. So it's not treating the whole first line, the bismillah, as one "word". I can go and edit the raw text file where InDesign stores the custom word list and it still treats each word on each line as a separate dictionary term.
The flabbergasted half of my brain is agog because I've used this trick before, this trick of defining whole phrases as single words. I am guessing that maybe I did it in Framemaker instead of InDesign? Maybe I did it in Hunspell, or maybe I'm thinking of a Trados termbase? I'm going to dig through my archives, see if I can't find the notes I assume I must have kept on it.
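The failure described here, where swapped words pass the check, can be reproduced in a few lines of plain JavaScript. This only illustrates the general difference between a per-word lookup and a whole-line comparison; it is not a model of how InDesign's spell checker is implemented, and all function names are hypothetical.

```javascript
// Builds a lookup of every whitespace-separated word in the master lines,
// which is roughly what a dictionary segmented on spaces amounts to.
function buildWordSet(masterLines) {
    var known = {};
    for (var i = 0; i < masterLines.length; i++) {
        var words = masterLines[i].split(/\s+/);
        for (var j = 0; j < words.length; j++) {
            if (words[j] !== "") {
                // The "w:" prefix avoids clashes with built-in object keys.
                known["w:" + words[j]] = true;
            }
        }
    }
    return known;
}

// Per-word check: passes whenever every word is known, in any order.
function passesWordCheck(line, known) {
    var words = line.split(/\s+/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "" && known["w:" + words[i]] !== true) {
            return false;
        }
    }
    return true;
}

// Whole-line check: passes only when the line matches the master exactly.
function passesLineCheck(line, masterLine) {
    return line === masterLine;
}
```

With a master line of three words, a line with the first two words swapped passes `passesWordCheck` but fails `passesLineCheck`, which is why word order needs a line-level comparison.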
Oh well @Joel Cherney, it was a terrific idea to try anyway.
Hi @Bedazzled532, like Joel I am wondering what exactly you are trying to achieve. If you are trying to catch errors in typed quotes from the Quran, would it be better to have a script just enter the text from the master data file? The user would choose the chapter and verse, and the script would look it up and insert it at the insertion point. That should be feasible. Otherwise you are talking about quite a sophisticated system that will require considerable development, I think.
- Mark
Thanks m1b
I understand that what I want is complicated, but I am just a beginner in scripting, so I wanted to do just a line-by-line comparison. I have to struggle even to write this simple script. I am just in a learning phase.
Thanks
How about exporting Story(ies) as plain text or RTF, then sorting and comparing in WORD?
Not fully automated but if you won't have too many errors - it should be quick?
I suggested that a ways back, and for a one-shot, labor-intensive effort it still seems to be a viable approach. But I get the idea the OP needs this on a more continuing basis, something a little more integrated into a writing and publishing workflow.
And I believe the goal, at pretty much any cost, is *zero* errors.
I am trying the following script, but for some reason it is not working.
The logic is to read one line from the original txt file (database), then read one line from the text frame,
compare the lines; if they match, well and good; if they do not match, apply a red character style to that line.
I am not very good at writing scripts, but this is what I have come up with. Any help would be appreciated.
Thanks.
//Read from file
file = File("d:/readid.txt");
file.open("r");
var content = file.read().split("\n");
for (var i = 0; i < content.length ; i++)
{
    var orig = content[i];
    //alert(content.length);
    //Read from text frame
    app.findGrepPreferences=app.changeGrepPreferences=null;
    app.findGrepPreferences.findWhat=".+";
    p = app.activeDocument.findGrep();
    //alert(p.length);
    for (var i = p.length-1; i >= 0; i--)
    {
        var newln = p[i].lines[0].contents;
        if(orig === newln){
            alert("same");
        }
        else{
            alert("not same");
            break;
        }
    }
}
@Bedazzled532, you've made a terrific start! Next you will need to get the story you want to check, so in my example below I've just used the story that you have selected. I compare it paragraph-by-paragraph with the masterContentFile, and apply a characterStyle "Bad" if it doesn't match.
function main() {
    var masterContentFile = File("d:/readid.txt");
    var badCharacterStyleName = 'Bad';
    if (!masterContentFile.exists) {
        alert('Could not find master content file "' + masterContentFile + '".');
        return;
    }
    var doc = app.activeDocument,
        badCharacterStyle = doc.characterStyles.itemByName(badCharacterStyleName);
    if (!badCharacterStyle.isValid) {
        alert('Could not find character style "' + badCharacterStyleName + '".');
        return;
    }
    // the selection must belong to a story, e.g. a text cursor or selected text
    if (
        doc.selection[0] == undefined
        || !doc.selection[0].hasOwnProperty('parentStory')
    ) {
        alert('Please put cursor in the story you want to check and try again.');
        return;
    }
    // read the master file, one entry per line, then release the file
    masterContentFile.open('r');
    var masterContent = masterContentFile.read().split("\n");
    masterContentFile.close();
    var userParagraphs = doc.selection[0].parentStory.paragraphs,
        userContent = userParagraphs.everyItem().contents,
        // strips all leading and trailing whitespace, including the paragraph return
        leadingTrailingSpace = /^\s+|\s+$/g,
        // only compare as many lines as both sources have
        contentCount = Math.min(userContent.length, masterContent.length),
        differenceCount = 0;
    for (var i = 0; i < contentCount; i++) {
        var m = masterContent[i].replace(leadingTrailingSpace, ''),
            u = userContent[i].replace(leadingTrailingSpace, '');
        if (u != m) {
            // $.writeln(' m = ' + m);
            // $.writeln(' u = ' + u);
            userParagraphs[i].applyCharacterStyle(badCharacterStyle);
            differenceCount++;
        }
    }
    alert('Compared with master content, ' + differenceCount + ' different paragraphs were found.');
}
app.doScript(main, ScriptLanguage.JAVASCRIPT, undefined, UndoModes.ENTIRE_SCRIPT, 'Check Story Against Master Content');
If you have trouble, try uncommenting the two writeln statements in the loop and look at the script output in the console.
I have wrapped the whole thing in a function main and called it via app.doScript. This means that there is only one neat Undo if you want to go back to before the script changed the styles.
Also, for your info, you can use the Code button in this forum to paste in script code. It looks much better and doesn't get scrambled. The code button looks like < / >
- Mark
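One detail worth noting in the script above is the line ending. The master file is split on `\n`, while InDesign paragraph contents normally end with a carriage return (`\r`); the script copes with that by trimming whitespace before comparing. A slightly more defensive sketch, assuming nothing about which line ending the master file uses, could split on all three common forms. The name `splitLines` is hypothetical and the function is plain JavaScript.

```javascript
// Splits text into lines on Windows (\r\n), Unix (\n), or carriage-return (\r)
// line endings, and trims leading and trailing whitespace from each line.
function splitLines(text) {
    var raw = text.split(/\r\n|\n|\r/);
    var lines = [];
    for (var i = 0; i < raw.length; i++) {
        lines.push(raw[i].replace(/^\s+|\s+$/g, ""));
    }
    // Drop the single empty entry left behind by a trailing line ending.
    if (lines.length > 0 && lines[lines.length - 1] === "") {
        lines.pop();
    }
    return lines;
}
```

Blank lines in the middle of the text are kept as empty strings, so line numbers in the master and in the layout stay aligned.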
@m1b Wow. Thanks a ton for your efforts, m1b.
I created a 'Bad' char style. The script is running without errors but unfortunately it is not able to compare, I guess.
In my master database I entered two lines:
This is line 1
This is line 2
In my text frame in InDesign, I copy-pasted the same lines, for testing.
When I run the script, it gives the message "Compared with master content, 2 different paragraphs were found".
I don't know why this is happening.
What needs to be done now? Is it the new line char or something else?
Regards
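When two lines look identical but still compare as different, it can help to find the exact character where they diverge. The following is a small diagnostic sketch in plain JavaScript; the name `firstDifference` is hypothetical, and it compares the strings exactly as given, without trimming.

```javascript
// Returns null when the strings are identical; otherwise the index of the
// first differing character and the character codes found there.
// A code of NaN means that string had already ended at that index.
function firstDifference(a, b) {
    var longest = Math.max(a.length, b.length);
    for (var i = 0; i < longest; i++) {
        if (a.charAt(i) !== b.charAt(i)) {
            return { index: i, codeA: a.charCodeAt(i), codeB: b.charCodeAt(i) };
        }
    }
    return null;
}
```

A code of 13 or 10 at the reported index would point to a carriage return or line feed, while two different letter codes would point to a difference in the text itself, such as upper versus lower case.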
@m1b Thanks so much m1b, it works. It was my mistake.
In the database, matter was in lower case and in text frame, it was in Upper and lower case.
I changed the case and it worked. Thanks a lot.
I will try to implement this logic for comparing the Quran text.