Hi,
I have a large medical document (around 350 pages) with thousands of words. The document has already been proofread, and I want to use it to build a specialized InDesign dictionary for use in ID and other text-processing software. In addition, I want to build a database of those terms for future work in that discipline. So I need a script that gathers all the words from the document (all the text is completely threaded in a single frame), eliminates the repeated words, sorts them alphabetically, and puts the results in a text file. Every word should be in its own paragraph.
After that I will give the resulting text document another quick reading to look for inconsistencies, other errors, typos, etc. In the end I will use the final output to build a specialized ID user dictionary and a database of those terms for the technical proofreading of this type of book.
I confess that I do not know much about scripting (I really have tried...) so I have this question: Is this doable? Is there someone out there that can make me this script?
Thank you in advance,
Maria
Oh yes, doable, not particularly difficult.
Hi, Stephen,
Thank you very much for your opinion. I think that this is doable and not particularly difficult (for you). For me it is rather difficult because I don't get along with scripting (I have tried...). But anyway, I solved my problem: I replaced all the spaces between words with a paragraph return, exported to text, and used grep in Notepad++ to sort the words and delete the duplicates. With another quick proofing I will get a wonderful InDesign user dictionary for medical terms.
Now another question: do you know any script that I can use to convert all the tables to text in a document?
Maria
Finding the text in a table isn't hard ...
where do you want the table text to be put? in a new textFrame?
what do you need to do with the table text?
how do you want the table text to look?
what happens to the original table?
Making the table text look anything like the original table is, I think, hard. How would you do this "by hand"? The answer will tell us what the script needs to do, and what decisions such a script would need to make.
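As a rough sketch of one possible answer to the questions above: treat each table as rows of cell strings, then join the cells with a tab and the rows with a paragraph return, which is the usual setup when converting a table to text by hand. The function name tableToText and the sample rows are made up for illustration, and reading the cells out of an actual InDesign table is not shown here.

```javascript
// Hypothetical helper (not an InDesign API): flattens rows of cell strings
// into delimited plain text.
function tableToText(rows, columnSeparator, rowSeparator) {
    var lines = [];
    for (var r = 0; r < rows.length; r++) {
        // One line per table row, cells separated by columnSeparator
        lines.push(rows[r].join(columnSeparator));
    }
    return lines.join(rowSeparator);
}

// Made-up sample data standing in for the contents of a table
var rows = [['Drug', 'Dose'], ['Aspirin', '100 mg']];
var text = tableToText(rows, '\t', '\r');
```

This keeps only the content. Anything that makes the text look like the original table (column widths, rules, shading) would still have to be decided separately.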
Hi, Stephen,
I have already solved my problem with the tables and the project «ID Medical Dictionary» is almost ready.
I really want to thank you for your help to someone who simply is not able to cope with scripting… I started working on books with PageMaker 1.0 and I really do not know how I survived without scripting all these years. I have to be thankful to Google and to this scripting community…
I am a specialist in MathType, and if someday you have a problem with that, contact me.
Maria
Here is a working JavaScript to do the word gathering and sorting. There are more straightforward ways to do this, but my script uses a couple of shortcuts that are possible with both JavaScript (split a string into an array on a regular expression; remove duplicates by feeding the result into an object) and InDesign (using everyItem to quickly gather all possible text). So it may be kind of unclear what happens where.
A problem (as you may already have found out by yourself) is how to determine what a 'word' is. This script replaces common punctuation and digits with a space, and then only gathers what's left between the spaces. You are sure to find some weird "words" this way, but then again so does your manual way.
After processing, the script prompts for a Save File name and then opens it in your default plain text editor.
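// Gather the text of every story in the active document
textList = app.activeDocument.stories.everyItem().texts.everyItem().contents.join('\r');
// Replace common punctuation and digits with a space
textList = textList.replace(/[.,:;!?()\/\d\[\]]+/g, ' ');
// Split into words on any run of whitespace
textList = textList.split(/\s+/);
// Remove duplicates by using each word as an object key
tmpList = {};
for (i=0; i<textList.length; i++)
    if (textList[i] != '')    // skip the empty strings split() can leave at either end
        tmpList[textList[i]] = true;
resultList = [];
i = 0;
for (j in tmpList)
    resultList[i++] = j;
resultList.sort();
// Suggest a file name based on the document name
defaultFile = new File (Folder.myDocuments+"/"+app.activeDocument.name.replace(/\.indd$/i, '')+".txt");
if (File.fs == "Windows")
    writeFile = defaultFile.saveDlg( 'Save list', "Plain text file:*.txt;All files:*.*" );
else
    writeFile = defaultFile.saveDlg( 'Save list');
if (writeFile != null)
{
    if (writeFile.open("w"))
    {
        writeFile.encoding = "utf8";
        // One word per paragraph
        writeFile.write (resultList.join("\r")+"\r");
        writeFile.close();
        // Open the result in the default plain text editor
        writeFile.execute();
    }
}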
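To see what the "word" rule does without opening InDesign, the same replace, split, and de-duplicate steps can be tried on a plain string in any JavaScript engine. This is only a sketch: the function name uniqueSortedWords and the sample sentence are made up for illustration and are not part of the script above.

```javascript
// Hypothetical helper: the word logic of the script as a plain function.
function uniqueSortedWords(text) {
    // Replace the same punctuation and digits the script targets with a space
    var cleaned = text.replace(/[.,:;!?()\/\d\[\]]+/g, ' ');
    var parts = cleaned.split(/\s+/);
    var seen = {};
    var result = [];
    for (var i = 0; i < parts.length; i++) {
        var word = parts[i];
        // Skip empty strings, and keep only the first occurrence of each word
        if (word !== '' && !Object.prototype.hasOwnProperty.call(seen, word)) {
            seen[word] = true;
            result.push(word);
        }
    }
    // The default sort compares character codes, so capitalised words come first
    return result.sort();
}

// Made-up sample text
var sample = 'Aorta, aorta; the aorta (see p. 12) and the Aorta.';
var words = uniqueSortedWords(sample);
// words is now: Aorta, and, aorta, p, see, the
```

Note that "Aorta" and "aorta" count as two different words, and that the stray "p" from "p. 12" is exactly the kind of weird "word" that the final proofreading pass has to catch.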
Hi,
I thought there might be a speed problem using everyItem, so I exported the stories instead ...
#target indesign
// Return the unique items of an array, using an object to drop duplicates
Array.prototype.unique = function() {
    var o = {}, i, l = this.length, r = [];
    for(i=0; i<l;i+=1) o[this[i]] = this[i];
    for(i in o) r.push(o[i]);
    return r;
};
var storyFiles = new Array();
var currDoc = app.activeDocument;
var docName = currDoc.name;
var currStories = currDoc.stories.everyItem().getElements();
l = currStories.length;
// Export every story to a temporary text file on the Desktop
while(l--){
    currStory = currStories[l];
    currStory.exportFile(ExportFormat.TEXT_TYPE, File('~/Desktop/' + docName + l + '.txt'));
    storyFiles.push(File('~/Desktop/' + docName + l + '.txt'));
}
// Read the exported files back into one string, deleting each file afterwards
var masterStory = '';
l = storyFiles.length;
while(l--){
    currExport = storyFiles[l];
    currExport.open('r');
    masterStory = masterStory + currExport.read();
    currExport.close();
    currExport.remove();
}
// Strip punctuation and line breaks, split on spaces, de-duplicate and sort
var finalCut = masterStory.replace(/[?,.!\n\r]/g,' ').split(' ').unique().sort().join('\n');
destFile = File('~/Desktop/' + docName.replace(/indd/, 'txt'));
write_file(destFile, finalCut);
destFile.execute();
function write_file ( _file, _data )
{
    _file.open( 'w' );
    _file.encoding = 'UTF-8';
    _file.write( _data );
    _file.close();
}
Hi, Jongware.
First of all, thank you very much for your script. As usual, "Jongware ROCKS".
I apologize for not answering your post all these days, but I took a few days to rest and I forbade myself from touching a computer, reading emails, etc....
Maria