Indexing by own list

Report · Feb 10, 2025

Hi.

In the first step I try to load a text-based list to create an index with certain topics. The loading is not the problem:

var myDoc = app.activeDocument;

app.doScript(main, ScriptLanguage.JAVASCRIPT, undefined, UndoModes.ENTIRE_SCRIPT, "Funktionsprozess");

function main() {
	var myList = File.openDialog("Indexliste laden");
	if (!myList) exit();

	myList.open('r', undefined, undefined);
	var theText = myList.read();
	// strip trailing spaces and collapse runs of line breaks
	theText = theText.replace(/ +\n/g, '\n').replace(/\n+/g, '\n');

	var words = theText.split("\n");
	listLength = words.length;

	myList.close();

	function makeMyList() { 
		app.documents.everyItem().indexes.everyItem().topics.everyItem().remove()

		newIndex = myDoc.indexes.add()
	
		for (var i = 0; i<listLength; i++){  
			myWord = words[i];			
			if (myWord != "") {

				newTopic = myDoc.indexes[0].topics.add (myWord);
				myDoc.indexes[0].topics.add(myWord);


				myDoc.indexes[0].update(); 
			}
		} 
	}
	makeMyList();
		
}

I had to change the script because I had made a mistake (I wanted to use a string instead of an array for the entries) 🙂 Now it works, but I also need the page numbers. This is what it looks like:

But I'm afraid the imported topics need to be page references (Verweise) rather than topics (Thema) to get the page numbers, don't they?

Report · Feb 11, 2025

Your next step would be to look for those entries in all documents and create page references at each instance.

There are various scripts around that do that. One example is

https://creativepro.com/files/kahrel/indesign/index_from_wordlist.html

Report · Feb 11, 2025

Thanks, Peter. Now I'll try to understand your script 🙂

First I want to take advantage of it by completing mine.

Report · Feb 12, 2025

@_AWID_

In short: you need to do your own search, then add each found instance to the index using pageReferences.add().

https://www.indesignjs.de/extendscriptAPI/indesign-latest/#PageReferences.html#d1e127099__d1e127148

Report · Feb 21, 2025

Hi Peter.

I analyzed your script index_topic_list.jsx, but I didn't really get any wiser. 🙂

It's quite complex (for me), with lots of nested functions. The crux of the matter, in my opinion, is the index_documents function with its two loops. Unfortunately, I don't understand what's happening there. Here's an example:

duplicates[search_item] ? duplicates[search_item].push (word_list[j]) : duplicates[search_item] = [word_list[j]];

And in the function index_from_list, where most other functions are called, there is also something I don't understand: __count__, what is that good for?
I would need some very simple lines of code to get the page numbers of all imported topics and also to create index entries for them. A very simple script, without any safeguards for all possible errors. Just to understand it first.

The theoretical steps are clear:

- open the list (txt document)

- import all terms from the list

- search for the terms in each document of the book

- determine the page number of the words found

- create a new index based on the terms/their page numbers found in the book documents

Here's something I tried that doesn't work either. 😞

I didn't write the script entirely by myself; I used snippets, so I don't understand everything that happens in it :). The comments show what I understood and what I didn't.

app.doScript(main, ScriptLanguage.JAVASCRIPT, undefined, UndoModes.ENTIRE_SCRIPT, "Funktionsprozess");

//////////// main-function, contains all other functions
function main() {
    
    var myDoc = app.activeDocument;
    var allOpenDocs = app.documents.everyItem().getElements();
    var allOpenDocsLength = app.documents.length;
    importList();
    
    //////////// importing the list from an external txt-document
    function importList(){   
        //////////// Generation of the dialog to open a list as a source for the index topics
        var myList = File.openDialog ("Indexliste laden"); 
        if (!myList) exit(); 

        myList.open ('r', undefined, undefined); 
        var theText = myList.read();//+'\n'; 
        // removing spaces at the end of the paragraph and removing multiple spaces 
        theText = theText.replace(/^$/g, ''); 
        theText = theText.replace(/ +\n/g, '\n'); 
        theText = theText.replace(/\n+/g, '\n'); // unfortunately does not do the job

        words = theText.split("\n"); 
        listLength = words.length;
        myList.close(); 
        thatsMyDoc();
    }
    //////////// Iterate through all documents
    function thatsMyDoc(){
        for(d = 0; d < allOpenDocsLength; d++){
            thisDoc = allOpenDocs[d];
            // Creating a variable whose content is an object: a function with two parameters
            var indexEntries = findTextInDocument(thisDoc, words);
            for (var i = 0; i < indexEntries.length; i++) {
                createIndexEntry(thisDoc, indexEntries[i].term, indexEntries[i].page);
            }
        }
    }

    //Search for terms in the documents
    function findTextInDocument(thisDoc, words) {
        app.findTextPreferences = app.changeTextPreferences = NothingEnum.nothing;
        // setting indexEntries as an empty Array
        var indexEntries = [];

            // iterating through each term (list element)
        for (var i = 0; i < listLength; i++) {
            // >>>>>> PROBLEM: if the list isn´t clean (no empty paragraphs) 
            // an error will be displayed here ("Object contains no text to find/replace.")
            app.findTextPreferences.findWhat = words[i];

            //Search in each document for each term
            var foundItems = thisDoc.findText();
            
            //only continue if something has been found
            if (foundItems.length < 0 ) {continue;}
                for (var j = 0; j < foundItems.length; j++) {
                   
                    // Do not continue working with the entry if it is not on the page 
                    if (foundItems[j].parentTextFrames[0].parentPage == null) {continue;}
                       var checkWord = foundItems[j].select();
                        var page = foundItems[j].parentTextFrames[0].parentPage.name; 
                        // QUESTION: what happens here / What does term: words[i], page: page mean?        
                        indexEntries.push({ term: words[i], page: page });
                }
        }
        app.findTextPreferences = app.changeTextPreferences = NothingEnum.nothing;
        return indexEntries;
    }

    // Create index entries
    function createIndexEntry(thisDoc, term, page) {
        var index = thisDoc.indexes.length > 0 ? thisDoc.indexes[0] : thisDoc.indexes.add();
        var topic = index.topics.itemByName(term);

        if (!topic.isValid) {
            topic = index.topics.add(term);
    }   
        // >>>>> PROBLEM: Invalid value for parameter "source" of method "add". Expected text, but received page.
        topic.pageReferences.add(thisDoc.pages.itemByName(page), PageReferenceType.currentPage);
    }

    alert("Index erfolgreich erstellt!");
}
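On the empty-paragraph problem noted in the comments above: if the file ends with a line break, or contains a line with only spaces or tabs, split("\n") still yields empty or blank elements, and the three replace() calls do not remove them. A minimal sketch of one way to clean the list after splitting, assuming plain ES3 so it should also run in ExtendScript (cleanWordList is a hypothetical helper name, not part of any API):

```javascript
// cleanWordList is a hypothetical helper: it turns the raw file text
// into an array of trimmed, non-empty terms.
function cleanWordList(text) {
    // Normalise Windows (\r\n) and old Mac (\r) line endings to \n
    var lines = text.replace(/\r\n?/g, "\n").split("\n");
    var words = [];
    for (var i = 0; i < lines.length; i++) {
        // Trim whitespace at both ends of the line
        var word = lines[i].replace(/^\s+|\s+$/g, "");
        if (word !== "") {
            words.push(word); // keep only lines that still contain text
        }
    }
    return words;
}

// Sample text with trailing spaces, blank lines and a final line break
var sample = "amber \n\n  border collie\n\nBambus\n";
var cleaned = cleanWordList(sample);
// cleaned is ["amber", "border collie", "Bambus"]
```

With a list cleaned like this, findWhat should never be set to an empty string, which is what seems to trigger the "Object contains no text to find/replace." error.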

Report · Feb 21, 2025

@_AWID_

You need to combine findTextInDocument with createIndexEntry.

After you find texts in InDesign - just use pageReferences.add().

Please check the link I provided again: it explains which parameters to pass to add(), namely a reference to the found text and PageReferenceType.currentPage.

Report · Feb 11, 2025

@_AWID_

Your second line here:

				newTopic = myDoc.indexes[0].topics.add (myWord);
				myDoc.indexes[0].topics.add(myWord);

is a duplicate of the one above - so you can remove it.

Report · Feb 11, 2025

Oh yes, you're right 🙂 Thanks!

I wanted to shorten it and forgot to remove the second line.

Report · Feb 21, 2025

> duplicates: forget about this for now.

> __count__: this returns the number of items in an object. It's comparable to an array's .length. Forget about it for now.
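As far as I know, __count__ is a non-standard property that is not available in every JavaScript engine. A portable way to get the same number is to loop over the object's own properties. A small sketch (countKeys and the sample object are made up for illustration):

```javascript
// countKeys is a hypothetical helper: it counts an object's own properties,
// which is the number the explanation above describes.
function countKeys(obj) {
    var n = 0;
    for (var key in obj) {
        if (obj.hasOwnProperty(key)) {
            n++; // count only properties defined on the object itself
        }
    }
    return n;
}

// Made-up sample object with two properties
var topics = { amber: [3, 7], Bambus: [12] };
var total = countKeys(topics);
// total is 2
```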

> determine the page number of the words found

This is where you go wrong: you don't determine a term's page number, instead you just create a page reference on the page.

The basic steps are the following:

1. Open your documents

2. Open the word list and get each item

3. Take an item, say 'amber', then in each document

a. create a topic 'amber'

b. look for all occurrences of 'amber' in a document and

c. at each occurrence of 'amber' create a PageReference

Your earlier script got as far as 3a: it created topics. You should then look for instances of that topic and create page references. That can be done by something like the following code -- it's in the script, but pared down to only the necessary steps:

topicName = ...; // reference to a topic name, e.g. 'amber'
app.findTextPreferences = null;
app.findTextPreferences.findWhat = topicName;
found = thisDoc.findText();
if (found.length > 0) {
  topic = thisDoc.indexes[0].topics.add (topicName);
  for (i = found.length-1; i >= 0; i--) {
    topic.pageReferences.add (found[i], PageReferenceType.CURRENT_PAGE);
  }
}

Report · Feb 26, 2025

Now I have a working script! 🙂

var myDoc = app.activeDocument;
var allOpenDocs = app.documents.everyItem().getElements();
var allOpenDocsLength = app.documents.length;
var allTopics = [];

app.doScript(scriptTimer, ScriptLanguage.JAVASCRIPT, undefined, UndoModes.ENTIRE_SCRIPT, "Funktionsprozess");

function scriptTimer() {
    var timeDiff = {
        setStartTime: function () { d = new Date(); time = d.getTime(); },
        getDiff: function () { d = new Date(); t = d.getTime() - time; time = d.getTime(); return t; }
    };

    // Start timer
    timeDiff.setStartTime();

    // Run the main function
    main();

    // Report the elapsed time
    alert("Dauer der Ausführung: " + timeDiff.getDiff() / 1000 + " Sekunden", "IndiSnip /// Script execution time");
}

function main() {
    var myList = File.openDialog("Indexliste laden");
    if (!myList) exit();

    myList.open('r', undefined, undefined);
    var theText = myList.read();

    theText = theText.replace(/^$/g, '');
    theText = theText.replace(/ +\n/g, '\n');
    theText = theText.replace(/\n+/g, '\n');

    var words = theText.split("\n");
    listLength = words.length;
    myList.close();
    thatsMyDoc();
		
    function thatsMyDoc(){
        for(d = 0; d < allOpenDocsLength; d++){
            thisDoc = allOpenDocs[d];
            showActiveDoc(); 
            // Reuse an existing index (after clearing its topics); otherwise create one
            if (thisDoc.indexes.length > 0) {
                thisDoc.indexes[0].topics.everyItem().remove();
            } else {
                thisDoc.indexes.add();
            }
            serch4Topics();
            thisDoc.indexes[0].update(); 
        }

    }

    // snippet by Adobe Community-Member Laubender
    function showActiveDoc(){
        var dlog = new Window("palette");
        dlog.size = [400,50];
        dlog.add("statictext", undefined , "Suche nach Schlagworten im Dokument: "+ thisDoc.name );
        dlog.show();
        // Have a nap:
        $.sleep(50);
        // Closing the dialog:
        dlog.close();
    }

    function serch4Topics(){
        for(var j = 0; j<listLength; j++){
            topicName = words[j]; // iterating through the list, topics one by one
            app.findTextPreferences = null;
            app.findTextPreferences.findWhat = topicName;
            found = thisDoc.findText();
            if (found.length > 0) {
                topic = thisDoc.indexes[0].topics.add (topicName);
                for (i = found.length-1; i >= 0; i--) {
                    topic.pageReferences.add (found[i], PageReferenceType.currentPage);
                }
            }
        }
    }
}

Running it over nearly 300 pages with a list of more than 200 terms takes about 20 minutes, but that is not a problem.

I would like to eliminate duplicates.

@Peter Kahrel In your script index_topic_list.jsx there are some lines that I do not understand:

...
duplicates[search_item] 
	? duplicates[search_item].push (word_list[j]) 
	: duplicates[search_item] = [word_list[j]];
...

...and other references to "duplicates" in other functions.

Is there another, more simple way to avoid duplicates?

Report · Feb 27, 2025

> Is there another, more simple way to avoid duplicates?

The script deals with (tries to, anyway) potentially ambiguous entries, especially in a name index. If the list contains two or more entries for Smith, as in

Smith, John

Smith, James

It looks for 'Smith' in the text and notices that there are more Smiths. So these are special cases.

If you're worried about a full entry occurring twice or more, e.g. 'border collie', then don't worry. There's no need to check for duplicates because InDesign checks for them internally (one of the very few clever features of InDesign's index).
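For anyone puzzled by the duplicates line quoted earlier in the thread: it is a compact way of grouping list entries under a shared key. Here is the same idea written as a plain if/else. This is only a sketch with made-up data, and taking the part before the comma as the key is an assumption for illustration, not necessarily how the original script derives search_item:

```javascript
// Made-up sample list; in the original script word_list comes from the loaded file.
var word_list = ["Smith, John", "Smith, James", "Jones, Mary"];
var duplicates = {};

for (var j = 0; j < word_list.length; j++) {
    // Assumption for this sketch: the search item is the part before the comma.
    var search_item = word_list[j].split(",")[0];

    if (duplicates[search_item]) {
        // Key seen before: append this entry to the existing array
        duplicates[search_item].push(word_list[j]);
    } else {
        // First entry for this key: start a new one-element array
        duplicates[search_item] = [word_list[j]];
    }
}
// duplicates.Smith is ["Smith, John", "Smith, James"]
// duplicates.Jones is ["Jones, Mary"]
```

Any key whose array ends up with more than one element is a potentially ambiguous search term.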

Report · Feb 27, 2025

> It looks for 'Smith' in the text and notices that there are more Smiths. So these are special cases.

> If you're worried about a full entry occurring twice or more, e.g. 'border collie', then don't worry. There's no need to check for duplicates because InDesign checks for them internally (one of the very few clever features of InDesign's index).

> By @Peter Kahrel

That's nice! 🙂

But mea culpa, I explained it badly. The source list is hand-made and none of the terms are duplicated. The duplicates I get are the page numbers: some of the terms appear more than once on a page. So in the index I'll generate, I need the page reference for each term only once...

Report · Feb 27, 2025

Duplicate page numbers are not a problem, each page number is printed only once. In fact, you want to keep those 'duplicates' because if an item occurs twice on a page, after some changes in the text they may be on different pages.

You see all instances of 'duplicate' pages in the Index panel, but when you generate the index they're filtered out.

Report · Feb 27, 2025

Perfect, thanks! So I'm ready for the next step... 🙂

Report · Feb 27, 2025

Just one last question (we'll see). If I want to avoid matching only part of a word, what do I have to change in the function serch4Topics()? For example, the German word "Bambus" is in the list, but I don't want "Bambustisch" or "Bambusstock" to be found.

Report · Feb 27, 2025

All you need to do is enable the whole-words-only preference.

Report · Feb 27, 2025

Sorry, but I didn't get it 😞

I searched around the net but didn't find anything, not even in the ExtendScript API. Which preferences? findTextPreferences?

Report · Feb 27, 2025

@_AWID_

WholeWord in:

https://www.indesignjs.de/extendscriptAPI/indesign-latest/#FindChangeTextOption.html
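A rough sketch of how the option could be used around a search. It relies on InDesign's global app object, so it only does something useful inside InDesign. findWholeWord is a hypothetical helper name, and restoring the previous setting afterwards is my own addition, not something the API requires:

```javascript
// Sketch for InDesign ExtendScript; `app` is InDesign's global application object.
// findWholeWord is a hypothetical helper name, not part of the InDesign API.
function findWholeWord(doc, term) {
    // Remember the current setting so it can be restored afterwards
    var previous = app.findChangeTextOptions.wholeWord;

    app.findTextPreferences = null;              // reset earlier search settings
    app.findChangeTextOptions.wholeWord = true;  // "Bambus" should no longer match "Bambustisch"
    app.findTextPreferences.findWhat = term;

    var found = doc.findText();

    // Put the option back the way the user had it
    app.findChangeTextOptions.wholeWord = previous;
    return found;
}
```

In serch4Topics() the call would then be something like found = findWholeWord(thisDoc, topicName); in place of the three find lines.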

Report · Feb 27, 2025

Cool, thanks. Now it's safe. 🙂

Report · Feb 27, 2025

Sorry, it's not a preference but an option. You can use this site to browse InDesign's object model:

https://www.indesignjs.de/extendscriptAPI/indesign-latest/#about.html

Look for findchangetextoptions and you'll find the wholeWord property; it is set with app.findChangeTextOptions.wholeWord = true.

Report · Feb 27, 2025

yes, Robert gave me the hint 🙂

Thanks!

(Wow, the ExtendScriptAPI contains a huge amount of information. One has to know where to look for the right ones... 😞 )

Report · Feb 27, 2025

@_AWID_

Use simple words to search.

Report · Feb 27, 2025

If there is an expression in the list that consists of two hyphenated words, will it still work with this setting (app.findChangeTextOptions.wholeWord = true)?

Report · Feb 27, 2025

You can try that in the interface: type abc-def in a frame, enter abc-def in the Find what field, enable whole word, and start the search. What happens in the interface is what happens with the script.

Be aware that if the text uses a non-breaking hyphen, and you type a normal hyphen in the Find/Change window, you won't find it.

Report · Feb 26, 2025

Thank you Peter, Robert.

The difficulty for me is that I can follow the code for a while, but then I lose focus :). Unfortunately, it is not so easy for me to establish the logical connection between functions in more complex code. I think that for experts like you this is a piece of cake and can be written between two cigarettes. 🙂 But it takes me weeks to understand it, and even longer to implement it in between, during working hours.

For example:

if (Math.abs(p_ref.sourceText.index - ip)

or when a function call with parameters is assigned to a variable:

var indexEntries = findTextInDocument(thisDoc, words);

@Peter Kahrel

I know what "count" means, but the double underline confused me.

With your code snippet I will try to keep it simple and to solve the issue

@Robert at ID-Tasker

I will look for the references again, it might take some time
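On the two points above that caused confusion (a variable receiving the result of a function call, and the { term: ..., page: ... } notation from the earlier script), here is a small sketch in plain JavaScript. findPages is a made-up stand-in for findTextInDocument: instead of searching an InDesign document it looks terms up in a simple object, so the data here is invented for illustration:

```javascript
// findPages is a hypothetical stand-in for findTextInDocument.
// pagesByTerm maps each term to the page names where it "occurs".
function findPages(pagesByTerm, words) {
    var entries = [];
    for (var i = 0; i < words.length; i++) {
        var pages = pagesByTerm.hasOwnProperty(words[i]) ? pagesByTerm[words[i]] : [];
        for (var j = 0; j < pages.length; j++) {
            // Object literal: creates one object with two properties, "term" and "page"
            entries.push({ term: words[i], page: pages[j] });
        }
    }
    return entries; // whatever is returned here ends up in the caller's variable
}

// The function runs first; its return value (an array) is then stored in indexEntries
var indexEntries = findPages({ amber: ["3", "7"], Bambus: ["12"] }, ["amber", "Bambus", "collie"]);
// indexEntries has 3 elements; indexEntries[0].term is "amber" and indexEntries[0].page is "3"
```

So the variable does not hold the function itself, only the array the function hands back, and each array element is a small record that can be read with .term and .page.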