John Hawkinson
Inspiring
April 6, 2010
Question

Iterating over all text?


I'd like to iterate over all the text in a document (inside groups, tables, etc., etc.) and not miss any bizarre corner cases.

I thought I had seen a script from Marc Autret that addressed this, but I couldn't find it (instead, I found the [JS][CS3] Getting Page Number but I thread, which basically goes the other way).

I recently discovered that the method I had been using misses text inside tables. And the version I wrote a while ago initially missed items inside groups. So I'm wondering if someone has a tried-and-true function that does this kind of thing.

Here's what I have -- this works fine without tables. It looks for "@@" in any text box in the document:

var i,j;
  for (i=0; i<doc.pages.length; i++) {
       var p = doc.pages[i];
       // items inherited from the master first, then the page's own items
       for (j=0; j<p.masterPageItems.length; j++)
            check_at("Master on p."+p.name, p.masterPageItems[j]);
       for (j=0; j<p.pageItems.length; j++)
            check_at("On p."+p.name, p.pageItems[j]);
  }

function check_at(name, pi) {
      if (debug) $.writeln(pi.constructor.name+" on "+name);
      if ('contents' in pi &&
             pi.contents.match("@@")) {
                      var i = pi.contents.indexOf("@@");
                      // excerpt of at most 37 characters, starting up to 23 before the match
                      var s = Math.max(i-23,0);
                      var e = Math.min(i+23,s+37);
                      lines.push(name+":  "
                      +pi.contents.substring(s,e).replace(/\r/g,"\\n"));
            }
     if ('pageItems' in pi) // recurse into groups
            for (var k=0; k<pi.pageItems.length; k++)
              check_at(name+"", pi.pageItems[k]);
}

but this fails on tables, because a TextFrame's contents property does not return the contents of a table.

(I also realized today that the group handling could be ignored if I just used "allPageItems" instead of "pageItems").

Anyhow, I guess I could also iterate over pi.tables and for each one, check .contents.join("\n").match("@@"). Since the contents of a table are an array, that would be joining all the cells together into one string and searching that string.

But I'm worried this is insufficiently robust? And it certainly is ugly.
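The tables idea above might look roughly like the sketch below. It is only an illustration: `searchableText` and `findHits` are hypothetical helper names, and the plain objects stand in for real page items so the logic can be followed outside InDesign. It assumes, as described above, that a table's contents is an array with one entry per cell.

```javascript
// Sketch: gather an item's own text plus the text of any tables it holds.
// Assumes InDesign-style `contents` and `tables` members on the item.
function searchableText(item) {
    var parts = [];
    if ('contents' in item && typeof item.contents === "string") {
        parts.push(item.contents);
    }
    if ('tables' in item) {
        for (var t = 0; t < item.tables.length; t++) {
            // A table's contents is an array: join the cells into one string.
            parts.push(item.tables[t].contents.join("\n"));
        }
    }
    return parts.join("\n");
}

// Return every item whose own text or table text contains the needle.
function findHits(items, needle) {
    var hits = [];
    for (var i = 0; i < items.length; i++) {
        if (searchableText(items[i]).indexOf(needle) !== -1) {
            hits.push(items[i]);
        }
    }
    return hits;
}

// Stand-in data (hypothetical); in InDesign this would be doc.allPageItems.
var sampleItems = [
    { name: "plain", contents: "nothing here", tables: [] },
    { name: "withTable", contents: "intro", tables: [ { contents: ["a", "b @@ c"] } ] },
    { name: "rectangle" }
];
var hits = findHits(sampleItems, "@@");
```

Inside InDesign the call would presumably be `findHits(doc.allPageItems, "@@")`, which also removes the need for the group recursion. Whether a table nested inside a cell shows up in the outer table's contents is not something this sketch settles.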

Any good experience on this sort of thing? Thanks.


Jongware
Community Expert
April 6, 2010

If you really want to catch each and every bit of text in your document, you don't have to check all TextFrames. The basic text object is a "Story", so it's sufficient to loop over all stories. This will also catch stuff inside anchored frames, and even on the pasteboard. To differentiate, you'll need something like Marc Autret's routine, checking the parent of the story (it'll be something like Page, Spread, Document, or Character -- for an anchored object --, but do check as this is from memory).

Every single story can contain one or more tables, and these can be accessed immediately, as you found out. But as soon as you have a handle on a table, you can check its Cells array, which is a linear array containing each unique *cell* (including its contents).

This quick sample loops over all stories in your document -- whether anchored or on the pasteboard or elsewhere --, and all tables inside those, pasting together their contents. (And a notable exception is "Footnotes" -- but these are quite similar to tables, except you can have a table inside a footnote but not the other way around. Tables inside footnotes are *not* caught by a Story's tables array.)

var string = '';
for (var st=0; st<app.activeDocument.stories.length; st++)
{
     var s = app.activeDocument.stories[st];
     string += "Story: "+s.contents+"\r";
     // every table in this story, then every cell in that table
     for (var a=0; a<s.tables.length; a++)
     {
          for (var b=0; b<s.tables[a].cells.length; b++)
               string += "Cell: "+s.tables[a].cells[b].contents+"\r";
     }
}
alert (string);
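To cover the footnote exception mentioned above, the same loop could be extended along these lines. This is a sketch under assumptions: it takes for granted that a footnote exposes `contents` and `tables` the way a story does, and that a cell may carry its own `tables`; do check those against the object model. `collectTableText` and `collectStoryText` are hypothetical names, and the sample object is a stand-in for a real Story.

```javascript
// Sketch: push the text of every cell in `tables` onto `out`,
// descending into tables nested inside cells.
function collectTableText(tables, out) {
    for (var t = 0; t < tables.length; t++) {
        var cells = tables[t].cells;
        for (var c = 0; c < cells.length; c++) {
            out.push(cells[c].contents);
            if (cells[c].tables) {
                collectTableText(cells[c].tables, out);
            }
        }
    }
}

// Collect the story text, its table cells, its footnotes,
// and the table cells inside those footnotes.
function collectStoryText(story) {
    var out = [story.contents];
    collectTableText(story.tables, out);
    var notes = story.footnotes || [];
    for (var f = 0; f < notes.length; f++) {
        out.push(notes[f].contents);
        collectTableText(notes[f].tables, out);
    }
    return out;
}

// Stand-in story (hypothetical data) with a nested table and a footnote table.
var sampleStory = {
    contents: "body",
    tables: [ { cells: [
        { contents: "c1", tables: [] },
        { contents: "c2", tables: [ { cells: [ { contents: "nested", tables: [] } ] } ] }
    ] } ],
    footnotes: [ { contents: "note", tables: [ { cells: [ { contents: "fn cell", tables: [] } ] } ] } ]
};
var collected = collectStoryText(sampleStory);
```

In InDesign, `collectStoryText(app.activeDocument.stories[st])` would replace the body of the outer loop.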

John Hawkinson
Inspiring
April 6, 2010

Yeah, I realize I can iterate over each Story. I'd prefer to know the PageItem that matches, and to be able to report it, potentially select it or focus on it, and to report back (the lines array in my original example) one hit per PageItem, rather than one hit per Story. I imagine wanting to report the position of the enclosing PageItem, etc.

I guess I'd really like to have some methods that are a bit more type-agnostic, that don't rely on knowing that a Story has both contents as well as Tables that themselves have contents or cells that also have contents.

Maybe to recurse over all properties of the object and see if they have a contents sub-property, and if so check that. Though if I just let that run, it would return both the contents of the table (an array) as well as the contents of the cells of the table (strings). I suspect it would also be slow.

Maybe I should reevaluate my priorities, though, and accept Story as a better index to this stuff.

Jongware
Community Expert
Community Expert
April 6, 2010

Yes, you are correct: to immediately be able to select the frame, you could do a run-by per frame. It's possible to get the actual frame in which some threaded text is displayed, but that doesn't seem to be necessary. (And you'd happily skip overset text as well -- since this has *no* frame.)

So checking the tables inside frames ought to work. A warning ;-) Text in a table that threads into another frame is remarkably reluctant to return its actual 'parent frame' -- more advanced scripters than me have discussed this before, on this very forum.

[looping over "everything"] .. it would return both the contents of the table (an array) as well as the contents of the cells of the table (strings) ..

I don't think there is a special need to loop over 'everything'. I'd have to browse back to your original post (which I can't, courtesy of Jive -- "Thou Shalt Reply Only To The Most Recent Post"), but in essence *every* text on your page has to be in at least one text frame per page. No text frame -> no text. And all tables, in turn, ought to be contained inside the text in that frame. Given a table, you don't have to loop over it and then its "children" objects (cells), you can collect $100, then directly inspect the Cells of that table.

If you need to select the outermost, actual page item that may contain your text (the one that's right smack bang placed on your page, not nested-into-a-table-into-a-footnote-into-an-anchored-object), one way to do so would be to:

1. loop over all text frames on a certain page

2. using a function, inspect if it, or its tables, anchored objects, etc. contain your text -- this function may use some recursion to step inside objects-in-objects

3. select the frame from #1 if so.
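The three steps above could be sketched as follows. Treat it as an outline rather than a tested InDesign script: `containsText` and `framesContaining` are hypothetical names, the list of child collections (`tables`, `cells`, `footnotes`, `textFrames`) is an assumption about which members are worth descending into, and plain objects stand in for the page and its frames.

```javascript
// Step 2: does this object, or anything nested inside it, hold the needle?
function containsText(obj, needle) {
    if (typeof obj.contents === "string" && obj.contents.indexOf(needle) !== -1) {
        return true;
    }
    // Child collections to descend into (assumed names; absent ones are skipped).
    var kinds = ["tables", "cells", "footnotes", "textFrames"];
    for (var k = 0; k < kinds.length; k++) {
        var kids = obj[kinds[k]];
        if (!kids) continue;
        for (var i = 0; i < kids.length; i++) {
            if (containsText(kids[i], needle)) return true;
        }
    }
    return false;
}

// Steps 1 and 3: walk the frames placed on the page, keep the ones that match.
function framesContaining(page, needle) {
    var found = [];
    for (var i = 0; i < page.textFrames.length; i++) {
        if (containsText(page.textFrames[i], needle)) {
            found.push(page.textFrames[i]);
        }
    }
    return found;
}

// Stand-in page (hypothetical data).
var samplePage = { textFrames: [
    { name: "direct", contents: "here @@ it is" },
    { name: "nested", contents: "clean", tables: [ { contents: ["x"], cells: [
        { contents: "x", textFrames: [ { contents: "anchored @@" } ] }
    ] } ] },
    { name: "none", contents: "clean", tables: [] }
] };
var matches = framesContaining(samplePage, "@@");
```

Returning the top-level frame instead of the nested object is the point of the design: the caller can then select or report that frame, for example by passing the result to something like `app.select(...)`. Note that a page's `textFrames` would not include frames inside groups, so those would need `allPageItems` or a similar walk.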