Inspiring
June 13, 2013
Question

Finding text with RegEx

I want to create an ExtendScript script that replaces the temporary citations inserted from EndNote with the final citations in the desired standard. I wrote such utilities for FM 7 and 8 (see daube.ch/docu/fmaker41.html). The first step is to collect the temporary citations from the document and write them to a new document, which is then exported as RTF (to be handled by EndNote).

Due to the limitations of wildcard search, I use RegEx.

tempCit = GetTempCitation ("Hello fans [[Dante, #712]] and another one [[DuçanÌsídõrâ, #312]].");
alert ("tempCit = " + tempCit); 

function GetTempCitation (pgfText) {
     var regex = /(\[\[[^\]]+\]\])/;
     var tempCit = "$1";
     if (pgfText.search(regex) !== -1) {
          return pgfText.replace (regex, tempCit);
     } else {
     return null;
     }
}

Due to lack of documentation (and lack of knowledge) I have modelled this after Rick's "Using a regular expression to convert an image name to a path".

1) The test script does not return the first occurrence of a temp. citation but the full string. Where is my error?

2) How can I make use of the property (?) rightContext to get the other occurrences?

Thank you for helping a newcomer.

Klaus

2 replies

frameexpert
Community Expert
June 13, 2013

Hi Klaus,

Thanks for posting here. With JavaScript regular expressions, you have to use the "g" flag to get it to find more than the first occurrence (g = global).

#target framemaker

var pgfText = "Hello fans [[Dante, #712]] and another one [[DuçanÌsídõrâ, #312]].";
var citations = getCitations (pgfText);
alert(citations);

function getCitations (pgfText) {
    // Regular expression to isolate the citations.
    var regex = /(\[\[[^\]]+\]\])/g;
    // Array to store the citations.
    var citations = [], result;
    // Execute the regular expression.
    while (result = regex.exec (pgfText)) {
        // Push the result onto the array.
        citations.push (result[1]);
    }
    // Return the array
    return citations;
}

You can see where I have added the g flag at the end of the regular expression. Then I execute the regular expression in a loop; the loop will continue as long as there are matches in the string. For each match, I push the string into the array. When all strings are found, I return the array from the function.
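A minimal sketch of how the loop above advances, assuming standard JavaScript regular expression behavior (not verified here inside FrameMaker). The helper name traceCitations and the shortened sample string are made up for illustration. With the g flag, each call to exec starts at regex.lastIndex, and result.index holds the position of the current match. The substring from lastIndex onward is the text to the right of the match, which is probably what the question about rightContext was after.

```javascript
// Hypothetical helper: records where each match starts, where the next
// search will resume, and what text remains to the right of the match.
function traceCitations(pgfText) {
    var regex = /\[\[[^\]]+\]\]/g;
    var trace = [], result;
    while ((result = regex.exec(pgfText)) !== null) {
        trace.push({
            text: result[0],                        // the whole match
            start: result.index,                    // where the match begins
            next: regex.lastIndex,                  // where the next exec() resumes
            rest: pgfText.substring(regex.lastIndex) // text right of the match
        });
    }
    return trace;
}

var trace = traceCitations("a [[X, #1]] b [[Y, #2]] c");
// trace[0]: text "[[X, #1]]", start 2,  next 11, rest " b [[Y, #2]] c"
// trace[1]: text "[[Y, #2]]", start 14, next 23, rest " c"
```

Without the g flag, lastIndex is not advanced, so the same while loop would find the first citation over and over and never end.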

A couple of notes: I like to use the #target framemaker instruction at the top of my scripts to ensure that they run with the FrameMaker object model instead of the default ExtendScript Toolkit. (In this case, it doesn't matter, because this is all native JavaScript code and not dependent on FrameMaker.)  Also, you should always declare your variables with the var keyword.

Please let me know if there are any questions or comments.

-- Rick

www.frameexpert.com
frameexpert
Community Expert
June 13, 2013

Actually, now that I see Jang's method, it is simpler for this case. If your regular expression has capturing groups whose contents you need, you have to use my method. For example, let's say that you wanted to capture the citations without the brackets. Then you would use this:

var pgfText = "Hello fans [[Dante, #712]] and another one [[DuçanÌsídõrâ, #312]].";
var citations = getCitations (pgfText);
alert(citations);

function getCitations (pgfText) {
    // Regular expression to isolate the citations.
    var regex = /\[\[([^\]]+)\]\]/g;
    // Array to store the citations.
    var citations = [], result;
    // Execute the regular expression.
    while (result = regex.exec (pgfText)) {
        // Push the result onto the array.
        citations.push (result[1]);
    }
    // Return the array
    return citations;
}

Notice that I moved the parentheses to exclude the enclosing square brackets.

Also note that if you don't need to capture any subparts of the string, you don't need the parentheses at all in your regular expression.
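To illustrate the difference between the whole match and captured subparts, here is a hedged sketch that uses two capturing groups. It assumes every temporary citation has the form [[Author, #number]], as in the sample string; the helper name getCitationParts and the property names are invented for this example.

```javascript
// Hypothetical helper: splits each temporary citation into its parts.
// Assumes the citation format [[Author, #123]] seen in the sample text.
function getCitationParts(pgfText) {
    var regex = /\[\[([^,\]]+),\s*#(\d+)\]\]/g;
    var parts = [], result;
    while ((result = regex.exec(pgfText)) !== null) {
        parts.push({
            whole: result[0],   // full match, brackets included
            author: result[1],  // first capturing group
            record: result[2]   // second capturing group
        });
    }
    return parts;
}

var parts = getCitationParts(
    "Hello fans [[Dante, #712]] and another one [[DuçanÌsídõrâ, #312]].");
// parts[0]: whole "[[Dante, #712]]", author "Dante", record "712"
// parts[1]: whole "[[DuçanÌsídõrâ, #312]]", author "DuçanÌsídõrâ", record "312"
```

result[0] is always the complete match, so a pattern without any parentheses can still collect the bracketed citations by pushing result[0] instead of result[1].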

Rick

www.frameexpert.com
4everJang
Legend
June 13, 2013

Hello Klaus,

By coincidence I am working on a similar problem today, using regular expressions in ExtendScript. My problem cannot be solved directly, as the regular expression engine in ExtendScript has a serious flaw. Using the "\1", "\2", etc. strings to refer to the matched substrings does NOT work, so a replacement string that reuses parts of the match has to be built differently. Bummer. The problem has been reported earlier but is located in the ExtendScript Toolkit, not in FrameMaker, and I am not sure how quickly the ESTK development team at Adobe will pick this up.

I have created a workaround, which may also help you.

Instead of using the search method with a regular expression, you can use the match method. This returns either null (if there was no match) or an array of matched strings (even if it contains only one string). The elements in that array can then easily be replaced by the placeholder you intend to put in. The trick to finding all matches of a single regular expression is to add the "g" flag for global search. Try the following and see if that is what you wanted.

function GetTempCitation (pgfText) {
    var regex = /(\[\[[^\]]+\]\])/g;
    return pgfText.match ( regex );
}

Note that the alert( ) function shows the full array contents, separated by a comma. If there were no matches, the returned value is null. So if you want to access the separate strings, you will have to loop through the resulting string array, after first testing for a null value.
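The null test and the loop described above could look like the following sketch. It assumes standard JavaScript behavior for match; the helper name listTempCitations and the numbered output format are made up for illustration.

```javascript
// Hypothetical helper: returns one numbered line per temporary citation,
// or an empty array when the paragraph contains none.
function listTempCitations(pgfText) {
    var matches = pgfText.match(/\[\[[^\]]+\]\]/g);
    var lines = [];
    if (matches === null) {
        return lines;          // no citations: match() returned null
    }
    for (var i = 0; i < matches.length; i++) {
        lines.push((i + 1) + ": " + matches[i]);
    }
    return lines;
}

var found = listTempCitations("x [[A, #1]] y [[B, #2]]");
// found: ["1: [[A, #1]]", "2: [[B, #2]]"]
var none = listTempCitations("no citations here");
// none: []

// With the g flag, match() returns only whole matches; the contents of
// capturing groups are not included in the array.
var wholeOnly = "[[A, #1]]".match(/\[\[([^\]]+)\]\]/g);
// wholeOnly: ["[[A, #1]]"]
```

Returning an empty array instead of null is a design choice of this sketch: callers can loop over the result without a separate null test.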

Good luck

Jang

frameexpert
Community Expert
June 13, 2013

Hi Jang,

Can you send me an example that shows this bug? I would like to verify it myself. Thanks.

Rick

www.frameexpert.com
4everJang
Legend
June 13, 2013

Hi Rick,

Anything that uses the "\1" etc. operators in the replace method gives nothing but whitespace where the match results should be. I have in fact tried to reproduce the example from Adobe's JavaScript Tools Guide that was written for Creative Suite 5. Page 26 shows the following:

In a replace operation, you can use the captured regions of a match in the replacement expression by using the placeholders \1 through \9, where \1 refers to the first captured region, \2 to the second, and so on.

For example, if the search string is Fred\([1-9]\)XXX and the replace string is Sam\1YYY, when applied to Fred2XXX the search generates Sam2YYY.

Well, it doesn't. Look for extendscript regular expressions in the general Adobe forum and you will find at least two posts mentioning this as a bug in ESTK.
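For comparison, standard JavaScript (ECMAScript) writes captured groups in the replacement string of the replace method as $1 through $9, not \1 through \9. Inside a JavaScript string literal a backslash followed by a digit is treated as an escape sequence before replace ever sees it, which may explain the blank output described above. The sketch below shows the $1 form; whether ExtendScript behaves the same way is an assumption that has not been verified here. The variable names are invented for illustration.

```javascript
// The guide's example, rewritten with JavaScript regex syntax and $1.
var guideExample = "Fred2XXX".replace(/Fred([1-9])XXX/, "Sam$1YYY");
// guideExample: "Sam2YYY"

// Strip the double brackets from every citation by replacing each match
// with the contents of its first capturing group.
var sample = "Hello fans [[Dante, #712]] and another one [[DuçanÌsídõrâ, #312]].";
var withoutBrackets = sample.replace(/\[\[([^\]]+)\]\]/g, "$1");
// withoutBrackets: "Hello fans Dante, #712 and another one DuçanÌsídõrâ, #312."

// The original test script replaced the first match with "$1", i.e. with
// itself. replace() returns the whole string with only the matched part
// substituted, so the result equals the input.
var unchanged = sample.replace(/(\[\[[^\]]+\]\])/, "$1");
// unchanged === sample
```

The last statement also answers the first question in the thread: replace does not extract the match, it returns the full string after substitution, so exec or match is the appropriate tool for collecting citations.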

Ciao

Jang