Inspiring
April 2, 2016
Question

Regular expressions unpredictable and weird


I am really struggling with running regular expressions in Adobe scripts. They don't seem to work as they should. I constantly have to break complex expressions down into simple ones to track down what appear to be bugs in the syntax.

For example:

var pattern = /(A(.+)C)*/g;
var string = "ABBC";
$.writeln(pattern.exec(string));

I should get the result:

"ABBC", "ABBC", "BB"

but instead I get:

undefined, undefined, "BBC"

What is going on here?
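For comparison, here is a minimal sketch of the same call in a standards-conforming JavaScript engine (assuming Node.js or a browser console, with `console.log` standing in for ExtendScript's `$.writeln`). There the capturing groups come out exactly as the question expects:

```javascript
// Same pattern and input as in the question.
const pattern = /(A(.+)C)*/g;
const string = "ABBC";

// exec() returns [full match, group 1, group 2].
const result = pattern.exec(string);
console.log(result[0], result[1], result[2]); // ABBC ABBC BB
```

So the expected output in the question is what standard JavaScript produces; the `undefined, undefined, "BBC"` output is specific to the ExtendScript engine.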

This topic has been closed for replies.


Jongware
Community Expert
April 4, 2016

The star in your pattern is causing the problems here.

Remember, '*' means zero or more. Normally GREP is greedy, so a '*' used this way should do no harm. (Indeed, regex101.com returns the correct result.) But it seems something is broken in InDesign.

That said, I actually see no reason to have that * in your expression! Since it applies to the entire group you are searching for, it means zero or more occurrences of that whole group. That may be why (wrongly, though) the first items it returns come back undefined.
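The "zero or more of the entire group" point is easy to see in a conforming engine. A sketch, assuming Node.js or a browser and a made-up input `xABBC` that does not start with `A`: because the starred group may match zero times, `exec()` reports an empty match at index 0 with both groups `undefined`, rather than skipping ahead to `ABBC`.

```javascript
// Hypothetical input that does not begin with "A".
const starred = /(A(.+)C)*/;
const m = starred.exec("xABBC");

// Zero repetitions of the group already count as a match at index 0.
console.log(JSON.stringify(m[0]), m.index); // "" 0
console.log(m[1], m[2]);                    // undefined undefined
```

In standard JavaScript this only happens when the group cannot match at the current position; with the question's input `ABBC`, the group does match at index 0.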

When I remove the * at the end, I get what you expect:

ABBC (entire match)
ABBC (group #1)
BB (group #2)
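Dropping the `*` also makes the group mandatory, which changes what happens when the match is not at the start of the string. A sketch in standard JavaScript with a hypothetical input `xABBC`: without the quantifier, `exec()` cannot succeed with an empty match, so it scans forward and finds `ABBC` at index 1.

```javascript
// Without the trailing *, the group must match at least once.
const plain = /(A(.+)C)/;
const found = plain.exec("xABBC");

console.log(found.index);                  // 1
console.log(found[0], found[1], found[2]); // ABBC ABBC BB
```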

Marc Autret
Legend
April 4, 2016

Hi McShaman,

Don't expect ExtendScript to behave like regular JavaScript when it comes to regular expressions. There are several bugs in that area, in particular in CS6 and later when greedy quantifiers such as +, * or {m,n} are used in conjunction with sub-patterns. More details here: Indiscripts :: InDesign Scripting Forum Roundup #7
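Since the behavior depends on the ExtendScript version, one defensive option is to test the engine at runtime before relying on quantified groups. The function below is a hypothetical helper (not part of any Adobe API). It is written in ES3-style JavaScript so that it is meant to run both in ExtendScript and in Node.js, and it simply compares the question's pattern against the captures a conforming engine returns:

```javascript
// Hypothetical helper: returns true if the engine does NOT return the
// standard captures for a starred group containing a greedy sub-pattern.
function hasQuantifiedGroupBug() {
    var m = /(A(.+)C)*/.exec("ABBC");
    // A conforming engine yields ["ABBC", "ABBC", "BB"].
    return !(m !== null &&
             m[0] === "ABBC" &&
             m[1] === "ABBC" &&
             m[2] === "BB");
}

// Fall back to the un-starred pattern when the engine is affected.
var pattern = hasQuantifiedGroupBug() ? /(A(.+)C)/ : /(A(.+)C)*/;
```

In ExtendScript, `$.writeln(hasQuantifiedGroupBug());` would report whether the installed version reproduces the problem described in the question.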

@+

Marc

McShamanAuthor
Inspiring
April 4, 2016

Thanks... Yeah, I thought so. I've been finding a lot of inconsistencies between ExtendScript regex and how my browser behaves.

When you say ExtendScript has problems... does that mean it should work if I run the script from within the application (e.g. InDesign)?

pixxxelschubser
Community Expert
April 5, 2016

McShaman,

please read every post again, attentively.

And then try this:

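var str = "ABBC";
// Matches a "*" (optionally followed by "$") right before the closing "/"
// of the regex's string form, i.e. a pattern whose outer group is starred.
var iOarr = /\*\$?\//;
var x = null;
var pattern = [ "(A(.*)C)*", "(A(B+)C)*", "(A([^C]+)C)*", "(A(.+)C)*$", "(A(.*)C)" ];
for (var i = 0; i < pattern.length; i++) {
    var pat = new RegExp(pattern[i]);
    // Label each pattern by whether it ends in an asterisk.
    if (pat.toString().match(iOarr) == null) {
        x = "without asterisk";
    } else {
        x = "with asterisk";
    }
    $.writeln(x + ", " + pat + ", " + pat.exec(str));
}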
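For reference, a sketch of the same comparison in a standards-conforming engine (assuming Node.js or a browser console; `console.log` replaces `$.writeln`). There all five patterns return the same captures for `ABBC`, with or without the asterisk, so differences seen in ExtendScript come from its regex engine rather than from the patterns themselves:

```javascript
const str = "ABBC";
const patterns = ["(A(.*)C)*", "(A(B+)C)*", "(A([^C]+)C)*", "(A(.+)C)*$", "(A(.*)C)"];

// Collect [full match, group 1, group 2] for each pattern.
const results = patterns.map((source) => {
  const m = new RegExp(source).exec(str);
  return [m[0], m[1], m[2]];
});

results.forEach((r, i) => console.log(patterns[i], "->", r.join(", ")));
// every line ends in: ABBC, ABBC, BB
```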

pixxxelschubser
Community Expert
April 3, 2016

It seems to be the wrong syntax.

Try one of these instead:

var pattern = /(A(B+)C)*/g;  // or

var pattern = /(A([^C]+)C)*/g;   // or

var pattern = /(A(.+)C)*$/g;

The reason is:

In your own regex, the "C" will always be matched by (.+) and is already part of that group.

Have fun
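How greedy `(.+)` interacts with a following `C` is easiest to see without the outer group. A sketch in standard JavaScript (Node.js or a browser assumed): on its own, `(.+)` takes everything after `A`, including the `C`; when a `C` must follow, the engine backtracks one character and the group gives the `C` back. A negated class such as `[^C]+` never consumes a `C` in the first place.

```javascript
// Greedy (.+) with nothing after it: the C stays inside the group.
const noTail = /A(.+)/.exec("ABBC")[1];

// Greedy (.+) followed by C: backtracking hands the C back.
const withTail = /A(.+)C/.exec("ABBC")[1];

// [^C]+ stops before the C, so no backtracking is needed.
const negated = /A([^C]+)C/.exec("ABBC")[1];

console.log(noTail, withTail, negated); // BBC BB BB
```

Note that `"BBC"` is what `(.+)` captures only when no `C` has to follow it.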

Obi-wan Kenobi
Legend
April 3, 2016

Why not?

var pattern = /(A(.+)C)/g

even though I don't really understand what you are trying to do!
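If the goal is to find every `A…C` run in a longer string, the `g` flag is usually combined with a `while` loop over `exec()`. A sketch assuming standard JavaScript, a hypothetical helper `innerParts`, and a hypothetical input `ABBC ADDC`: with greedy `(.+)` the single match runs to the last `C`, while `([^C]+)` (one of the alternatives suggested earlier in the thread) stops at the first `C` each time.

```javascript
// Hypothetical helper: collect group 2 of every match of a global regex.
// Neither pattern below can match an empty string, so lastIndex always
// advances and the loop terminates.
function innerParts(re, text) {
  const parts = [];
  re.lastIndex = 0; // start from the beginning of the string
  let m;
  while ((m = re.exec(text)) !== null) {
    parts.push(m[2]);
  }
  return parts;
}

const text = "ABBC ADDC";
const greedy = innerParts(/(A(.+)C)/g, text);     // ["BBC ADD"]
const negated = innerParts(/(A([^C]+)C)/g, text); // ["BB", "DD"]
console.log(greedy, negated);
```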