Using GREP to mark everything that isn't a specific word

Aug 08, 2019

I'd like to use a GREP style to mark everything, except certain words. So for example:

This is ignore an example ignore text

should be marked like so:

This is ignore an example ignore text

Normally, I'd just have two rules and mark the words I don't want included, but the style I want to apply hides the selected words by setting their size to 0.1 - which can't be reset by another style (the new style would force its size upon it, instead of resetting it).

This is as far as I have come, and this already took me a day to figure out:

.*(?<=[^ignore]).*

It works if the word to ignore is not surrounded by anything. So if the text is just "ignore" it will stay, everything else will be marked. Any help would be appreciated!

Aug 09, 2019

You need to do this in two steps: first mark everything, then unmark the words to ignore. The character style to mark everything is applied to ^.+ and marks whole paragraphs. Then the character style to undo the marking is applied to \b(ignore|this|and|that)\b

P.

Aug 09, 2019

Thanks for your input, Peter!

Unfortunately, as I tried to explain in my post, I can't do two steps. I can't create a style to undo the marking, since the marking style changes too many properties that are text specific.

Aug 09, 2019

How about applying a condition?

P.

Aug 09, 2019

> How about applying a condition?

You can't set a condition in a character style.

Aug 09, 2019

Sorry, I completely missed what you said about not being able to use a two-stage approach, even though it's very clearly there! The problem is that with grep you can look for characters that are not a particular letter, but you can't look for strings that are not a particular word. Your [^ignore] doesn't skip the word 'ignore'; instead it matches any single character that is not 'i', not 'g', not 'n', etc. That's how character classes ([. . .]) work. Your grep simply matches whole paragraphs. In plain English it says 'match zero or more characters up to a character that isn't i, g, n, etc., then match zero or more characters'.
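To make the character-class point concrete, here is a small sketch in plain JavaScript rather than InDesign. It assumes that JavaScript's regex engine treats character classes and lookbehind the same way InDesign's GREP does for this pattern, and the sample strings are only illustrative.

```javascript
// [^ignore] is a character class: it matches ONE character that is
// not i, g, n, o, r or e. It knows nothing about the word "ignore".
const notInClass = /[^ignore]/;

// "region" is a different word, but it uses only letters from the class,
// so the class finds nothing to match in it.
const regionHit = notInClass.test("region");

// "text" contains "t" and "x", which are outside the class.
const textHit = notInClass.test("text");

// The pattern from the original post.
const original = /.*(?<=[^ignore]).*/;
const paragraph = "This is ignore an example ignore text";

// On a normal paragraph the pattern swallows the whole line.
const wholeMatch = paragraph.match(original);

// On the lone word "ignore" there is no character outside the class,
// so the lookbehind can never succeed and nothing matches.
const loneWord = "ignore".match(original);

console.log(regionHit, textHit, wholeMatch[0] === paragraph, loneWord);
```

This is why the pattern seemed to work in the first test: the lone word happens to consist only of letters from the class, not because the word itself is recognised.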

You'll have to use a two-step approach, and if you can't use grep styles it'll have to be a script, something like this:

// Words that must stay unmarked
var ignore = '\\b(ignore|this|and|that)\\b';
// Step 1: tag the words to ignore with a temporary marker (underline)
app.findGrepPreferences = app.changeGrepPreferences = app.findChangeGrepOptions = null;
app.findGrepPreferences.findWhat = ignore;
app.changeGrepPreferences.underline = true;
app.activeDocument.changeGrep();
// Step 2: apply the character style to everything that is not underlined
app.findGrepPreferences = app.changeGrepPreferences = null;
app.findGrepPreferences.underline = false;
app.changeGrepPreferences.appliedCharacterStyle = app.activeDocument.characterStyles.item ('mark');
app.activeDocument.changeGrep();
// Step 3: remove the temporary underline from the ignored words
app.findGrepPreferences = app.changeGrepPreferences = null;
app.findGrepPreferences.findWhat = ignore;
app.changeGrepPreferences.underline = false;
app.activeDocument.changeGrep();

In the line that sets appliedCharacterStyle, replace 'mark' with the name of your character style.

The script uses underline as a temporary marker, but if you use underline somewhere, use a different temporary marker, such as strikethrough, a colour, anything that's not used in the text.

P.

Aug 09, 2019

Thanks again! Also for clarifying that my solution basically worked by pure chance (none of the words I tested begin with a character from the ignored word).

I have no experience with scripts. I'm using the GREP style for a merge. Can I set it up in a way that it will work automatically in merge previews and the final merge? Or do I have to trigger the script manually every time?

Aug 09, 2019

You'd have to run the script every time.

P.

Aug 10, 2019

Hm. Running the script manually every time defeats the point.

Aug 09, 2019

Can’t you just omit the word you’re trying to (ignore) from the first style? Seems backwards to mark an entire paragraph with a character style. Just update the paragraph style. I guess I don’t understand what your end goal is.

Aug 10, 2019

I have different texts / paragraphs with different font sizes, styles, etc. I want to hide everything in them, except for specific words (one specific one, for starters). I hide stuff with a style that makes the text transparent and the font size 0.1. But there doesn't seem to be a way to revert that with a character style, without overriding the font size it should be.

So I'd have to create a paragraph style and a character style for every place I want to use this that has a different look. I'd like to avoid that to keep things manageable. Only thing I can think of is a GREP that hides everything but the specific word(s).

Aug 10, 2019

You want to at least mark everything that is "not a word":

\W+

(where the '+' is in the hope that this is more efficient than a single \W, which would act upon each not-a-word character one at a time).

You also want to mark every entire word ...

\W+|\b\w+

– and now everything should be marked; all not-a-word characters OR all word characters. But I'm not done yet.
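As a quick check of that claim, the following sketch runs the intermediate pattern in plain JavaScript (an assumption: \W, \w and \b behave the same way there as in InDesign's GREP for plain ASCII text). If the matched pieces glue back together into the original string, no character was left unmarked.

```javascript
// Intermediate pattern: runs of non-word characters OR whole words.
const everything = /\W+|\b\w+/g;

const sample = "This is ignore an example ignore text";

// Every match is one piece that a GREP style would mark.
const pieces = sample.match(everything);

// If the pieces joined together equal the input, nothing was skipped.
const covered = pieces.join("") === sample;

console.log(pieces, covered);
```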

"After" the "\b" (word break) you are always at the start of a word. At that point you can insert a negative lookahead to exclude the words you are looking for:

\W+|\b(?!(ignore|me)\b)\w+

leading to the result that everything is marked except the listed words.

You can list all of the words that should not be marked inside the inner parentheses, separated by an OR bar |.
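For anyone who wants to try the pattern outside InDesign, here is a minimal sketch in plain JavaScript, assuming its regex engine handles \b and negative lookahead the same way as InDesign's GREP for this case. The showMarked helper is hypothetical: it replaces every character the pattern would mark with "#", so whatever stays readable is what the GREP style leaves alone.

```javascript
// The pattern from the post above, with the global flag so that
// every match in the text is processed.
const markPattern = /\W+|\b(?!(ignore|me)\b)\w+/g;

// Hypothetical helper: mask each marked character with "#".
function showMarked(text) {
  return text.replace(markPattern, (match) => "#".repeat(match.length));
}

const result = showMarked("This is ignore an example ignore text");

// Only the two occurrences of "ignore" stay readable.
console.log(result);
```

Note that the spaces on either side of an unmarked word are still matched by the \W+ branch, so they are marked as well.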

Aug 10, 2019

A touch of the Master.

Sometimes you may find it useful to add a case-insensitive modifier, (?i), to this excellent regex, say, to pick up a word at the beginning of a sentence.
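A rough illustration of the difference, again in plain JavaScript. JavaScript has no portable inline (?i), so this sketch uses the i flag as a stand-in, which is an assumption about equivalence for this pattern. The mask helper and the sample sentence are hypothetical.

```javascript
// Same pattern twice: once case-sensitive, once with the "i" flag,
// standing in for InDesign's inline (?i) modifier.
const caseSensitive = /\W+|\b(?!(ignore|me)\b)\w+/g;
const caseInsensitive = /\W+|\b(?!(ignore|me)\b)\w+/gi;

// Hypothetical helper: mask each marked character with "#".
const mask = (text, pattern) =>
  text.replace(pattern, (m) => "#".repeat(m.length));

const sentence = "Ignore me now";

// Without the modifier, the capitalised "Ignore" is treated as any other word.
const strict = mask(sentence, caseSensitive);

// With the modifier, "Ignore" is excluded just like "ignore".
const relaxed = mask(sentence, caseInsensitive);

console.log(strict);
console.log(relaxed);
```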

Aug 10, 2019

Pure magic! Thanks so much, not only for the solution but also the clear explanation.

I was afraid it's just not possible. Understanding your solution will open up a lot of possibilities. Excellent!

Aug 10, 2019

Nice one, Theunis!

Aug 12, 2019

Okay, I do have a follow-up on your solution. I played around with your GREP code a bit and realized that my mental model doesn't match what is actually going on.

How would you tackle the following situation?

This is ignore an example ignore me text

should be marked like so:

This is ignore an example ignore me text

"ignore me" is now one term where the space in between should not be marked... just writing (ignore|ignore\sme) obviously doesn't do the trick...
