InDesign GREP chart
I've recently uploaded a GREP list/cheat sheet/reference card to a couple of topics. It's one I created a few years ago and have been updating for my own uses, since I can't seem to find one that meets all my wishes — complete, accurate, organized, reasonably tidy-looking and without the points of murky example or understanding that so many of the references out there seem to have. (Okay, I'm fussy.)
After my upload yesterday I noticed an error, and some omissions, and one thing led to another, and so a very polished version of what I think a GREP reference should be is attached. I hope both newcomers and grizzled GREPpers find it useful.
PDF and IDML also available on my digital publishing reference site.
All due credit, both in general and as a resource for this update/polishing round, to @Peter Kahrel 's authoritative GREP in InDesign.
[Older version removed; updated file in later post.]
What about all those posix-es - like "[[:punct:]]"?
Do YOU see any room for them? 🙂
There are a few very advanced things missing from the list, partly from space and partly because they need more than a thumbnail reminder. Posix is pretty advanced, and they're all right there in the drop-down lists for those who know what they are and how to use them. Same for modifiers; things like multiline are pretty hairy to explain, and the options are clearly spelled out in the menu. (Also back-references, which are complicated AND deprecated... and I suspect there are more "unofficial" elements in there as well.)
But make a good argument and I'll squeeze them in.
ETA: I guess the case-sensitive and case modifiers are borderline. Hmm.
Myriad Pro Light Condensed and maybe 4x columns?
Oh, getting the info on there in a technical sense is no problem. But downloading a free magnifying glass with every copy isn't due until v20.3.
So ARE you happy now? 😛
(Attached file above, and website files, updated.)
Many thanks for this James.
@Marc Autret's version:
It's missing "\W" - and maybe more from your list - but there are a few extra codes - like Text Variables and Arabic/Hebrew/Bidi.
I'm not sure that (or ID's feature) has been updated for... quite a while. It's a great chart, very complete and well-organized; I'd expect nothing less from MA.
But it's pretty overwhelming and contains lots that is of little use to most users; Arabic and Hebrew character codes simply aren't something most users need, for example. Things like back references and variables are very advanced stuff; users who can wrangle them aren't likely to need a reference chart. It has a lot more information than mine, but at the cost of being three pages long.
Again, as my followup post outlines, I wasn't aiming for The Last GREP Chart The World Will Ever Need, just a compact, clear, accurate one useful to the vast middle ground of InDesign (not general/programming/Linux) users. Consider it in the light of the old shareware disclaimer, "This software is guaranteed to run perfectly on the developer's computer." — "This chart is guaranteed to meet all the expectations of its creator, because he couldn't find an existing one that did."
But all reasonable feedback always welcome; I'd hope to make this a useful asset for that middle ground of GREP users, even if they're not me. 🙂
Thanks for the feedback, suggestions and contributions (through multiple channels) so far.
To make it clear, I don't intend this to be the most complete reference to GREP — not by a long shot. For one thing, the whole feature/topic is so vast and murky that there's always-always one more obscure command, another useful combo string, another trick, another limitation, etc. And we know there are many-many reference sheets, sites, pages, forums and books... some of which are even specific to InDesign's slightly specialized implementation.
But in going from "had a little memorized knowledge" to a fairly confident GREP journeyman over the last year, I've found the GREP "support" world to be like many others, in that there are many levels of understanding (quite a few of which are way beyond ID's uses), endless references of one kind or another, and the usual mix of reliability, currency and accuracy.
So I did what I do for a living (more or less, broadly speaking, all that) and created my ideal of a GREP reference, which aimed to be complete (to a specific degree), accurate (all the commands tested and the boilerplate descriptions corrected and expanded, as necessary), compact (instead of pages and pages of stuff) and, I'd hope, tidily and attractively presented. My aim was a sheet that would be of use for users across the range from "just took the training wheels off" through fairly sophisticated experts... but NOT being a tutorial for absolute newcomers or including every esoteric command/option that takes master-level experience to use. A sort of middle range tool, useful to remind the experienced and help expand the newcomers' grasp of the options. (And, without any intent to be Anglocentric or exclusionary, focused on English/Western users. I find the "let's include everyone" model of fonts, references, character/glyphs etc. a bit overwhelming given that much of that content only applies to a small percentage of users... who are usually comfortable with finding more specific support that isn't cluttered with Western specifics. 🙂 )
I wanted to limit the content to some core set of info, to keep it from being overwhelming or (as are many such references, especially from the Linux/programming side of things) filled with cryptic descriptions and notes. I left out POSIX, for example, because while some have it in their toolbox, it's largely redundant against more fundamental GREP commands. I left out back references and variables as much for space as for their being very complex concepts to put to use in GREP strings.
In short, I was aiming at a complete (to a useful level), accurate (devoid of some of the repeated errors or misleading descriptions found out there), clear (substituting clear descriptions for code-monkey-speak) and usable (clearly showing string formatting, for example) ref that was not overwhelming in content or presentation.
Continuing feedback welcome. 🙂
Hi @James Gifford—NitroPress, thanks for posting this. I think it's great!
I'd make one observation: that the explanation for \K is surprisingly difficult to understand for me. A clearer description, in my opinion, would be "Reset Match" or something like that. The K is usually referred to as "keep out" but that doesn't help my mental model because what it actually does is discard the current match and start again. In fact "discard" might be a good word to use. Anyway, with that in mind, the explanation via the asterisk is quite opaque, at least to me.
Otherwise, thanks for sharing.
- Mark
Edit 2025-02-16: changed "keep" to "keep out" which I hadn't remembered fully (probably due to it not really being a good explanation for what it actually does! 😛 ).
Are you sure you're talking about "\K"?
https://www.oreilly.com/library/view/grep-in-indesign/9780596157173/ch14.html
https://creativepro.com/files/kahrel/indesign/grep_editor.html
3rd note in the Version History
I think you're referring to this post?
https://creativepro.com/topic/using-grep-to-apply-style-to-the-end-of-paragraph/#post-116138
Yes Robert, I'm talking about \K. The first two links you posted give wrong information. It is not a "lookbehind" of any kind. The third post was correct.
But here's a more straightforward explanation, from regexr.com (my highlight in magenta rectangle):
- Mark
So even @Peter Kahrel is wrong? Together with @James Gifford—NitroPress?
Maybe InDesign's implementation is different?
@Robert at ID-Tasker yes, yes and no. No big deal though.
@Robert at ID-Tasker please don't make a huge deal out of this—it really isn't. I was only observing in James' cheat sheet that a better explanation would be helpful to understand \K.
By the way, the stackexchange link agrees with me 100%, and again uses the word "keep" which doesn't help the mental picture of what \K actually is:
> There is a special form of this construct, called \K (available since Perl 5.10.0), which causes the regex engine to "keep" everything it had matched prior to the \K and not include it in $&.
Maybe you are being confused by the fact that (as per the same stackoverflow link)
> This effectively provides non-experimental variable-length lookbehind of any length.
So, again, no big deal, but \K is not a lookbehind, even though it effectively provides the same result. It is about the mental model and the ease of remembering. I noted that James' current wording was difficult to understand and it is up to him whether he agrees with me, or even cares.
And one last time, Robert... please... this is no big deal.
- Mark
I don't know if you misunderstood anything because you haven't made any claims.
@m1b Mark -- I agree that the use of keep obscures rather than explains things. It's clearer to say something like "Find X if preceded by Y".
However, I don't see why you wouldn't want to call \K a lookbehind. Like the classic lookbehind, it finds things if they're preceded by a certain pattern. Lookbehind is a functional notion, not a formal one.
And yes, it's no big deal.
I actually have no horse in this race; I am at best a sort of journeyman GREP user and the chart originated as my effort at sorting out several references for my own use.* A little evolution and tidying, and it seemed worth passing along with no particular claims or endorsements. IIRC, the \K footnote was changed based on some input by Peter Kahrel.
I can only ask/note that GREP, in the larger universe of systems and coding, seems more extensive and not entirely congruent with the version embedded in InDesign. Is the debate here, and disagreement on places like SE, based on that larger scope and variant function, rather than on how ID implements this function?
* I am, if it's never been made clear, fundamentally a writer and editor first, and a designer-publisher-etc. at least one rung down. It's my nature to absorb, sort out, and present information in what seem like sensible ways.
I appreciate your comments @James Gifford—NitroPress and I suspect that you have the type of mind that will appreciate the details here.
> ... that GREP, in the larger universe of systems and coding, seems more extensive and not entirely congruent with the version embedded in InDesign. Is the debate here, and disagreement on places like SE, based on that larger scope and variant function, rather than on how ID implements this function?
Great question! This particular issue, the question of how to tell people what \K is or does, has nothing to do with the "flavour" of grep (in this case PCRE), although there will be small version differences in different environments. I think the question only arises because of the way InDesign uses the engine, as I speculate in my post, in such a way that a positive lookbehind and \K produce the same results, almost always.
After reading this thread and writing my post I realise that the core question for me is: is this the right place to employ a lie-to-children? There is a good case to be made for it given the unusual situation specific to grep in InDesign. I suspect Peter thinks yes; I think no. And I don't care (both positions are perfectly reasonable); it only came up because I had trouble parsing your cheat sheet in that small particular. 🙂
- Mark
Hi @Peter Kahrel, this is actually a quite interesting topic—in an incredibly niche area—and I think, as technical writers, philosophy may come into it too.
Of course I am in no way implying that you—Peter—don't know any of this, and this reply to your post is also for James' and any other readers' benefits, and to be honest I find this kind of exercise interesting anyway. And, importantly, the mock-dialogues I use are abstract ideas that are me trying to convey my ideas; they make no reference to yourself, or anything you've said or written—or anybody else. And if I have created a strawman in the first dialog below, please forgive me, it is just for illustration and I did it on purpose.
To finish this preamble, I reiterate what we have both agreed: all this is no big deal! 🙂
The reason it is no big deal is that in InDesign, positive lookbehind and \K both give, almost always, the same results.
The only practical difference that I can see between using \K and a positive lookbehind is that \K effectively allows variable length matches. To make that clear, with a lookbehind you can do this:
(?<=apple|grape)\d+
but not this
(?<=apple|banana)\d+
because the engine doesn't know how long the lookbehind result will be—could be 5, or 6, characters long.
So instead you can do
(?:apple|banana)\K\d+
or maybe even
\D+\K\d+
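For anyone who wants to poke at the fixed-length restriction outside InDesign, here is a minimal sketch in Python. It is only an analogy: Python's standard `re` module is a different engine from the one InDesign uses. It happens to enforce a similar fixed-width rule on lookbehinds, and it has no `\K` at all, so the last part emulates `\K` with a capture group. The sample text and the helper name `digits_after` are made up for illustration.

```python
import re

text = "apple12 grape345 banana6789"

# Both alternatives are 5 characters long, so Python's re accepts this lookbehind.
fixed = re.findall(r"(?<=apple|grape)\d+", text)

# "apple" (5) vs "banana" (6): Python's re refuses to compile a mixed-width lookbehind.
try:
    re.compile(r"(?<=apple|banana)\d+")
    mixed_ok = True
except re.error:
    mixed_ok = False

def digits_after(words, s):
    # Emulation of "(?:word1|word2)\K\d+" for an engine without \K:
    # match the prefix and the digits together, then report only the digits.
    prefix = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"(?:" + prefix + r")(\d+)")
    return [m.group(1) for m in pattern.finditer(s)]

print(fixed)                                    # ['12', '345']
print(mixed_ok)                                 # False
print(digits_after(["apple", "banana"], text))  # ['12', '6789']
```

The capture-group version works for any mix of prefix lengths, which is the same practical gain that \K gives over a lookbehind in the patterns above.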
So, that's it. Finished. They both do the same thing in InDesign, so let's call them both "lookbehinds". Okay, fair.
So why do I bother mentioning it? It is philosophical. Consider the following dialogue:
Dialogue between InDesign student and teacher:
Student: "My positive lookbehind failed."
Teacher: "It is because you are defining a variable-length expression in the lookbehind which isn't supported by the grep engine that InDesign uses. You need to use \K instead."
Student: "What is \K?"
Teacher: "It is also a positive lookbehind but it allows you to use a variable length pattern."
Student: "Well why don't we *always* use \K?"
Teacher: "Um, well, yeah okay why not? Let's just use \K whenever we need a positive lookbehind. Good point."
(Teacher is thinking, correctly: there is literally no reason to ever teach (?<= ) to InDesign students. Huh.)
Student: "Okay, but my grep still isn't working."
Teacher: "Oh, you don't need parentheses or any other symbols. Just use \K after the pattern you want to match."
Student: "Ah got it, that works. So negative lookbehind is (?<!my pattern here) and positive lookbehind is my pattern here\K. Got it."
Teacher: "Um, yeah... Yes."
Now consider making some changes to that dialogue:
Teacher: " ... You need to use \K instead."
Student: "What is \K?"
Teacher: "It causes any matched content up to that point to be discarded. Imagine that each character that is matched is collected in a bucket, one-by-one, but when \K comes along, the bucket is emptied and it will start to fill again if more of the grep—after the \K—is matched."
Student: "So it's like empty the bucket."
Teacher: "Sure! Whatever has been matched is emptied out at that point and is gone. Yeah, we use \K instead of a positive lookbehind because lookbehinds have that limitation that you found earlier."
Student: "Okay got it."
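The bucket picture can be turned into a toy model, sketched here in Python. This is not how a real engine works: there is no backtracking between steps, the `KEEP` marker and the function names are invented for this illustration, and each step is simply handed to Python's `re`. It only shows the one idea from the dialogue: when the marker is reached, the reported match restarts at the current position.

```python
import re

KEEP = object()  # hypothetical stand-in for \K in this toy model

def match_at(steps, text, pos):
    """Run the steps in order from pos; return (start, end) of the reported match, or None."""
    bucket_start = pos              # the bucket starts filling here
    for step in steps:
        if step is KEEP:
            bucket_start = pos      # empty the bucket: reported match restarts here
            continue
        m = re.compile(step).match(text, pos)
        if m is None:
            return None             # a step failed, so there is no match at this position
        pos = m.end()
    return bucket_start, pos

def find_all(steps, text):
    """Scan left to right, collecting the reported text of each match."""
    results, pos = [], 0
    while pos <= len(text):
        span = match_at(steps, text, pos)
        if span is None:
            pos += 1
        else:
            results.append(text[span[0]:span[1]])
            pos = max(span[1], pos + 1)  # always move forward
    return results

sample = "apple12 banana345"
print(find_all([r"\D+", r"\d+"], sample))        # ['apple12', ' banana345']
print(find_all([r"\D+", KEEP, r"\d+"], sample))  # ['12', '345']
```

Both runs walk over exactly the same characters; the only difference is where the reported match is allowed to begin.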
The first dialogue made no attempt at mnemonics, but just for fun, the following addendum to this dialogue is possible:
Student: "Why is it the letter K? That's annoying to remember."
Teacher: "I guess the better symbols were already taken when this feature was added to the grep engine. Many people call it Keep Out which is a bit awkward, but I guess it means keep the bucket results out of the final results."
Student: "Fine."
You will notice that adding that mnemonic exercise to the end of the first dialogue will just confuse the student.
Okay, back to my philosophical point: I much prefer the second dialogue because (a) it is factually correct, describing what the PCRE engine actually does, (b) it makes no strong connection to the positive lookbehind (see the sentence I've underlined), and therefore doesn't introduce the additional cognitive load of "so we should never use (?<= ) but \K is the same but better"; instead it introduces \K as just another symbol that does a distinct operation, and finally (c) this dialogue matches the wider reality, so the hypothetical student could go on to learn, say, Perl, and their grep knowledge would already be compatible, assuming the engines were the same.
Those are my thoughts on the matter. Sorry I didn't know how to make this post shorter (or didn't have time to make it shorter, haha). As this reply relates specifically to my first reading of James' handy cheat sheet, I will remind you all that my actual experience was to have no idea what "(Inclusive) lookbehind" and the asterisked remark was talking about next to \K. My experience might be atypical, and what you have written may resonate better on average with users. I don't know.
It was quite fun to think about all this, but ... one last time(!) ... no big deal. 🙂
- Mark
P.S. for the visually inclined, here is a comparison character-by-character as the PCRE state machine traverses the string, left-to-right comparing \K to positive lookbehind:
Note: the whole discussion here is a *very* high-level view of the topic. In no way does it reflect lower-level implementation details, which will relate to hideously complicated performance optimization etc.
Edit: very minor typos.
Edit: removed wrong example from my poor memory.
Edit 2025-02-19: improved Comparison diagram for clarity, to show current token being evaluated. Added disclaimer note.
Interesting points, Mark.
Student: "Well why don't we *always* use \K?"
Teacher: "Well, we could, and some people do, not only because it's more flexible, but also because it's less typing and you don't have to wonder whether the < comes before the = or the other way around. But another thing, Grasshopper, is that (?<=. . .) has been around since forever while \K is a later addition. Sometime after 2007 or so."
Why the backslash+letter was chosen rather than (?. . .) I'm not sure, it could just as well have been something like (?<==. . .). \K looked more modern maybe (apart from being shorter).
As to your comparison chart, you can phrase things any way to suit a purpose. For example, the description in the right-hand figure can be recast based on the one in the left-hand figure: "As soon as the (matching) closing parenthesis is activated in the engine, the entire matched contents is discarded."
Well, no big deal. That was my last shot!
P.

