Is [[:punct:]] the only representation of punctuation proper?
Is there a simpler one?
You can build your own punctuation character set like this:
[.,!?;:'"(){}[\]<>/@#$%^&*+=_|\\-]
This is messy, error-prone (you have to remember to escape ] and \), and incomplete: it only covers the characters you remember to include. Plus, it's easy to forget obscure punctuation like ¿, ¡, „, etc.
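To make the "incomplete" point concrete, here is a minimal sketch in Python's `re` module (not InDesign's GREP engine, so treat it as an illustration only). The helper name `missed_by_handmade` and the sample characters are made up for this example.

```python
import re

# The hand-built class from the post, written as a Python raw string.
# The opening bracket is escaped as well, which is harmless.
HANDMADE = re.compile(r"""[.,!?;:'"(){}\[\]<>/@#$%^&*+=_|\\-]""")

def missed_by_handmade(chars):
    """Return the characters that the hand-built class fails to match."""
    return [ch for ch in chars if not HANDMADE.fullmatch(ch)]

sample = ["!", "]", "\\", "-", "¿", "¡", "„"]
print(missed_by_handmade(sample))  # ['¿', '¡', '„']
```

The ASCII characters are covered, but the three less common marks slip through unless you remember to add them by hand.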
Another way is to use something like
\W (non-word character)
\W matches any character that’s not a letter, digit or underscore, so it includes punctuation but also:
spaces
tabs
symbols
line breaks
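A quick way to see how broad \W is, sketched with Python's `re` module rather than InDesign (the two engines may differ in details, so this is only an approximation). The helper `is_non_word` is a hypothetical name.

```python
import re

NON_WORD = re.compile(r"\W")

def is_non_word(ch):
    """True if the single character ch is matched by \\W."""
    return NON_WORD.fullmatch(ch) is not None

# Whitespace and symbols match \W just like punctuation does;
# letters, digits and the underscore do not.
for ch in [" ", "\t", "\n", "+", "!", "a", "7", "_"]:
    print(repr(ch), is_non_word(ch))
```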
So you could technically try [\W\D\H]
That would ignore word characters, digits, and horizontal spaces (not tested, so don't fret if it doesn't work; I'm not at my computer).
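One thing worth checking before relying on this: a character class is a union, so it matches any character that satisfies at least one of the listed escapes. The sketch below uses Python's `re` module, which has no \H, so it tests only [\W\D]; whether InDesign behaves identically is an assumption. The helper `matches_union` is a made-up name.

```python
import re

# Assumption: \H is left out because Python's re does not support it.
UNION = re.compile(r"[\W\D]")

def matches_union(ch):
    """True if ch is a non-word character OR a non-digit."""
    return UNION.fullmatch(ch) is not None

print(matches_union("a"))   # True: a letter is a non-digit
print(matches_union("\n"))  # True: a line break is a non-word character
print(matches_union("7"))   # False: a digit is neither
```

In this engine the class matches letters and line breaks too, which suggests that a negated class is the safer way to express "none of these".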
So the POSIX class [[:punct:]] is the most comprehensive way.
Why do you ask?
I just want to ask for a simple representation.
Well, if there's no single letter to represent it, then we'll settle for [\W\D\H].
Just noticed it catches paragraph returns.
So, to make it better:
[^\w\d\r\n\h]
This is similar to [\W\D\H], which uses a character class [ ] and looks for non-word characters (\W instead of \w). In the same way, \d finds digits and \D finds anything but digits, and so on.
Here we use the same logic, but the ^ excludes the listed items from the search.
So you have the character class [ ]
[^ ] is a negated class (match anything not listed)
[^\w] means don't match word characters, etc.
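The negated class can be tried out in Python's `re` module as a rough stand-in for InDesign. Two assumptions: `re` has no \h, so a literal space and \t stand in for horizontal whitespace here, and the function name `find_candidates` is invented for the example.

```python
import re

# Assumption: space and \t approximate \h, which Python's re lacks.
# \d is kept to mirror the post, although \w already covers digits.
NEGATED = re.compile(r"[^\w\d\r\n \t]")

def find_candidates(text):
    """Return every character that is not a word character, digit or whitespace."""
    return NEGATED.findall(text)

print(find_candidates("Hi, there!\r\n2 + 2 = 4?"))  # [',', '!', '+', '=', '?']
```

The paragraph return is no longer caught, but note that + and = still are.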
Ah, now I've figured out that
[^\w\s]
does the same thing,
but it also catches mathematical symbols,
whereas this doesn't:
\p{Punctuation}
This also works:
\p{Punct}
It really depends on your needs. GREP is GREP: it finds what you ask for; it's very precise and doesn't guess.
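The difference between punctuation and mathematical symbols comes from Unicode general categories: punctuation lives in the P* categories and symbols in the S* categories. Here is a small sketch using Python's standard `unicodedata` module; that InDesign's \p{Punctuation} follows these categories exactly is an assumption, and `kind` is a hypothetical helper.

```python
import unicodedata

def kind(ch):
    """Classify a character by the first letter of its Unicode general category."""
    category = unicodedata.category(ch)
    if category.startswith("P"):
        return "punctuation"
    if category.startswith("S"):
        return "symbol"
    return "other"

for ch in "!,+=<$":
    print(ch, unicodedata.category(ch), kind(ch))
```

Here ! and , come out as punctuation, while +, =, < and $ are symbols, which is why [^\w\s] catches more than a punctuation-only class.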
\p{Punct}
New knowledge.
It's equivalent to [^\w\s], right?
You can always grab @Joel Cherney's list in the link and test it.
I am always in favor of testing!
Seems like accents are neither word characters nor punctuation. There are other exceptions as well: currency symbols, box drawing characters, arrows, and so on.
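A minimal way to list such in-between characters, again in Python rather than InDesign (so the exact results in GREP may differ). The function `leftovers` and the sample string are made up for illustration.

```python
import re
import unicodedata

WORD_OR_SPACE = re.compile(r"[\w\s]")

def leftovers(text):
    """Characters that are not word characters, not whitespace and not Unicode punctuation."""
    return [
        ch for ch in text
        if not WORD_OR_SPACE.fullmatch(ch)
        and not unicodedata.category(ch).startswith("P")
    ]

# Euro sign, arrow, box drawing character, acute accent, exclamation mark.
print(leftovers("a €→─´!"))  # ['€', '→', '─', '´']
```

A class like [^\w\s] would match all four leftovers, while a punctuation-only class would match none of them.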
There's also \p{Punctuation}
Was trying that as \p{P} but didn't expand it to the full word!
AMAZING! And it probably catches more things than the POSIX class, is that right? Like smart quotes?
What else is there in the hidden gems of GREP?
Seriously!
I hate to disappoint you, Eugene. Unfortunately, I believe that \p{Punctuation} captures exactly what [[:punct:]] captures. I grabbed the text of this handy list of Supposedly All Unicode Punctuation, dropped the text into InDesign, and used Change All to figure out how many punctuation glyphs would be found.
\p{Punctuation} found 1672 glyphs
[[:punct:]] found 1672 glyphs
Seems to me that they capture the exact same glyphs.
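The same kind of count can be cross-checked outside InDesign. The sketch below does not reproduce the InDesign test. It takes Python's `string.punctuation` (the ASCII punctuation of the C locale) and counts how many of those characters Unicode files under punctuation (P*) versus symbols (S*). The function name `split_ascii_punct` is hypothetical.

```python
import string
import unicodedata

def split_ascii_punct():
    """Split ASCII punctuation into Unicode punctuation (P*) and symbols (S*)."""
    punct = [ch for ch in string.punctuation if unicodedata.category(ch).startswith("P")]
    symbols = [ch for ch in string.punctuation if unicodedata.category(ch).startswith("S")]
    return punct, symbols

punct, symbols = split_ascii_punct()
print(len(string.punctuation), len(punct), len(symbols))  # 32 23 9
print("".join(symbols))  # $+<=>^`|~
```

So a POSIX-style class and a Unicode category do not have to agree in every engine; the identical counts above are a finding about InDesign specifically.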
Also, @dublove, that might be a feature request you'd like to file: InDesign GREP should support abbreviated Unicode categories. There are a bunch of them, many quite useful, but they'd all be easier to use in InDesign if we could just e.g. type \p{Pc} instead of \p{Connector_Punctuation}
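For reference, the two-letter punctuation categories and their long names are defined by Unicode, and they can be looked up per character with Python's `unicodedata` module. This is only a sketch: which spellings InDesign accepts is not verified here, and `long_name` is a made-up helper.

```python
import unicodedata

# Unicode's punctuation subcategories: abbreviation -> long name.
PUNCT_CATEGORIES = {
    "Pc": "Connector_Punctuation",
    "Pd": "Dash_Punctuation",
    "Ps": "Open_Punctuation",
    "Pe": "Close_Punctuation",
    "Pi": "Initial_Punctuation",
    "Pf": "Final_Punctuation",
    "Po": "Other_Punctuation",
}

def long_name(ch):
    """Return the long punctuation category name for ch, or None if it is not punctuation."""
    return PUNCT_CATEGORIES.get(unicodedata.category(ch))

for ch in "_-()“”!":
    print(ch, unicodedata.category(ch), long_name(ch))
```

The curly quotes fall under Initial_Punctuation and Final_Punctuation, so a Unicode-aware punctuation class would be expected to include smart quotes.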
Amazing! This opens up so many possibilities.
I knew about these from regex, but they didn't work for me in the past. Having to type out the full word is disappointing, but it opens up a lot of workarounds!
Amazing!
The abbreviated forms work in InDesign too. Unfortunately they're not documented (like so many other Grep things), but they're all in this script:
https://creativepro.com/files/kahrel/indesign/grep_classes.html
Hi Peter Kahrel ~
I saw this great thing of yours the other day.
I think though, wouldn't it be a little better if it stored the user's Grep.
Because I think it would be less likely to lose the Grep in the script.
Because scripts don't need to be on the C drive, it supports "directory.link".
> a little better if it stored the user's Grep
> it would be less likely to lose the Grep in the script.
> scripts don't need to be on the C drive, it supports "directory.link".
The purpose of the script is simple: it lets you look up and insert Grep codes. That's all.
What do you mean by these comments? I can't make much sense of them.
The \p{P} abbreviation doesn't work for me in the Find what: field in the GREP tab of Find/Change dialog. My hunch is that a) it's a bug in the F/C dialog that b) you may not have noticed because c) you probably never key regular expressions directly into the F/C dialog, right? I suspect that the abbreviations work in ExtendScript just fine.
The short form is \p{P*} -- the addition of the asterisk is a feature of Boost's regex libraries, which InDesign's Grep is based on.
> you probably never key regular expressions directly into the F/C dialog, right?
I do that all the time!
> I suspect that the abbreviations work in ExtendScript just fine.
JavaScript's (and therefore ExtendScript's) regular expressions are very basic; they don't know about Unicode classes.
1000x thanks for this. I am completely unashamed of tripping over Boost vs PCRE (again) but I should not have a hard time remembering that it's still 1999 as far as ExtendScript is concerned.