dublove
Legend
May 7, 2025
Answered

Is [[:punct:]] the only representation of punctuation proper?


Is [[:punct:]] the only proper way to represent punctuation?
Is there a simpler one?

 

Correct answer by Eugene Tyson (Community Expert, May 7, 2025)

You can build your own punctuation character set like this:

[.,!?;:'"(){}[\]<>/@#$%^&*+=_|\\-]

This is messy, error-prone (you have to remember to escape ] and \), and incomplete, since it only covers the characters you remember to include. It's easy to forget less common punctuation such as ¿, ¡, and „.
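To make the "incomplete" point concrete, here is a minimal sketch in Python rather than InDesign GREP. The class is copied unchanged from the post. The helper name `missed_by_hand_built` and the sample text are made up for illustration, and Python's `re` engine is only a stand-in for InDesign's.

```python
import re

# The hand-built class from the post, unchanged.
HAND_BUILT = re.compile(r"""[.,!?;:'"(){}[\]<>/@#$%^&*+=_|\\-]""")

def missed_by_hand_built(text: str) -> list:
    """Return characters that are neither alphanumeric nor whitespace
    and that the hand-built class fails to match (hypothetical helper)."""
    return [
        ch for ch in text
        if not ch.isalnum() and not ch.isspace() and not HAND_BUILT.fullmatch(ch)
    ]

sample = "¿Qué? ¡Sí! „Zitat“ [ok] a\\b - done."
print(missed_by_hand_built(sample))  # ['¿', '¡', '„', '“']
```

The ASCII marks, including the escaped ] and \, are all caught. The inverted marks and the typographic quotes are not, because nobody listed them.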

 

Another way is to use something like

\W (non-word character)
\W matches any character that’s not a letter, digit or underscore, so it includes punctuation but also:

spaces

tabs

symbols

line breaks
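A quick way to see how much \W over-matches, sketched in Python (an assumption: Python's `\W` behaves like the description above, which may differ in detail from InDesign's engine). The function name `non_word_chars` is hypothetical.

```python
import re

def non_word_chars(text: str) -> list:
    """Return every character that \\W matches, in order."""
    return re.findall(r"\W", text)

sample = "Wait, what?\tYes_no\n"
print(non_word_chars(sample))  # [',', ' ', '?', '\t', '\n']
```

Note that the space, tab, and line break are matched alongside the comma and question mark, while the underscore is skipped because it counts as a word character.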

 

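So you could technically try [^\w\h]

That would skip word characters (letters, digits, underscore) and horizontal spaces, though it still matches symbols and line breaks (not tested ... don't fret if it doesn't work, I'm not on my computer). The negation has to go on the class itself: [\W\D\H] is a union of three negated sets, so it matches almost every character.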
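One thing worth checking before relying on a class built from negated shorthands: in most engines the items inside [...] combine as a union. The sketch below uses Python, which has no \h, so a literal space and tab stand in for it (an assumption). The names `UNION` and `NEGATED` are illustrative only.

```python
import re

# Union of negated shorthands: "non-word OR non-digit".
UNION = re.compile(r"[\W\D]")

# Negated class: "NOT (word character, space, or tab)".
# Space and tab stand in for \h, which Python's re does not support.
NEGATED = re.compile(r"[^\w \t]")

sample = "a1_ ,\t!"
print(UNION.findall(sample))    # ['a', '_', ' ', ',', '\t', '!']
print(NEGATED.findall(sample))  # [',', '!']
```

The union matches the letter and the underscore because both are non-digits. Only the negated class narrows the result to the comma and exclamation mark.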

 

So the POSIX class [[:punct:]] is the most comprehensive way.

 

Why do you ask?

2 replies

Peter Kahrel
Community Expert
May 7, 2025

There's also \p{Punctuation}
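For a feel of what the Unicode Punctuation property covers, here is a small sketch using Python's standard `unicodedata` module. It reports Unicode general categories only. Whether InDesign's \p{Punctuation} follows them exactly is an assumption, and `is_unicode_punctuation` is a hypothetical helper name.

```python
import unicodedata

def is_unicode_punctuation(ch: str) -> bool:
    """True if the character's general category is one of the P* groups."""
    return unicodedata.category(ch).startswith("P")

# Typographic quotes and the marks mentioned earlier in the thread.
for ch in "“”‘’¿¡„":
    print(repr(ch), unicodedata.category(ch), is_unicode_punctuation(ch))
```

All seven characters fall in a P* category (Pi and Pf for the curly quotes, Po for the inverted marks, Ps for the low quote), so a property-based match would not need them listed by hand.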

Eugene Tyson
Community Expert
May 8, 2025

 

I was trying that as \p{P} but didn't expand it to the full word!
AMAZING! It probably catches more than the POSIX class, is that right? Smart quotes, for example?

 

What else is there in the hidden gems of GREP? 

 

Seriously!

 

 

Joel Cherney
Community Expert
May 8, 2025

I hate to disappoint you, Eugene. Unfortunately, I believe that \p{Punctuation} captures exactly what [[:punct:]] captures. I grabbed the text of this handy list of Supposedly All Unicode Punctuation, dropped the text into InDesign, and used Change All to figure out how many punctuation glyphs would be found.

 

\p{Punctuation} found 1672 glyphs

[[:punct:]] found 1672 glyphs

 

Seems to me that they capture the exact same glyphs. 
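A caveat on the method, sketched in Python rather than InDesign: a test text made up only of Unicode punctuation cannot show whether the two expressions differ on characters outside that list. In Python's own definitions (not necessarily InDesign's), the ASCII set in `string.punctuation` (punctuation characters in the C locale) includes several characters that Unicode classifies as symbols. The function name `split_ascii_punct` is hypothetical.

```python
import string
import unicodedata

def split_ascii_punct():
    """Split string.punctuation into Unicode punctuation (P*) and everything else."""
    punct, other = [], []
    for ch in string.punctuation:
        if unicodedata.category(ch).startswith("P"):
            punct.append(ch)
        else:
            other.append(ch)
    return punct, other

punct, other = split_ascii_punct()
print(len(punct), "".join(other))  # 23 $+<=>^`|~
```

Running both expressions in InDesign over a text that also contains characters such as $, +, and ~ would show whether they really behave identically there.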

 

Also, @dublove, that might be a feature request you'd like to file: InDesign GREP should support abbreviated Unicode categories. There are a bunch of them, many quite useful, but they'd all be easier to use in InDesign if we could just e.g. type \p{Pc} instead of \p{Connector_Punctuation}
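Until abbreviations are supported, the long names can be looked up from the standard Unicode two-letter codes. The sketch below is a hypothetical Python helper (`expand_category` is not part of any tool). The long names are the Unicode general category names; that InDesign accepts every one of them in this spelling is an assumption, since the thread only shows Punctuation and Connector_Punctuation.

```python
# Unicode general category abbreviations for punctuation and their long names.
PUNCTUATION_CATEGORIES = {
    "P": "Punctuation",
    "Pc": "Connector_Punctuation",
    "Pd": "Dash_Punctuation",
    "Ps": "Open_Punctuation",
    "Pe": "Close_Punctuation",
    "Pi": "Initial_Punctuation",
    "Pf": "Final_Punctuation",
    "Po": "Other_Punctuation",
}

def expand_category(short: str) -> str:
    """Turn an abbreviation such as 'Pc' into a long-form \\p{...} expression."""
    return r"\p{" + PUNCTUATION_CATEGORIES[short] + "}"

print(expand_category("Pc"))  # \p{Connector_Punctuation}
```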

 


dublove
Author
Legend
May 8, 2025

I was just asking whether there is a simpler representation.