dublove
Legend
May 7, 2025
Answered

Is [[:punct:]] the only representation of punctuation proper?

  • May 7, 2025
  • 2 replies
  • 1493 views

Is [[:punct:]] the only representation of punctuation proper?
Is there a simpler one?

 

Correct answer by Eugene Tyson (Community Expert, May 7, 2025)

You can build your own punctuation character set like this:

[.,!?;:'"(){}[\]<>/@#$%^&*+=_|\\-]

This is messy, error-prone (you have to remember to escape ] and \), and incomplete: it only covers the characters you remember to include. It's easy to forget less common punctuation like ¿, ¡, „, etc.
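
To illustrate the gap, here is a minimal sketch that runs the same hand-built class over a sample string. It uses Python's re module as a stand-in, which is an assumption: InDesign's GREP is a different engine, and the sample text is made up for the demonstration.

```python
import re

# The hand-built class from the post, kept in a raw string so the
# backslashes reach the regex engine unchanged.
CUSTOM_PUNCT = re.compile(r"""[.,!?;:'"(){}[\]<>/@#$%^&*+=_|\\-]""")

# Made-up sample containing ASCII and non-ASCII punctuation.
sample = "¿Qué? ¡Hola! „Zitat“ [ok] a\\b"

found = CUSTOM_PUNCT.findall(sample)
missed = [ch for ch in "¿¡„“" if ch in sample and ch not in found]

print(found)   # ['?', '!', '[', ']', '\\']
print(missed)  # ['¿', '¡', '„', '“']
```

Only the characters typed into the class are matched; the inverted marks and the German-style quotes slip through.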

 

Another way is to use something like

\W (non-word character)
\W matches any character that's not a letter, digit, or underscore, so it includes punctuation but also:

  • spaces
  • tabs
  • symbols
  • line breaks
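
A quick check of what \W picks up, sketched with Python's re module (an assumption; the thread is about InDesign's GREP, which may differ in details). The sample string is invented for the example.

```python
import re

# Invented sample with punctuation, a symbol, a space, a tab and a line break.
sample = "Wait, $5?\tYes!\n"

non_word = re.findall(r"\W", sample)
print(non_word)  # [',', ' ', '$', '?', '\t', '!', '\n']
```

The whitespace characters come back alongside the punctuation, which is why \W on its own is too broad for this job.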

 

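So you could technically try [^\w\s]

That would exclude word characters (letters, digits, underscore) and all whitespace, leaving only punctuation and symbols (not tested ... don't fret if it doesn't work, I'm not on my computer). A union like [\W\D\H] would not do this: every letter is a non-digit, so it matches nearly everything.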
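
A class like this is easy to check against a sample string. The sketch below uses Python's re module as a stand-in (an assumption, since InDesign's GREP is a different engine) and contrasts a union of negated shorthands with a negated class. Python's re has no \H, so it is left out here.

```python
import re

sample = "Hi, 2 cats_\tgo!"

# Union of negated shorthands: "non-word OR non-digit".
# Every letter is a non-digit, so only the digit escapes the match.
union = re.findall(r"[\W\D]", sample)

# Negated class: "neither a word character nor whitespace".
negated = re.findall(r"[^\w\s]", sample)

# The underscore counts as a word character, so it needs adding back
# if it should be treated as punctuation.
with_underscore = re.findall(r"[^\w\s]|_", sample)

print("".join(union))   # everything except the "2"
print(negated)          # [',', '!']
print(with_underscore)  # [',', '_', '!']
```

Note that the negated class still matches symbols such as $ or +, so it is broader than punctuation proper.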

 

So the POSIX class [[:punct:]] is the most comprehensive of these options.

 

Why do you ask?

2 replies

Peter Kahrel
Community Expert
May 7, 2025

There's also \p{Punctuation}
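
For readers who want to see what the Unicode Punctuation property covers, here is a rough Python approximation. Python's built-in re module does not support \p{...}, so this sketch checks the Unicode general category instead; the helper name is_punctuation and the sample text are hypothetical, and InDesign's own matching may differ.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    """True if ch has a Unicode general category starting with P
    (Pc, Pd, Ps, Pe, Pi, Pf, Po)."""
    return unicodedata.category(ch).startswith("P")


sample = "“Smart” quotes, «guillemets» and ¿más?"
punct = [ch for ch in sample if is_punctuation(ch)]
print(punct)  # ['“', '”', ',', '«', '»', '¿', '?']
```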

Community Expert
May 8, 2025

 

Was trying that as \p{P} but didn't expand it to the full word!
AMAZING! And it probably catches more things than the POSIX class, like smart quotes. Is that right?
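
One way to explore that question outside InDesign is to compare the ASCII punctuation set (what [[:punct:]] means in the C locale, available in Python as string.punctuation) with the Unicode P categories. This is only a sketch under those assumptions; it does not show how InDesign's [[:punct:]] treats non-ASCII characters.

```python
import string
import unicodedata


def in_unicode_p(ch: str) -> bool:
    """True if ch belongs to a Unicode P* (punctuation) category."""
    return unicodedata.category(ch).startswith("P")


# ASCII punctuation that Unicode classifies as symbols (S*), not punctuation.
ascii_not_p = "".join(ch for ch in string.punctuation if not in_unicode_p(ch))

# Curly quotes: Unicode punctuation, but not part of the ASCII set.
smart_quotes = "‘’“”"
smart_in_p = all(in_unicode_p(ch) for ch in smart_quotes)
smart_in_ascii = any(ch in string.punctuation for ch in smart_quotes)

print(ascii_not_p)     # $+<=>^`|~
print(smart_in_p)      # True
print(smart_in_ascii)  # False
```

So, at least in this approximation, the two sets overlap without either containing the other: the Unicode property picks up curly quotes, while characters such as $ and + count as symbols rather than punctuation.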

 

What else is there in the hidden gems of GREP? 

 

Seriously!

 

 

Peter Kahrel
Community Expert
May 8, 2025

> The \p{P} abbreviation doesn't work for me in the Find what: field in the GREP tab of the Find/Change dialog. My hunch is that a) it's a bug in the F/C dialog that b) you may not have noticed because c) you probably never key regular expressions directly into the F/C dialog, right? I suspect that the abbreviations work in ExtendScript just fine.

 


The short form is \p{P*} -- the addition of the asterisk is a feature of Boost's regex libraries, which InDesign's Grep is based on.

 

> you probably never key regular expressions directly into the F/C dialog, right?

 

I do that all the time!

 

> I suspect that the abbreviations work in ExtendScript just fine.

 

ExtendScript's regular expressions, which follow an old version of JavaScript (ECMAScript 3), are very basic: they don't know about Unicode classes.

dublove (Author)
Legend
May 8, 2025

I just want to ask for a simple representation.