Is [[:punct:]] the only representation of punctuation proper?

Guide, May 07, 2025

Is [[:punct:]] the only representation of punctuation proper?
Is there a simpler one?

 

TOPICS
Bug, Feature request, How to
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines

1 Correct answer

Community Expert, May 07, 2025

You can build your own punctuation character set like this:

[.,!?;:'"(){}[\]<>/@#$%^&*+=_|\\-]

This is messy, error-prone (you have to remember to escape ] and \), and incomplete, since it only covers the characters you remember to include. Plus, it's easy to forget less common punctuation like ¿, ¡, „, etc.
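
To make the "incomplete" point concrete, here is a minimal sketch in Python rather than InDesign (an assumption made so the example can be run anywhere; the helper name find_punct is hypothetical). It uses the hand-built set above as-is and shows what it misses:

```python
import re

# The hand-built set from the post, used unchanged as a Python pattern.
# Only ] and \ are escaped; the hyphen sits last so it is taken literally.
HAND_BUILT = re.compile(r"""[.,!?;:'"(){}[\]<>/@#$%^&*+=_|\\-]""")

def find_punct(text):
    """Return every character of text that the hand-built set matches."""
    return HAND_BUILT.findall(text)

# The ASCII marks are found; the inverted question mark and the
# German-style quotes are not, because nobody listed them.
print(find_punct("Wait... (really?) ¿Qué? „Ja“"))
```

Anything missing from the list is silently skipped, which is the main weakness of this approach.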

 

Another way is to use something like

\W (non-word character)
\W matches any character that’s not a letter, digit or underscore, so it includes punctuation but also:

spaces

tabs

symbols

line breaks

 

So technically you could try [\W\D\H]

That should ignore word characters, digits and horizontal spaces (not tested, so don't fret if it doesn't work; I'm not on my computer).
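
One thing worth checking before relying on this: in most regex flavors, a class that lists several negated shorthands is a union, so a character matches if any one of them matches. A rough sketch in Python (an assumption: Python's re stands in for InDesign's GREP here, and since it has no \H the class is cut down to [\W\D]):

```python
import re

sample = "Hi, 42!\tok\n"

# \W on its own: anything that is not a letter, digit or underscore.
non_word = re.findall(r"\W", sample)

# [\W\D] is a union: a character matches if it is a non-word character
# OR a non-digit. Letters are non-digits, so they match too; only the
# digits (which are both word characters and digits) are skipped.
union = re.findall(r"[\W\D]", sample)

print(non_word)
print(union)
```

In this sketch \W picks up the comma, the space, the exclamation mark, the tab and the line break, while [\W\D] picks up everything except the two digits, so it is worth testing the pattern in InDesign before settling on it.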

 

So the POSIX class [[:punct:]] is the most comprehensive way.

 

Why do you ask?

Guide, May 07, 2025

I just want to ask for a simple representation.

Guide, May 08, 2025

Well, if there's no single letter to represent it, then we'll settle for [\W\D\H].

Community Expert, May 08, 2025

Just noticed it catches paragraph returns.

So, to make it better:

[^\w\d\r\n\h]

This is similar in intent to [\W\D\H], which uses a character class [ ] and looks for non-word characters with \W instead of \w. The same goes for digits: \d finds digits, \D finds anything but digits, and so on.

Here we use the same logic, but the ^ excludes the listed items from the search.

So you have:

[ ] a character class

[^ ] a negated character class (a class of non-inclusion)

[^\w] don't include word characters, etc.
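
A small Python approximation of [^\w\d\r\n\h], for anyone who wants to try the negated class outside InDesign. The assumptions are that Python's re behaves closely enough for illustration and that a plain space plus a tab can stand in for \h, which Python does not have. The helper name leftovers is hypothetical:

```python
import re

# Approximation of [^\w\d\r\n\h]. Python's re has no \h, so a space and
# a tab stand in for "horizontal whitespace". The \d is redundant in
# Python because \w already covers digits; it is kept to mirror the post.
NOT_WORD_OR_SPACE = re.compile(r"[^\w\d\r\n \t]")

def leftovers(text):
    """Return the characters that survive all of the exclusions."""
    return NOT_WORD_OR_SPACE.findall(text)

print(leftovers("Hi, 42!\tok\r\n(done) 1 + 1 = 2"))
```

Unlike a class built from \W, \D and \H, every exclusion here has to hold at once, so letters, digits, spaces and paragraph returns all drop out.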

Community Expert, May 08, 2025

Ah, now I've figured out that

[^\w\s]

does the same thing, but it also catches mathematical symbols, whereas this doesn't:

\p{Punctuation}

This also works:

\p{Punct}

It really depends on your needs. GREP is GREP: it finds exactly what you ask for. It's very precise and doesn't guess.
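
The difference on mathematical symbols can be sketched in Python. Two assumptions: Python's built-in re has no \p{...}, so unicodedata's general category stands in for it, and InDesign's \p{Punctuation} is taken to line up with the Unicode P* categories, which this thread does not confirm. The helper is_punct is hypothetical:

```python
import re
import unicodedata

def is_punct(ch):
    """True if ch belongs to a Unicode P* (punctuation) general category."""
    return unicodedata.category(ch).startswith("P")

text = "2 + 2 = 4, right?"

# Loose: everything that is neither a word character nor whitespace.
loose = re.findall(r"[^\w\s]", text)

# Strict: punctuation categories only; + and = are category Sm (math symbol).
strict = [ch for ch in text if is_punct(ch)]

print(loose)
print(strict)
```

The loose pattern returns the plus and equals signs along with the comma and question mark; the category-based check returns only the comma and question mark.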

Guide, May 08, 2025

\p{Punct}

New knowledge.
It's [^\w\s], right?

Community Expert, May 08, 2025

You can always grab @Joel Cherney's list in the link and test it.

 

 

Community Expert, May 08, 2025

> You can always grab @Joel Cherney's list in the link and test it.

 

I am always in favor of testing!

 

accc.gif

 

Seems like accents are neither word characters nor punctuation. There are other exceptions as well: currency symbols, box drawing characters, arrows, and so on.
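
For reference, this is how those exceptions are categorized in the Unicode character database, sketched in Python. It assumes that the Unicode general categories are what the GREP classes follow, which this thread only suggests; the helper name classify is hypothetical:

```python
import re
import unicodedata

def classify(ch):
    """Label one character as 'word', 'punctuation', 'space' or 'other'."""
    if re.fullmatch(r"\w", ch):
        return "word"
    if unicodedata.category(ch).startswith("P"):
        return "punctuation"
    if ch.isspace():
        return "space"
    return "other"

# A letter, a punctuation mark, then an acute accent, a currency sign,
# an arrow and a box drawing character.
for ch in "a!´€→─":
    print(repr(ch), unicodedata.category(ch), classify(ch))
```

The last four characters fall into the S* (symbol) categories (Sk, Sc, Sm and So), which would explain why they are neither word characters nor punctuation.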

Community Expert, May 07, 2025

There's also \p{Punctuation}

Community Expert, May 07, 2025

 

I was trying that as \p{P} but didn't expand it to the full word!
AMAZING! And it probably catches more things than the POSIX class, like smart quotes. Is that right?

 

What else is there in the hidden gems of GREP? 

 

Seriously!

 

EugeneTyson_1-1746675607486.gif

 

Community Expert, May 07, 2025

I hate to disappoint you, Eugene. Unfortunately, I believe that \p{Punctuation} captures exactly what [[:punct:]] captures. I grabbed the text of this handy list of Supposedly All Unicode Punctuation, dropped the text into InDesign, and used Change All to figure out how many punctuation glyphs would be found.

 

\p{Punctuation} found 1672 glyphs

[[:punct:]] found 1672 glyphs

 

Seems to me that they capture the exact same glyphs. 

 

Also, @dublove, that might be a feature request you'd like to file: InDesign GREP should support abbreviated Unicode categories. There are a bunch of them, many quite useful, but they'd all be easier to use in InDesign if we could just e.g. type \p{Pc} instead of \p{Connector_Punctuation}
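
For anyone who wants the full set of punctuation subcategories in one place, here is a short Python sketch that maps the two-letter Unicode abbreviations to their long names and counts characters per subcategory. The long names are the Unicode property value names; whether InDesign accepts each of them exactly as spelled here is an assumption, since only \p{Connector_Punctuation} is mentioned above. The helper count_by_category is hypothetical:

```python
import unicodedata
from collections import Counter

# Unicode general-category abbreviations for punctuation and their long names.
PUNCT_CATEGORIES = {
    "Pc": "Connector_Punctuation",
    "Pd": "Dash_Punctuation",
    "Ps": "Open_Punctuation",
    "Pe": "Close_Punctuation",
    "Pi": "Initial_Punctuation",
    "Pf": "Final_Punctuation",
    "Po": "Other_Punctuation",
}

def count_by_category(text):
    """Count punctuation characters in text, keyed by long category name."""
    counts = Counter()
    for ch in text:
        cat = unicodedata.category(ch)
        if cat in PUNCT_CATEGORIES:
            counts[PUNCT_CATEGORIES[cat]] += 1
    return counts

# Underscore, parentheses, curly quotes, an en dash and an exclamation mark:
# one character from each of the seven subcategories.
print(count_by_category("snake_case (“quoted”) – done!"))
```

Each of the seven subcategories is hit exactly once by the sample string, including the curly quotes (Pi and Pf).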

 

Community Expert, May 07, 2025

Amazing! This opens up so many possibilities. 

I knew about these from regex, but they didn't work for me in the past. Having to type out the full word is disappointing, but it opens up a lot of workarounds!

 

Amazing!

Community Expert, May 07, 2025

The abbreviated forms work in InDesign, too. Unfortunately they're not documented (like so many other Grep things), but they're all in this script:

https://creativepro.com/files/kahrel/indesign/grep_classes.html

 

Guide, May 07, 2025

Hi  Peter Kahrel ~

I saw this great thing of yours the other day.
I think though, wouldn't it be a little better if it stored the user's Grep.
Because I think it would be less likely to lose the Grep in the script.
Because scripts don't need to be on the C drive, it supports "directory.link".

Community Expert, May 08, 2025

> a little better if it stored the user's Grep

> it would be less likely to lose the Grep in the script.

> scripts don't need to be on the C drive, it supports "directory.link".

 

The purpose of the script is simple: it lets you look up and insert Grep codes. That's all.

 

What do you mean by these comments? I can't make much sense of them.

 

 

Community Expert, May 08, 2025

The \p{P} abbreviation doesn't work for me in the Find what: field in the  GREP tab of Find/Change dialog. My hunch is that a) it's a bug in the F/C dialog that b) you may not have noticed because c)  you probably never key regular expressions directly into the F/C dialog, right? I suspect that the abbreviations work in ExtendScript just fine. 

 

punct.gif

Community Expert, May 08, 2025

The short form is \p{P*} -- the addition of the asterisk is a feature of Boost's regex libraries, which InDesign's Grep is based on.

 

> you probably never key regular expressions directly into the F/C dialog, right?

 

I do that all the time!

 

> I suspect that the abbreviations work in ExtendScript just fine.

 

ExtendScript's regular expressions, which come from an old version of JavaScript, are very basic; they don't know about Unicode classes.

Community Expert, May 08, 2025

1000x thanks for this. I am completely unashamed of tripping over Boost vs PCRE (again) but I should not have a hard time remembering that it's still 1999 as far as ExtendScript is concerned. 

 
