Known Participant
February 27, 2023
Question

Exclude certain characters from POSIX [[:punct:]] search

I'm cleaning up some text where the space after punctuation was often omitted. Like this:

There should be a space after a comma,or a semicolon,for example;it's often missing (but not after opening parentheses or apostrophes).

 

The problem is that a GREP search using the POSIX class, [[:punct:]](?=\w), finds all punctuation. Is there a way to exclude ( and ' from the [[:punct:]] class?

 

2 replies

Inspiring
March 8, 2023

Use a negative lookahead:

(?![\x{0027}\x{0028}])[[:punct:]]

A negative lookahead succeeds when its pattern cannot match at the current position, so here it succeeds only if the next character is neither an apostrophe nor a left parenthesis. That character must then match [[:punct:]].

Search for punctuation, except ' or (, followed by any word character:

((?![\x{0027}\x{0028}])[[:punct:]])(\w)

Change:

$1 $2
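
For anyone who wants to try the lookahead idea outside InDesign, here is a minimal Python sketch. It rests on a few assumptions: Python's `re` module has no POSIX `[[:punct:]]` class, so ASCII `string.punctuation` stands in for it, and the names `PUNCT`, `PATTERN`, and `add_missing_spaces` are made up for illustration. The sketch captures the punctuation first and the following word character second, so the space lands after the punctuation, as the question asks.

```python
import re
import string

# Assumption: ASCII punctuation stands in for InDesign's [[:punct:]].
PUNCT = "[" + re.escape(string.punctuation) + "]"

# (?!['(]) rejects an apostrophe or opening parenthesis at this position,
# the class then consumes one punctuation character,
# and (\w) requires a word character immediately after it.
PATTERN = re.compile(r"((?!['(])" + PUNCT + r")(\w)")

def add_missing_spaces(text: str) -> str:
    # Python writes the change string $1 $2 as \1 \2.
    return PATTERN.sub(r"\1 \2", text)

sample = "a comma,or a semicolon;it's missing (but not here)."
print(add_missing_spaces(sample))
# prints: a comma, or a semicolon; it's missing (but not here).
```

The apostrophe in "it's" and the opening parenthesis are left alone because the lookahead fails on them before the punctuation class is tried.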

m1b
Community Expert
March 8, 2023

Cool! Thanks.

m1b
Community Expert
February 27, 2023

Hi @defaultu0e43kqloi6n, well here's a straightforward (warning: probably naive!) approach:

 

([!%&*+,-.:;?)}\]])(\S)

 

Edit: I removed some unnecessary escaping inside the [ ], thanks to @FRIdNGE's help.

These are the characters that seem to me to cause problems in ordinary Latin text. Because I excluded the apostrophe, it won't pick up a closing single quote with no space after it. E.g., this won't be caught: ‘The quick brown fox’The next sentence.

Be careful when copying and pasting the GREP string above: InDesign or your OS can sometimes strip escape characters.

 

Change to:

 

$1 $2

 

This adds a space between the punctuation and the following non-space character.

 

If you want to change the list of punctuation, just add or remove characters between the first [ and the last ].
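
To see what the find/change pair above does on sample text, here is a rough Python sketch. It assumes Python's `re` engine treats this pattern the same way InDesign's GREP does, which I have not verified for every case; the function name `space_after_punctuation` is made up for illustration.

```python
import re

# Find and change strings from the post; Python writes $1 $2 as \1 \2.
FIND = re.compile(r"([!%&*+,-.:;?)}\]])(\S)")
CHANGE = r"\1 \2"

def space_after_punctuation(text: str) -> str:
    return FIND.sub(CHANGE, text)

print(space_after_punctuation("a comma,or a semicolon;it's often missing"))
# prints: a comma, or a semicolon; it's often missing

# Cases where the "naive" warning applies:
print(space_after_punctuation("well-known"))    # prints: well- known
print(space_after_punctuation("(see below)."))  # prints: (see below) .
```

The last two lines show why the list may need trimming: inside the brackets, `,-.` is read as a range from comma to period, which happens to include the hyphen, and `\S` also accepts a following punctuation mark.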

- Mark

FRIdNGE
February 27, 2023

This is enough:

 

[!%&*+,-.:;?)}\]]

 

(^/)  The Jedi

m1b
Community Expert
February 27, 2023

Ah, most characters don't need escaping inside [ ], except ] of course! Thanks!
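
A quick way to check what the shortened class actually accepts is to test it against every printable ASCII character. This is a small Python sketch, assuming Python's `re` reads the class the same way InDesign does; the names `CLASS` and `matched` are made up for illustration.

```python
import re

# The shortened class from the thread: only ] is escaped.
CLASS = re.compile(r"[!%&*+,-.:;?)}\]]")

# Collect every printable ASCII character the class accepts.
matched = "".join(chr(c) for c in range(33, 127) if CLASS.fullmatch(chr(c)))
print(matched)
# prints: !%&)*+,-.:;?]}
```

Characters such as `*`, `+`, `?`, `.` and `)` are literal inside the brackets, so they need no backslash. The hyphen is the one to watch: between two characters it defines a range, and here `,-.` covers comma, hyphen, and period. A backslash, or a `^` in first position, also keeps its special meaning inside a class.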