
Exclude certain characters from POSIX [[:punct:]] search

Explorer, Feb 27, 2023

I'm cleaning up some text where often the space was omitted after punctuation. Like this:

There should be a space after a comma,or a semicolon,for example;it's often missing (but not after opening parentheses or apostrophes).

 

The problem is that a GREP search using the POSIX class, [[:punct:]](?=\w), finds all punctuation. Is there a way to exclude ( and ' from the [[:punct:]] wildcard?

 

TOPICS
How to , Scripting , Type
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert, Feb 27, 2023

Hi @defaultu0e43kqloi6n, well here's a straightforward (warning: probably naive!) approach:

 

([!%&*+,-.:;?)}\]])(\S)

 

Edit: I removed some unnecessary escaping inside the [ ], thanks to @FRIdNGE's help.

These are the characters that seem to me to be the problem cases in ordinary Latin text. Because I excluded the apostrophe, it won't pick up trailing single quotes with no space after them, e.g. this won't be caught: ‘The quick brown fox’The next sentence.

Be careful when copying and pasting the GREP string above: InDesign or your OS can sometimes remove escape characters.

 

Change to:

 

$1 $2

 

This adds a space between the punctuation and the non-space character.

 

If you want to change the list of punctuation, just add or remove from between the first [ and the last ].
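For anyone who wants to try the same find/change outside InDesign, here is a minimal sketch in Python. It assumes Python's `re` module reads this bracket expression the same way InDesign's GREP does; the function name `add_missing_spaces` is made up for illustration, and `\1 \2` stands in for the `$1 $2` change string.

```python
import re

# Same find pattern as in the post: one listed punctuation mark,
# followed by any non-space character.
FIND = re.compile(r"([!%&*+,-.:;?)}\]])(\S)")


def add_missing_spaces(text: str) -> str:
    """Insert a space between the punctuation mark and what follows it."""
    # r"\1 \2" is Python's spelling of the "$1 $2" change string.
    return FIND.sub(r"\1 \2", text)


sample = "a comma,or a semicolon;it's (but not here)"
print(add_missing_spaces(sample))
```

Note that an explicit list like this is still fairly blunt: a decimal such as 3.14 is also matched, so it is worth stepping through the results rather than using Change All blindly.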

- Mark

Guide, Feb 27, 2023

Enough:

 

[!%&*+,-.:;?)}\]]

 

(^/)  The Jedi

Community Expert, Feb 27, 2023

Ah, we don't have to escape most characters inside [ ], except ] (and the backslash itself) of course! Thanks!
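One detail worth checking in the unescaped version is the hyphen: inside [ ], `,-.` is read as a range from comma to full stop. That range happens to contain exactly the comma, the hyphen and the full stop, so the class still matches what was intended. The sketch below checks this with Python's `re` module, on the assumption that InDesign's GREP handles the bracket expression the same way; the variable names are illustrative only.

```python
import re

# Unescaped class from the thread: ",-." is parsed as a range.
cls = re.compile(r"[!%&*+,-.:;?)}\]]")

# The range runs from code point 44 to 46: comma, hyphen, full stop.
range_chars = [chr(c) for c in range(ord(","), ord(".") + 1)]
print(range_chars)

# Putting the hyphen last makes it an unambiguous literal instead.
explicit = re.compile(r"[!%&*+,.:;?)}\]-]")

marks = "!%&*+,-.:;?)}]"
same_result = all(cls.fullmatch(ch) and explicit.fullmatch(ch) for ch in marks)
print(same_result)

# Neither class matches the two characters the question wants excluded.
print(bool(cls.fullmatch("(")), bool(cls.fullmatch("'")))
```

If you later add or remove characters around the hyphen, moving it to the end of the list avoids accidentally creating a different range.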

Explorer, Feb 27, 2023

Thanks. I've been using something similar; I was just fishing for whether there was a way to customize the POSIX classes that I didn't know about.

Community Expert, Feb 27, 2023

Yeah I had a look but I couldn't find anything. The POSIX character classes such as [:punct:] don't seem to be editable.

Participant, Mar 08, 2023

Use a negative lookahead:

(?![\x{0027}\x{0028}])[[:punct:]]

A negative lookahead succeeds only if its pattern cannot match at the current position, so here it succeeds when the next character is not an apostrophe or a left parenthesis. The [[:punct:]] class then matches that character.

To find any punctuation, except ' or (, that is directly followed by a word character:

((?![\x{0027}\x{0028}])[[:punct:]])(\w)

Change:

$1 $2
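Here is a rough Python sketch of the same lookahead idea, applied to the case from the original question (punctuation directly followed by a word character). Python's `re` module has no `[[:punct:]]`, so the class is built from `string.punctuation` as an ASCII stand-in; that substitution and the name `space_after_punct` are my assumptions, not something InDesign provides.

```python
import re
import string

# Stand-in for [[:punct:]]: an ASCII class built from string.punctuation.
PUNCT = "[" + re.escape(string.punctuation) + "]"

# (?!['(]) rejects an apostrophe or a left parenthesis before the
# punctuation class is allowed to consume the character.
FIND = re.compile(r"((?!['(])" + PUNCT + r")(\w)")


def space_after_punct(text: str) -> str:
    """Add a space after punctuation that runs straight into a word."""
    return FIND.sub(r"\1 \2", text)


sample = "a comma,or a semicolon;it's (but not here)"
print(space_after_punct(sample))
```

The lookahead only filters the straight apostrophe U+0027; a typographic apostrophe (U+2019) is a different character and would need to be added to the excluded set separately if the punctuation class in use covers it.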

Community Expert, Mar 08, 2023

Cool! Thanks.
