This is the information I have in hand:
==== Sample Data
Heading: This is heading one and contains info about blah blah. Heading 2: This is heading number two which talks about blah blah. Heading 3: Yet another heading. Heading 4: One more heading for so and so. Heading: Again one heading. Heading infinity: The is never ending heading.
==== Sample Data
This is the GREP code I'm using:
\w+(?:\s\w+)*\s?:
The code is working fine; however, I am not able to understand it. I put it together with help from someone else. Can anyone help explain it? The code selects all the headings (whether a single word or multiple words), including the ":". What's confusing me is the non-capturing group with \w+ inside the parentheses.
Any alternative code would also be appreciated.
Thanks
The non-capturing group seems to help in selecting headings that are split over multiple lines by a hard or soft return.
-Manan
Hi Manan
Thanks for the reply.
Is only the \s after ?: non-capturing, or is the whole \s\w+ non-capturing? And why is there a * outside the parentheses?
You have already had some great explanations. I would like to emphasise something that might cause confusion, and that is only because of the content we are searching.
http://www.rexegg.com/regex-quantifiers.html
I hope this clears up any confusion you might still have.
P.S.: Do note that some things you find on the internet might not work in InDesign. It all depends on the regex engine implemented by InDesign. However, the basics remain the same and function more or less consistently.
-Manan
Hi Manan
Thank you so much for the explanation.
Peter's explanation made it clear.
Thanks once again.
There are usually many different ways to get where you want to go with Grep. Are you only interested in understanding this Grep? Or does it not find all or too many occurrences?
One possibility (if the headings contain no other colons) would be:
[^:]*:
This finds everything up to the next colon, plus the colon itself.
Perhaps the nested formats in the paragraph style would also help you. But for that you would have to explain in more detail what exactly you want to achieve.
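To see what the [^:]*: alternative picks up, here is a minimal sketch in Python. It assumes Python's re module behaves like InDesign's GREP engine for this simple pattern, which is not guaranteed; the variable names and the idea of one heading per paragraph are illustrative assumptions, not part of the original question.

```python
import re

# Alternative pattern: any run of non-colon characters, then a colon.
alt = re.compile(r"[^:]*:")

# Assumption: each heading starts its own paragraph.
paragraphs = [
    "Heading: This is heading one and contains info about blah blah.",
    "Heading 2: This is heading number two which talks about blah blah.",
    "Heading infinity: The is never ending heading.",
]

# Matching from the start of each paragraph yields only the heading.
first_hits = [alt.match(p).group(0) for p in paragraphs]
print(first_hits)

# In running text, later matches also swallow the body text that
# precedes the next colon, because [^:] matches anything but a colon.
running = " ".join(paragraphs)
all_hits = alt.findall(running)
for hit in all_hits:
    print(repr(hit))
```

In this sketch the pattern is only reliable when the heading sits at the start of the text being searched, which is one reason a nested style per paragraph may be the better fit.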
Thanks for your help pixxxelschubser
\w+ matches one or more word characters. Word characters are letters, the digits 0-9, and the underscore character.
(?:\s\w+)* matches zero or more instances of a whitespace character followed by one or more word characters. The parentheses group \s\w+ so that the * operator applies to the whole sequence rather than only to \w+. The ?: isn't strictly necessary, but it makes the GREP expression more efficient: plain parentheses force InDesign to create a referent (so that you can refer to the matched text later), and that takes extra effort. With ?: you prevent the creation of the referent.
\s?: matches zero or one whitespace character followed by a colon. Here the ? is a quantifier on \s and the : is a literal colon; this is not a ?: group marker.
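The breakdown above can be checked with a short sketch in Python. This assumes Python's re module treats these constructs the same way InDesign's GREP engine does, which holds for the basics but is not guaranteed in general; the shortened sample string and the variable names are only for illustration.

```python
import re

# Shortened version of the sample data from the question.
sample = (
    "Heading: This is heading one and contains info about blah blah. "
    "Heading 2: This is heading number two which talks about blah blah. "
    "Heading infinity: The is never ending heading."
)

# Same pattern with and without the ?: marker.
non_capturing = re.compile(r"\w+(?:\s\w+)*\s?:")
capturing = re.compile(r"\w+(\s\w+)*\s?:")

# group(0) is the whole match, which is what gets selected.
full_nc = [m.group(0) for m in non_capturing.finditer(sample)]
full_c = [m.group(0) for m in capturing.finditer(sample)]

print(full_nc)               # the headings, each including its colon
print(full_nc == full_c)     # both variants select the same text

# The only difference: how many referents (groups) the engine keeps.
print(non_capturing.groups)  # 0
print(capturing.groups)      # 1
```

Both variants select the same text. The ?: only changes whether the engine stores the group for later reference.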
Peter
Thank you so much Peter.
Your explanation made it crystal clear. Actually, the ?: inside the group was what confused me.
Thanks a lot for your help.