Inspiring
January 25, 2020
Answered

GREP to catch all lines/paragraphs with title case?

I found this GREP query in another thread, posted by Jongware, which finds a string of two or more words that each start with a capital letter. I modified it slightly so that it wouldn't catch strings of words with a soft return or hard return in between:

\b\u\w+( +\u\w+)+

I want to change it to catch a string of any number of title-case words followed by a break, with no uncapitalised words in between.


See here: I want it to get only "Banana Sour Cream Bread", because it's followed by a return (\r), but not "Another Banana Bread", because that part is inside the body text and is followed by more uncapitalised words.


Furthermore, I'd like it to tolerate a run of up to one or two uncapitalised words in between, for those few little words that generally escape title casing.

So here 'and' and 'pony' are both uncapitalised, but because they sit among words that otherwise all start with a capital, I want it to still catch the whole thing (up until the break).


Also, I want it to catch single title-case words if they start on a new line and are followed by a break.

This code here works:

^\u\w+\r

I don't know if this can be inserted into the above query, but I'm happy to string it together in a script if not.

Many thanks.

Correct answer Jongware

Catching either one or more capitalized words is perfectly do-able!

^\u\S*( \u\S*)*$

 

The expression starts at the beginning of a paragraph (^), matches a capital followed by zero or more not-a-space characters¹, optionally followed by further sets of "space, capital, zero or more not-a-space characters", and must then reach the end of the paragraph ($).
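For anyone who wants to try the logic outside InDesign, here is a rough Python sketch of the same expression. It rests on two assumptions that are not part of the original answer: `[A-Z]` stands in for InDesign's `\u` (so it only covers ASCII capitals), and `re.MULTILINE` makes `^` and `$` behave like paragraph boundaries on newline-separated text. The sample text is made up for illustration.

```python
import re

# Assumption: [A-Z] approximates InDesign's \u (any uppercase letter), ASCII only.
# re.MULTILINE lets ^ and $ match at every line, standing in for paragraphs.
TITLE_CASE = re.compile(r"^[A-Z]\S*( [A-Z]\S*)*$", re.MULTILINE)

# Hypothetical sample: a heading, a body line, and a single-word heading.
sample = (
    "Banana Sour Cream Bread\n"
    "This is Another Banana Bread that we make often\n"
    "Ingredients\n"
)

# Collect every line that consists only of capitalised words.
headings = [m.group(0) for m in TITLE_CASE.finditer(sample)]
print(headings)  # ['Banana Sour Cream Bread', 'Ingredients']
```

The body line is skipped because "is" does not start with a capital, so the match can never reach the end of the line.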

 

Ignoring one or two uncapitalized words is ... possible, but not pretty! This works for your sample text:

^\u\S*( \u\S*)*( \S*)?( \u\S*)*( \S*)?( \u\S*)*$

 

It's an expansion of the above expression: between the (still all optional!) capitalized phrases it allows an optional "space, any not-a-space sequence", and that addition appears twice to cater for zero, one or two uncapitalized words.
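The relaxed expression can be checked the same way. This is only a sketch under the same assumptions as before (`[A-Z]` as an ASCII stand-in for `\u`, one string per paragraph), and the test lines are invented rather than taken from the original sample.

```python
import re

# Assumption: [A-Z] stands in for InDesign's \u and only covers ASCII capitals.
CAP = r"[A-Z]\S*"
RELAXED = re.compile(
    rf"^{CAP}( {CAP})*"    # one or more capitalised words
    rf"( \S*)?( {CAP})*"   # optional first uncapitalised word, then more capitals
    rf"( \S*)?( {CAP})*$"  # optional second uncapitalised word, then more capitals
)

# Hypothetical paragraphs, one per string.
lines = [
    "Banana Sour Cream Bread",      # no uncapitalised words
    "Bread and Butter Pudding",     # one uncapitalised word
    "My Little pony and Friends",   # two uncapitalised words
    "Mix the flour and eggs well",  # too many uncapitalised words
    "Add the eggs",                 # short sentence, only two uncapitalised words
]

matched = [line for line in lines if RELAXED.match(line)]
print(matched)
```

Note that "Add the eggs" is matched as well: any short paragraph that starts with a capital and contains no more than two uncapitalised words fits the pattern, so the results are worth a quick visual check before applying a style.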

 

¹Zero or more, following that initial capital, so it can catch "I" (and "U" and "R"). Also, "not-a-space" \S rather than "a word character" \w, so it can match the occasional hyphen, comma, exclamation mark and other possible interjections.
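The difference described in the footnote can be seen by running both variants side by side. In this sketch the `\w+` pattern is an illustrative variant modelled on the queries in the question, not something from the answer, and `[A-Z]` is again an ASCII-only stand-in for `\u`. The sample lines are hypothetical.

```python
import re

# Assumption: [A-Z] approximates InDesign's \u for ASCII text.
WORD_CHARS = re.compile(r"^[A-Z]\w+( [A-Z]\w+)*$")  # capital + one or more word characters
NON_SPACE = re.compile(r"^[A-Z]\S*( [A-Z]\S*)*$")   # capital + zero or more non-spaces

# Hypothetical headings containing a lone capital, a hyphen, and punctuation.
lines = [
    "Banana Bread",
    "I",
    "Gluten-Free Banana Bread",
    "Quick, Easy Pancakes!",
]

with_w = [line for line in lines if WORD_CHARS.match(line)]
with_S = [line for line in lines if NON_SPACE.match(line)]

print(with_w)  # ['Banana Bread']
print(with_S)  # all four lines
```

The `\w+` variant needs at least one word character after the capital and stops at the first hyphen or comma, which is why only the plain two-word heading survives.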

Jongware
Community Expert
January 25, 2020

Inspiring
January 26, 2020

Ah, that's awesome, thanks Jongware. I tested it on a small sample and it seems to be working exactly as intended. I knew the second part would be a bit iffy, but this will work very well as a starter!

I get a lot of Word documents in my work with all sorts of random formatting and headings all over the place. I have to go through and fix it all up, and this will speed things up immensely 🙂

Thanks also for the clarification around what it does and the distinction between \w (word characters) and \S (not a space). It's definitely more useful for what I need it for!