GREP to catch all lines/paragraphs with title case?

Question

So I found this grep query to find a string of two or more words starting with a capital letter in another thread posted by Jongware, I modified it slightly so that it wouldn't catch strings of words with a soft return or hard return in between \b\u\w+( +\u\w+)+

I want to change it to to catch a string of however many words in title case followed by a break with no uncapitalised words in between.

see here I want it to get only "Banana Sour Cream Bread" because it's followed by a return (\r) but not "Another Banana Bread" because that part is inside the body text, and is followed by more uncapitalised words

furthermore, i'd like to omit a string of up to one or two uncapitalised words in between, for those few little words that generally escape title casing.

so here 'and' and 'pony' are both uncapitalised but because they are amongst otherwise all words that start with a capital, I want it to still catch the whole thing (up until the break)

Also, I want it to catch single title case words if starting on a new line and followed by a break

This code here works^\u\w+\r I don't know if this can be inserted into the above query but I'm happy to string it together in a script if not

Many thanks.

Jongware · Accepted Answer

Catching either one or more capitalized words is perfectly do-able!

^\u\S*( \u\S*)*$

This starts at the start of a paragraph (^), matches a capital followed by zero or more not-a-space characters¹ and then optionally followed by more sets of "space, capital, any not-a-space character".

Ignoring one or two uncapitalized words is ... possible, but not pretty! This works for your sample text:

^\u\S*( \u\S*)*( \S*)?( \u\S*)*( \S*)?( \u\S*)*$

It's an expansion of the above expression, allowing a sequence of any not-a-space in between the (still all optional!) capitalized phrases, and the whole thing repeated twice to cater for zero, one or two uncapitalized sequences.

¹Zero or more, following that initial capital, so it can catch "I" (and "U" and "R"). Also, "not-a-space" \S rather than "a word character" \w, so it can match the occasional hyphen, comma, exclamation mark and other possible interjections.

Sign up

To post, reply, or follow discussions, please sign in with your Adobe ID.

Sign in to Adobe Community

To post, reply, or follow discussions, please sign in with your Adobe ID.

Scanning file for viruses.

This file cannot be downloaded