Trying to Use GREP To Identify Beginning of a Paragraph

Report · Nov 06, 2012

I want to apply a character style to the first sentence in a paragraph. The challenge is these are not always traditional sentences. They can begin with a decimal point or a number. Here are a few examples.

7.25” diamond accent bracelet. In sterling silver. Reg. $340
Would want the character style only applied to "7.25” diamond accent bracelet."

.04 ct. tw. diamond bangle bracelet by Brilliance. In sterling silver. Reg. $720

Would want the character style only applied to ".04 ct. tw. diamond bangle bracelet by Brilliance."

Diamond accent earrings. In sterling silver. Reg. $120

Would want the character style only applied to "Diamond accent earrings."

I've used [\l\u\d\.], but that applies the character style to too much of the text. I've added the ^ to the beginning of the code, but that did not accomplish what I am trying to do, either.

Thanks in advance for any ideas on this.

Report · Nov 06, 2012

Hi,

Maybe not so accurate but you could try:

^.+?(?<=\w{3}\.)

where {n} sets the minimum number of word characters that must come immediately before the dot for the match to end there.

hope...
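
For anyone who wants to try the lookbehind idea outside InDesign, here is a minimal Python sketch. It assumes Python's `re` module behaves like InDesign's GREP for this pattern, which has not been verified in InDesign itself. The helper name `first_sentence_lookbehind` and the fourth sample paragraph are made up for illustration.

```python
import re

# Lazy match from the paragraph start, ending at the first dot
# that is immediately preceded by three word characters.
LOOKBEHIND = re.compile(r"^.+?(?<=\w{3}\.)")

def first_sentence_lookbehind(paragraph):
    """Return the matched text, or None if nothing matches."""
    m = LOOKBEHIND.search(paragraph)
    return m.group(0) if m else None

samples = [
    "7.25” diamond accent bracelet. In sterling silver. Reg. $340",
    ".04 ct. tw. diamond bangle bracelet by Brilliance. In sterling silver. Reg. $720",
    "Diamond accent earrings. In sterling silver. Reg. $120",
    # Hypothetical paragraph (not from the thread) showing the weak spot:
    # an abbreviation of three or more letters ends the match too early.
    "Gold ring, approx. 2 ct. In 14k gold.",
]

for s in samples:
    print(first_sentence_lookbehind(s))
```

The three paragraphs from the question match as wanted because "7.", "ct." and "tw." have fewer than three word characters before the dot. The last sample stops at "approx.", which is probably why the pattern was offered as "maybe not so accurate".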

Report · Nov 06, 2012

Or this one: ^.+?\.(?=\s\u)

(from the start of the paragraph up to the first dot that's followed by a space and a capital).

Peter
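
To see how this pattern behaves on the three paragraphs from the question, here is a small Python sketch. It is only an approximation of InDesign's GREP: Python's `re` module has no `\u` class, so `[A-Z]` is assumed as a stand-in for "any capital", and the helper name `first_sentence` is made up for illustration.

```python
import re

# ^      start of the paragraph
# .+?    one or more characters, as few as possible
# \.     a literal dot
# (?=\s[A-Z])  lookahead: the dot must be followed by whitespace and a
#              capital letter; the lookahead text is not part of the match.
# Assumption: [A-Z] replaces InDesign's \u, which Python does not support.
FIRST_SENTENCE = re.compile(r"^.+?\.(?=\s[A-Z])")

def first_sentence(paragraph):
    """Return the text that would receive the character style, or None."""
    m = FIRST_SENTENCE.search(paragraph)
    return m.group(0) if m else None

samples = [
    "7.25” diamond accent bracelet. In sterling silver. Reg. $340",
    ".04 ct. tw. diamond bangle bracelet by Brilliance. In sterling silver. Reg. $720",
    "Diamond accent earrings. In sterling silver. Reg. $120",
]

for s in samples:
    print(first_sentence(s))
```

The dots in "7.25", ".04", "ct." and "tw." are skipped because none of them is followed by a space and a capital letter, so each match ends at the first real sentence break.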

Report · Nov 08, 2012

Thanks for your help - this is a big timesaver.

Just so I can learn from this, I'm trying to break down what Peter wrote:

^ means start at the beginning of the paragraph

.+ means find a period one or more times

? means find a period zero or one time

How does the .+? expression work? What is it looking for?

\. is a period

(?=\s\u) Is this expression finding any whitespace followed by a capital letter, zero or one times?

I appreciate the solution, if you have time please share what it is doing. I am using Peter's "GREP in InDesign CS3/4" as a learning tool, but am fuzzy on the .+? combination.

Thanks again.

Report · Nov 08, 2012

Actually, it means find any sequence of characters (.), repeated as many times as possible, but at least once (+?), that starts at the beginning of the paragraph (^), followed by a dot (\.). The second part is called a positive lookahead, and it means that the expression you are looking for (^.+?\.) is valid only if the condition in the lookahead is true: (?= means 'begin positive lookahead', \s\u is the expression to validate (any whitespace followed by a capital letter), and ) ends the lookahead.

Sometimes, to understand regular expressions, it is easier to read them from right to left.

Some very common and useful expression 'parts':

.+? or .*? -> find the longest sequence of characters;

.+ or .* -> find the shortest one;

This is the tutorial I've used to learn regular expressions:

http://www.regular-expressions.info/tutorial.html

Report · Nov 09, 2012

Your last two statements are wrong, Vamitul:

.+ and .* mean find the longest string of characters

.+? and .*? mean find the shortest string of characters

Peter
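
The difference between the two forms is easy to check with a short Python sketch. This assumes Python's `re` module treats greedy and lazy quantifiers the same way InDesign's GREP does; the variable names are only for illustration.

```python
import re

text = "Diamond accent earrings. In sterling silver. Reg. $120"

# Greedy: .+ grabs as much as it can, so the match runs to the LAST dot.
greedy = re.search(r"^.+\.", text).group(0)

# Lazy: .+? grabs as little as it can, so the match stops at the FIRST dot.
lazy = re.search(r"^.+?\.", text).group(0)

print(greedy)
print(lazy)
```

The greedy version matches everything through "Reg.", while the lazy version stops after "earrings.", which is why the lazy form is the one used for styling only the first sentence.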

Report · Nov 09, 2012

yep.. i got overexcited..

Also, my first statement is wrong: "repeated as many times as possible, but at least once" should actually be "as few times as possible, but at least once".

sorry

Report · Nov 09, 2012

kmc,

Read the beginning of the section "Matching Text Between Codes and Certain Characters" on pp. 21-22 on .+? and an alternative.

Peter

Report · Nov 09, 2012

kmc27 wrote:
Just so I can learn from this, I'm trying to break down what Peter wrote:

http://www.jongware.com/idgrephelp.html
