GREP confusion

Community Expert, Apr 14, 2024

I'm trying to target the first space in a paragraph (assigning a blue underline for now).

 

Why does this match everything through the first space:

^[^\s]+\s

[Screenshot: 2024-04-14_12-12-21.png]

But why does adding the lookbehind make it match only the first such space in the text frame?

^[^\s]+\K\s

[Screenshot: 2024-04-14_12-14-11.png]

What fundamental regular expression concept am I missing?

[Animation: 2024-04-14_12-15-51 (1).gif]

Thanks in advance, 

~Barb

 


2 Correct answers: the Apr 14, 2024 reply that adds the (?m) flag and the Apr 15, 2024 simplified explanation. Both appear in full later in the thread.

Community Expert, Apr 14, 2024

I think it's a bug.

^[^\s ]+\K\s works perfectly in CS6, but in Version 18 (I don't have 19 installed) it finds every second paragraph after the first on my system.

Community Expert, Apr 14, 2024

If 100 of us here were to have a GREP shootout, I'd come in 101st, but my dim sense is that, yes, there's a bug here.

 

I can get '(?<=^Git)[\s ]' to work perfectly as long as I use that hard-coded word, but my GREP-fu can't come up with any combination of wildcards that will find any-first-word there. But other things work in wonky ways that make me think there's a lookahead/behind glitch here.


Community Expert, Apr 15, 2024

So with the best/possible answer in hand, can I get some feedback on my string above, or here—

 

(?<=^Git)[\s ]

 

As noted, it works perfectly for Barb's request, except that I cannot get it to match an arbitrary first word. I experimented with the full range of wildcards:

  • Using a specific, hardcoded word string ('Git') works.
  • Using a fixed number of any-character markers (.) works for first words with that many letters (although it is fooled by spaces).
  • Using a fixed number of \w character markers, or a \u\l\l\l... string, works.

But I can't find any combination of variable-word-length wildcards that will return a hit.

 

Am I missing something, or is this an integral limitation of the lookbehind operation, or part of the apparent bug?


Community Expert, Apr 15, 2024

I wouldn't say I'm a GREP maven, either, but my understanding is that look-behind cannot handle variable-length patterns.
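
The fixed-width restriction described above can be seen in a small sketch. It uses Python's `re` module purely as a stand-in; InDesign's GREP engine is a different implementation and may differ in detail, so treat this as an illustration of the general idea rather than a description of InDesign. The sample text and variable names are made up for the example.

```python
import re

# Hypothetical sample text: two short "paragraphs" separated by a newline.
text = "Git is a tool\nPython is a language"

# Fixed-width look-behind: the engine can step back exactly three characters.
fixed = re.compile(r"(?m)(?<=^Git)\s")
fixed_hits = [m.start() for m in fixed.finditer(text)]

# Variable-width look-behind: Python's re refuses to compile the pattern.
try:
    re.compile(r"(?m)(?<=^\S+)\s")
    variable_error = None
except re.error as exc:
    variable_error = str(exc)

# Workaround: match the first word normally and read the space from a group.
workaround = re.compile(r"(?m)^\S+(\s)")
workaround_hits = [m.start(1) for m in workaround.finditer(text)]

print(fixed_hits)
print(variable_error)
print(workaround_hits)
```

The hard-coded word finds only the line that starts with "Git", while the capture-group version finds the first space on every line without needing a look-behind at all.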

Community Expert, Apr 14, 2024

I just noticed you were trying to do this as a GREP style. I was using Find/Change, so let me see if I get different results with the style...

Community Expert, Apr 14, 2024

OK, it still works in CS6 as a GREP style, but now it affects only the first paragraph in the story thread in v 18.

BUT, if this is to be a GREP style, I question why. Why not use an ordinary nested style?

Community Expert, Apr 14, 2024

Thank you both, Peter and James. 🙂

 

It's just an exercise. A nested style with None up to the first space and my character style applied through the first space works correctly, but I'm always trying to refine my GREP skill set and was working through negative character classes in @Peter Kahrel's book. I thought I had made a leap forward with [^\s], only to be completely stymied by the lookbehind. I assumed I was the problem and not InDesign.

 

It's odd we haven't seen anyone else mention it though, if it was happening back in 2018, right? Is it the rare combination of a negative character class and a lookbehind? I tried the old lookbehind syntax with the same result. And I can use lookbehinds elsewhere successfully. That was my first negative character class though. 

 

~Barb

Community Expert, Apr 14, 2024

GREP is a logical construction. You might think ^ is logically looking at the start of each line, but it goes a bit deeper: GREP has single-line and multiline modes.

Adding (?m) as a flag at the start makes it work:

(?m)^[^\s]+\K\s

[Screenshot: EugeneTyson_0-1713153225900.png]

(?m): This flag enables multiline mode, which allows ^ to match at the start of each line in addition to the start of the string (and $ at the end of each line in addition to the end of the string).

The expression ^[^\s]+\K\s starts by matching the beginning of the string (^) and then matches the first word on the line ([^\s]+). The \K sequence resets the starting point of the reported match to the current position, effectively excluding the first word from the match. Finally, \s matches the whitespace character after the first word.

This means that when the regular expression engine encounters a line of text, it reports only the whitespace character after the first word. Because the first word is consumed but not reported, it appears as though the expression is "skipping" the first word.

Another reason it may seem like the expression is skipping lines is that it only operates within each line of the text. It doesn't explicitly account for line breaks or consider the content of subsequent lines. So, after matching the whitespace following the first word on a line, the engine moves on to the next line without any specific instruction about what to do next. This can create the impression of skipping lines, especially if the intention is to process each line individually.

The goal is to process every line, so you need to qualify your GREP with the additional (?m) flag to give it that instruction.
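
As a rough illustration of the single-line versus multiline distinction, here is a sketch in Python's `re` module. Two assumptions to keep in mind: Python's defaults are not necessarily InDesign's, and Python's `re` has no \K, so a capture group stands in for it to isolate the space. The sample text and the helper name `first_space_offsets` are invented for the example.

```python
import re

# Hypothetical sample text: three lines separated by newlines.
text = "alpha beta gamma\ndelta epsilon\nzeta eta theta"


def first_space_offsets(pattern, text):
    """Return the offset of the whitespace captured in group 1 for each match."""
    return [m.start(1) for m in re.finditer(pattern, text)]


# Default mode in Python: ^ anchors only at the very start of the string.
single = first_space_offsets(r"^[^\s]+(\s)", text)

# Multiline mode: ^ also anchors immediately after each newline.
multi = first_space_offsets(r"(?m)^[^\s]+(\s)", text)

print(single)  # [5]
print(multi)   # [5, 22, 35]
```

Without the flag only the first line yields a hit; with (?m) each line contributes the space after its first word.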

Community Expert, Apr 15, 2024

@Eugene Tyson 

 

Just in case: @Barb Binder is working with a text frame (TF) full of short single-line paragraphs, not a single long multi-line paragraph.

 

Community Expert, Apr 15, 2024

Thank you, Robert and Eugene.

 

Eugene: you are absolutely correct—the flag definitely fixes the problem but I'm still confused. Now please keep in mind—it's early here. I'm still on my 1st cup of coffee. I've read your reply repeatedly and then checked Peter's book. The flag appears on the last page in a table called "modifiers" but also says default

 

[Screenshot: 2024-04-15_07-27-30.png]

 

If you have the time/interest, can you try the explanation one more time in a highly simplified, "explain it like I'm five" sentence? This question is about learning and understanding, and I'm still confused as to why the negative character class successfully worked on each new paragraph, but adding the lookbehind required the additional instruction to behave the same way.

 

Either way, I'm grateful for your knowledge and your answer, and to all of you for helping. It takes a village!

 

~Barb

Community Expert, Apr 15, 2024

Alright, imagine you have a bunch of lines written down on paper, like a list. Each line has some words on it, and some spaces between those words.

 

This GREP, (?m)^[^\s]+\K\s, works like this:

You go through each line, and when you find the first word on that line, you put a special mark after it.

Then, you look for a space after that special mark.

So basically, you're just picking out the space right after the first word on each line.

Your GREP, ^[^\s]+\K\s, is a bit simpler.

You look at all the words on all the lines together, not one line at a time.

You find the first word in the whole bunch, put a special mark after it, and then look for a space after that mark. So here, you're just finding the space right after the very first word in the whole list.

So the difference is like looking at each line separately for the first one and looking at all the lines together for the second one.

Hope that is clearer.

 

 

 

Community Expert, Apr 15, 2024

quote

Eugene: you are absolutely correct—the flag definitely fixes the problem


By @Barb Binder

 

Well, not really. It only fixes it if you have single-line paragraphs, but not if you have multi-line paragraphs, because now it will find the first space on every line, not just the first in the paragraph.

Community Expert, Apr 15, 2024

Hi @Peter Spier :

 

I tested with and without the flag, on single- and multi-line paragraphs. I'm only targeting the first space, so it's working for both, unless I am missing something.

~Barb

[Animation: 2024-04-15_14-02-59 (1).gif]
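
One possible reading of this result, offered as an assumption rather than a confirmed description of InDesign: a paragraph that merely wraps onto several lines holds no break character, so a multiline ^ has nothing extra to anchor on, whereas explicit break characters do create new line starts. The sketch below shows that distinction in Python's `re`, with "\n" standing in for any explicit break; the sample strings and the helper name `count_first_spaces` are hypothetical.

```python
import re

# Space after the first word of each line, with ^ in multiline mode.
FIRST_SPACE = re.compile(r"(?m)^\S+(\s)")


def count_first_spaces(story):
    """Count how many line starts yield a 'first word plus space' match."""
    return sum(1 for _ in FIRST_SPACE.finditer(story))


# One long paragraph: it might wrap visually in a narrow frame,
# but the string itself contains no break character.
wrapped_paragraph = (
    "This single paragraph is long enough to wrap onto several lines"
)

# The same kind of text with explicit break characters inserted.
broken_paragraph = "This single paragraph\nis long enough to wrap\nonto several lines"

# Two paragraphs separated by one explicit break.
two_paragraphs = wrapped_paragraph + "\n" + "Second paragraph here"

print(count_first_spaces(wrapped_paragraph))
print(count_first_spaces(broken_paragraph))
print(count_first_spaces(two_paragraphs))
```

Under that assumption, the flag would only change the result where an explicit break character exists, which would be consistent with both single-line and wrapped multi-line paragraphs getting one match each.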

Community Expert, Apr 15, 2024

Nope, I'm the one missing something.

I still think there's a bug in here someplace, and it would be great if Peter Kahrel weighed in.

Community Expert, Apr 16, 2024

Yeh, I'd be happy for someone else to weigh in to explain it better than I am explaining it.

Community Expert, Apr 15, 2024

Well, if given the exact paragraphs, we can refine it further.

Community Expert, Apr 15, 2024

I'm good, Eugene!  🙂

 

~Barb

Community Expert, Apr 15, 2024

Thank you!!! 

 

~Barb
