GREP script help: selecting only certain quotation marks within a paragraph?

Dec 15, 2022

I am editing a huge book that uses the text of the Bible. In the final "look", the paragraph breaks from the source text have been removed so that each 'reading' flows as one paragraph. In the source text, though, quoted speech sometimes runs across several paragraphs, so a new "open" quotation mark appears at the beginning of each new paragraph, even though it is all the same "speech".

So, in the source text it's like:

And he said, “lorem ipsum lorem ipsum,

lorem ipsum.

“lorem ipsum lorem ipsum.

“lorem ipsum lorem ipsum.”

So, when I batch-removed all the paragraph breaks, it now looks like this:

And he said, “lorem ipsum lorem ipsum, lorem ipsum. “lorem ipsum lorem ipsum. “lorem ipsum lorem ipsum.”

and the first THREE quotation marks are all "open" and the last one is, of course, closed.

What I am looking for help with is:

Is there a GREP script (if any) that will select every “ that comes after the first “ in a paragraph but before the ”?

And then I can run the script and delete them all, so the final result is simply,

“Lorem ipsum lorem.... ipsum lorem ipsum.”

with just one normal opening and one normal closing quotation mark.

Does that make sense?

Thanks for any help!

Dec 15, 2022

There's probably a more intelligent way to do this, but one approach is this:

Find: “\K.+?”
Replace: leave empty
Change format: strikethrough (or whichever format)

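This applies the format to everything between an opening quote and the next closing quote, including any further opening quotes in that span. The first opening quote itself is left out of the match by \K.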

Now do this:

Find: “
Replace: leave empty
Find format: strikethrough (or whichever format you used previously)

This removes all opening quotes in the format you used.

Finally, remove all the strikethrough formatting:

Find: leave empty
Replace: leave empty
Find format: strikethrough (or whichever format you used previously)
Change format: -strikethrough
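To check the logic of these three steps outside InDesign, here is a minimal Python sketch. It is an illustration only, not part of the answer above. Python's re module has no \K, so the sketch matches the first “ literally and captures the rest of the span. three_step is a hypothetical helper name, and its string replacement stands in for the mark / delete / unmark passes.

```python
import re

# Python's re has no \K, so the first “ is matched literally and the rest of
# the span (up to and including the next ”) is captured in group 1.
# "." does not match a newline, so a span never runs past the end of a line.
SPAN = re.compile(r"“(.+?”)")


def three_step(para: str) -> str:
    """Mimic mark / delete / unmark in one pass: inside each span from an
    opening quote to the next closing quote, delete every further “."""
    def clean(m: re.Match) -> str:
        # Keep the first “ and drop any “ inside the marked span.
        return "“" + m.group(1).replace("“", "")

    return SPAN.sub(clean, para)


print(three_step("And he said, “lorem ipsum, lorem ipsum. “lorem ipsum. “lorem ipsum.”"))
```

Spans that contain no extra “ come out unchanged. A paragraph with no closing ” is not matched at all, so its extra quotes stay in place.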

Dec 15, 2022

I like the workaround idea of formatting with strikethrough, etc.

But the actual code:

“\K.+?”

didn't select the sections I am trying to select. It just selects every quotation (“text text text”) inside a paragraph, but I want to select only the in-paragraph chunks that have several opening quotes in a row: “text “text “text” etc.

Is there any way to specify that with the 'repeat' commands (which I don't know how to use) or something?

Dec 15, 2022

Oh, and it should also NOT select the very FIRST “,

just the ones after it...

Perhaps impossible?

Dec 15, 2022

Maybe "lookahead" / "lookbehind"? Also something I don't know how to use...

Dec 15, 2022

Here is some sample text to try and practice on. I want a GREP that will select all the “ EXCEPT the first one:

he went up on the mountain, and when he sat down, his disciples came to him. And he opened his mouth and taught them, saying: “Blessed are the poor in spirit, for theirs is the kingdom of heaven. “Blessed are those who mourn, for they shall be comforted. “Blessed are the meek, for they shall inherit the earth. “Blessed are those who hunger and thirst for righteousness, for they shall be satisfied. “Blessed are the merciful, for they shall receive mercy. “Blessed are the pure in heart, for they shall see God. “Blessed are the peacemakers, for they shall be called sons of God."

Dec 16, 2022

The expression should in fact be “\K.+” (the question mark should be removed).

\K works like a lookbehind: whatever is matched before it (here, the first opening quote) is left out of the final match. The expression matches, within a single paragraph, everything from (but not including) the first opening double quote to the last closing double quote. You could probably just as well match to the end of the paragraph; it seems to me that makes no difference.

Your sample, by the way, has a straight double quote at the end, which causes the expression not to match anything. Therefore maybe use “\K.+ (match from the first open quote to the end of the paragraph).
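One quick way to see the difference between the greedy and lazy versions, and why a straight closing quote defeats the pattern, is to try them in Python. This sketch rests on some assumptions. Python's re has no \K, so a one-character lookbehind (?<=“) stands in for it. Python's "." does not cross a line break, so each string plays the role of one paragraph. The sample string is shortened from the sample text above.

```python
import re

# (?<=“) stands in for “\K: the opening quote must be there but is not part
# of the match. "." does not cross a newline, so each string is one paragraph.
GREEDY = re.compile(r"(?<=“).+”")   # like “\K.+”  : runs to the LAST ” in the paragraph
LAZY = re.compile(r"(?<=“).+?”")    # like “\K.+?” : stops at the FIRST ” after the quote
TO_END = re.compile(r"(?<=“).+")    # like “\K.+   : runs to the end of the paragraph

para_a = "He said “text text” and she said “text text”"
# Shortened from the sample text, ending in a straight " like the original post.
sample = 'saying: “Blessed are the poor. “Blessed are the meek."'

print(GREEDY.findall(para_a))   # one long match
print(LAZY.findall(para_a))     # one match per quotation
print(GREEDY.findall(sample))   # no match: there is no curly ” to end on
print(TO_END.findall(sample))   # matches anyway, up to the end
```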

To make working with such multiple queries easier, save each query. Then you can use this script:

https://creativepro.com/files/kahrel/indesign/grep_query_runner.html

to chain the queries so that you can run them with a single click.

Dec 16, 2022

**Sorry, I added the straight double quote just for this post, to close the quote, since it was only a snippet of a full paragraph. My bad, and sorry for the confusion. Thanks for helping me with this!!

Dec 16, 2022

On reflection, if "match from the first double quote to the end of the paragraph" is adequate, then you can use "\K.+ (that is, with a straight double quote, which matches both straight and curly quotes).

Dec 16, 2022

Thank you so much for your help, Peter, but alas the script still isn't quite what I am looking for. Working through this, I've realized how to specify it better:

I have lots of paragraphs like this:

(EXAMPLE A):

He said “text text” and she said “text text” and so he said “text text”

as well as the problem ones I am trying to identify:

(EXAMPLE B):

He said “text text text. “Text text text. “text text.”

The script you gave me ALSO grabs the underlined part of Example A:

He said “text text” and she said “text text” and so he said “text text”

But these I want to leave untouched.

I realize I didn't clarify this at all in the beginning.

So, is there some way to grab the underlined part of Example B:

He said “text text text. “Text text text. “text text.”

Without grabbing those ones from Example A?

Dec 16, 2022

Aha, that's a bit different. This works: “\K[^”]+“

(Mind the quote types)

For the rest, use the same three-step approach.
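Here is the same expression tried on Examples A and B in Python, as a sketch only. Again, (?<=“) stands in for “\K. strip_inner_quotes is a hypothetical helper that does in one pass what the format-then-delete steps do. Because [^”] cannot match a closing quote, Example A produces no match. The last sample shows that the pattern does not need a closing quote at the end.

```python
import re

# “\K[^”]+“ for Python's re: (?<=“) replaces “\K. Because [^”]+ cannot run
# past a closing quote, a match exists only when another “ follows the first
# one with no ” in between. (In Python [^”] also matches a newline, so this
# sketch assumes one paragraph per string.)
INNER = re.compile(r"(?<=“)[^”]+“")


def strip_inner_quotes(para: str) -> str:
    # One-pass equivalent of the "delete the marked “" step.
    return INNER.sub(lambda m: m.group(0).replace("“", ""), para)


example_a = "He said “text text” and she said “text text” and so he said “text text”"
example_b = "He said “text text text. “Text text text. “text text.”"
no_close = "He said “one. “two. “three."

print(INNER.findall(example_a))       # no match: every “ is followed by ”
print(strip_inner_quotes(example_b))
print(strip_inner_quotes(no_close))   # works without a final ”
```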

Dec 16, 2022

Ah, that is close, but it still doesn't quite do it. I think I have around 400 paragraphs that need attention, but this script only finds four in my file, and two of those don't seem to fit the bill at all.

Thanks anyway for your help.

If I could say in prose what I want the script to do, it would be this:

Find any [“] that comes after another [“] with no [”] between them. Leave the first [“] in a paragraph alone. If there are any [“] after a [”], leave those alone too. In other words, find any instance of [“ randomtext “ randomtext] and select only the middle “. Also, this should not depend on there being a ” at the end of the paragraph, because sometimes (by accident) that is missing too.

This may be beyond the scope of GREP....
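The rule described in prose above is easy to express as a left-to-right scan, even if it is awkward as a single GREP. Here is a minimal Python sketch. It assumes each paragraph is available as a plain string, and drop_continuation_quotes is a hypothetical name, not an InDesign feature.

```python
OPEN, CLOSE = "“", "”"


def drop_continuation_quotes(para: str) -> str:
    """Keep an opening quote only when no quotation is currently open.

    - the first “ in the paragraph is kept;
    - any further “ before the next ” is dropped;
    - a “ that comes after a ” starts a new quotation and is kept;
    - a missing final ” does not matter.
    """
    out = []
    inside = False  # True from an opening quote until the next closing quote
    for ch in para:
        if ch == OPEN:
            if inside:
                continue  # a second “ with no ” in between: drop it
            inside = True
        elif ch == CLOSE:
            inside = False
        out.append(ch)
    return "".join(out)


print(drop_continuation_quotes("He said “text text text. “Text text text. “text text.”"))
```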

Dec 16, 2022

There might be a problem with quote types (curly vs. straight). Can you post a couple of example paragraphs where it goes wrong?

Dec 16, 2022

What if you searched for the quote marks and stripped them out before you searched for the paragraph returns?

Would it help to search for opening quotes only where a positive lookbehind finds a period and a space to the left of the quote mark, as in:
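The idea of dealing with the quotes before the paragraph returns are removed can be sketched in Python. This assumes the reading is available as a list of source paragraphs, and join_reading and quote_open_after are hypothetical helpers. A paragraph's leading “ is dropped only when the previous paragraphs left a quotation open. The paragraphs are then joined with spaces.

```python
OPEN, CLOSE = "“", "”"


def quote_open_after(para: str, inside: bool) -> bool:
    """Return whether a quotation is still open at the end of para,
    given whether one was open at its start."""
    for ch in para:
        if ch == OPEN:
            inside = True
        elif ch == CLOSE:
            inside = False
    return inside


def join_reading(paras: list[str]) -> str:
    out = []
    inside = False  # is a quotation open at the start of this paragraph?
    for p in paras:
        if inside and p.startswith(OPEN):
            p = p[1:]  # continuation quote: the speech is already open
        out.append(p)
        inside = quote_open_after(p, inside)
    return " ".join(out)


source = [
    "And he said, “lorem ipsum lorem ipsum,",
    "lorem ipsum.",
    "“lorem ipsum lorem ipsum.",
    "“lorem ipsum lorem ipsum.”",
]
print(join_reading(source))
```

Because the state carries over from paragraph to paragraph, a quote that was properly closed in the previous paragraph leaves the next paragraph's “ untouched.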

GREP search, Find what:

(?<=\.\s)~{
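To see what that lookbehind would catch, here it is in Python. ~{ (InDesign's code for the left double quotation mark) becomes a literal “. This is a sketch only. The second sample string is made up to show one risk: a genuinely new quotation that starts after a period is caught as well.

```python
import re

# (?<=\.\s)~{ for Python's re: ~{ is InDesign's code for the left double
# quotation mark, so it becomes a literal “ here.
AFTER_SENTENCE = re.compile(r"(?<=\.\s)“")


def drop_quotes_after_sentence(para: str) -> str:
    # Delete every “ that directly follows a period and one whitespace character.
    return AFTER_SENTENCE.sub("", para)


example_b = "He said “text text text. “Text text text. “text text.”"
# Hypothetical sample: a new quotation that legitimately starts after a period.
new_speech = "“Go away.” Then he left. “Come back,” she called."

print(drop_quotes_after_sentence(example_b))   # inner quotes removed
print(drop_quotes_after_sentence(new_speech))  # also removes the quote before "Come"
```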

Just guessing with you!

Mike Witherell

Dec 16, 2022

I reached out on Reddit too, and I think this got me where I needed to go (I'm reviewing the text now to catch any bloopers):

https://www.reddit.com/r/indesign/comments/znfof5/need_help_with_a_specific_grep_script/

Thanks tho!

Dec 17, 2022

Interesting approach. Let us know how you fare with it.