Participating Frequently
June 24, 2022
Answered

GREP: Find consecutive sentences that begin with the same word

 

Could you please tell me how to find consecutive sentences that begin with the same word in Adobe InDesign, using a GREP search?

 

Let's say that we are working on a biography: "Elvis was a famous musician. Elvis was born in..." and we need to change this to: "Elvis was a famous musician. He was born in...". I can make the edit manually; I just need to locate such cases, and I don't know how.

 

Thanks

This topic has been closed for replies.
6 replies

pixxxelschubser
Community Expert
June 29, 2022

@Sotir25004881oiei 

Have you tried it?

Participating Frequently
June 30, 2022
In reply to @pixxxelschubser, who asked "Have you tried it?":

 

Yes, thank you so much! I marked it as a correct answer!

 

I was very busy these past few days, so I'm sorry that I was absent from the forum and late with my reply. This is fantastic, thank you so much!

Joel Cherney
Community Expert
July 2, 2022

I love being proven wrong when I say something foolish like "I don't think you can do that with GREP in InDesign." 🙂

pixxxelschubser
Community Expert
Correct answer
June 27, 2022

Another try:

 

(?<!\h)(\u[\l\u]+?)\h[^!?.]+?[!?.]\h\1

 

 

The grep could still be simplified if necessary.
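
For readers who want to experiment outside InDesign, here is a minimal Python sketch of the pattern above. It is an approximation, not an exact port: `str.isupper` stands in for `\u`, `[^\W\d_]` for the letter class, and `[ \t]` for `\h`. The name `first_match` is made up for this illustration.

```python
import re

LETTER = r"[^\W\d_]"  # any letter (assumed stand-in for InDesign's \l and \u)

# (?<![ \t]) stands in for (?<!\h): the opening word must not follow a space.
PATTERN = re.compile(
    rf"(?<![ \t])({LETTER}{{2,}})[ \t][^!?.]+?[!?.][ \t]\1"
)

def first_match(text):
    """Return the first 'word ... sentence end, same word' span, or None."""
    for pos, ch in enumerate(text):
        # Emulate \u: only try the pattern where an uppercase letter starts.
        if ch.isupper():
            m = PATTERN.match(text, pos)
            if m:
                return m.group(0)
    return None
```

In this port, "Elvis was a famous musician. Elvis was born in Tupelo." is reported, while "I met Elvis yesterday. Elvis smiled." is not, because the first "Elvis" follows a space. The same lookbehind also rejects a pair whose first sentence begins after a space in the middle of a paragraph, so it is worth checking on real text whether that matters for your book.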

pixxxelschubser
Community Expert
June 27, 2022

It may be a bit silly, but you could try this to find the whole span, from the first instance to the second:

(\u[\l\u]+?)\h[^!?.]+?[!?.]\h\1

or this to find only the second (consecutive) instance:

(\u[\l\u]+?)\h[^!?.]+?[!?.]\h\K\1
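
As a rough illustration of what the second pattern selects, here is a Python sketch. Python's `re` module has no `\K`, so the offset of a second capture group plays that role here. The stand-ins for `\u`, `\l` and `\h` are assumptions rather than exact equivalents, and `find_repeats` is a hypothetical helper name.

```python
import re

LETTER = r"[^\W\d_]"  # any letter (assumed stand-in for \l and \u)

# Word, space, text without sentence-final punctuation, one sentence-final
# mark, space, then the same word again (captured as group 2).
PAIR = re.compile(rf"({LETTER}{{2,}})[ \t][^!?.]+?[!?.][ \t](\1)")

def find_repeats(text):
    """Return (word, offset of the second occurrence) for each match."""
    hits = []
    pos = 0
    while pos < len(text):
        # Emulate \u: only try the pattern at an uppercase letter.
        m = PAIR.match(text, pos) if text[pos].isupper() else None
        if m:
            hits.append((m.group(1), m.start(2)))  # start(2) acts like \K
            pos = m.end()
        else:
            pos += 1
    return hits
```

Because `[^!?.]+?` cannot run past a period, question mark or exclamation mark, the repeat has to sit directly after the first sentence end. Note that this sketch also reports a capitalized word that is not sentence-initial, as in "I met Elvis yesterday. Elvis smiled."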

 

Peter Kahrel
Community Expert
June 27, 2022

Ah -- I lost track of 'consecutive' 🙂

pixxxelschubser
Community Expert
June 27, 2022

@Sotir25004881oiei 

Please post some real Cyrillic text here in the forum that does not work as desired.

Participating Frequently
June 27, 2022

@pixxxelschubser, thanks for your reply. Here is an update:

 

The code by @FRIdNGE actually works with Cyrillic; I'm sorry, I was wrong. But it works only in simple examples such as this one (Cyrillic script; you can copy and paste it into InDesign and search within it):

 

"Физиката е естествена наука, която изучава общите и фундаментални закономерности, които определят изграждането и еволюцията на материалния свят. Физиката е точна наука, което означава, че се занимава с количественото описание на природните явления.".

 

But here's a more complicated example. You'll notice that the search matches a later repeat of the first word in the paragraph, but that repeat is not in the next sentence, so this is not what we want:


"Математиците търсят определени „образци на шаблони“, за да формулират нови теореми, аксиоми и типове математически доказателства. Когато откритите и изучени математически структури са базирани на добри (идеални, репетативни или количествено обозрими) модели, може да се използва математическо доказателство при създаването на научни прогнози и предвиждания за определени теми, области или обекти. Интересни дискусии и аргументи първо се появяват в древногръцката математика (а преди това, тоест още от предисторията се използва за изчисляване, измерване и за изучаване на формите и движенията на физическите обекти чрез дедуктивни разсъждения и абстракции), а по-късно математиката се развива в доста сложна и многостранна наука за абстрактни количествени и качествени връзки, форми и структури, с нейните аксиоматични системи от късния 19 век, като вече се приема за обичайно да се разглеждат математическите изследвания като установяване на математическата и научна истината чрез строги дедукции с използване на избрани аксиоми, научни дефиниции и определения. Математиката е от съществено значение в много области, включително естествени науки, инженерство, медицина, финанси и социални науки. Приложна математика доведе до изцяло нови математически дисциплини, като статистика и теория на игрите. Математиците се занимават с чиста математика (математика заради себе си), без да имат предвид каквото и да било приложение, но практическите приложения за това, което започна като чиста математика, често се откриват по-късно".

 

Conclusion: this is why, when I search the actual book, which is quite big, the search catches words that seem random to me. Maybe they are identified as duplicates of a word that appeared several sentences earlier. Only after a long series of Find Next clicks does it reach the cases I'm really looking for, and that is too laborious. The code is not bad and it partially works, but it needs a little improvement.
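
One plausible explanation for the random-looking hits is that a lazy `.+?` is still allowed to run past several periods until the captured word turns up again. The Python sketch below illustrates this on an English sample. It is an approximation only: `[A-Z]` stands in for `\u` (ASCII only, for this demo), `\S` for `\H`, and `[ \t]` for `\h`.

```python
import re

# Lazy dot, in the spirit of (\u\H+).+?\.\h\1 : ".+?" may cross periods.
LAZY_DOT = re.compile(r"([A-Z]\S+).+?\.[ \t]\1")

# Negated class instead of the dot: the text between the two words
# may not contain sentence-final punctuation.
NO_CROSSING = re.compile(r"([A-Z][A-Za-z]+)[ \t][^!?.]+?[!?.][ \t]\1")

skipping = "Elvis sang. The crowd cheered. Elvis bowed."
adjacent = "Elvis sang. Elvis bowed."

# The lazy-dot pattern matches even though a sentence lies in between.
crossed = LAZY_DOT.search(skipping)

# The negated-class pattern ignores that text but still finds true neighbors.
strict_skipping = NO_CROSSING.search(skipping)  # None
strict_adjacent = NO_CROSSING.search(adjacent)
```

Here `crossed.group(0)` spans "Elvis sang. The crowd cheered. Elvis", which is the kind of non-consecutive hit described above, while the negated-class version only matches the adjacent pair.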

Peter Kahrel
Community Expert
June 27, 2022

That's probably because in the first instance the quotation mark is included.

FRIdNGE
June 26, 2022

Basically (this needs more thought, but I'm at the beach at the moment):

 

(\u\H+).+?\.\h\K\1

 

(^/)  The Jedi

Participating Frequently
June 26, 2022

@Joel Cherney, thanks for your reply.

 

I'm not an expert in these things, so I apologize that I don't understand everything you said. My English is not very good, either.

 

The simple reason I need such a GREP code (if it can be arranged) is that text reads as dull when neighboring sentences begin with the same word (unless it's poetry or a speech, where such repetition can serve a rhetorical or poetic effect).

 

For example, imagine a biographical book about a musician or an actor, where every sentence begins with his or her name. That would be dull. That's why I wrote that simple example about Elvis, but it can be Lennon or anyone else.

 

What I need is a GREP code like this:

any word - some text in between - a punctuation mark that closes a sentence (a period, an exclamation mark or similar) - that word again (the one from the very beginning)

 

With GREP it is possible to locate words duplicated by mistake next to each other (which happens often), but I don't know how to modify that GREP code to suit my needs (if that's possible).

 

The code: \b(\w+)\b \1
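
For comparison, this is roughly what that duplicate-word pattern does, sketched in Python's `re` module, which accepts the same syntax for this pattern. The trailing `\b` is an addition that is not in the code above; without it, "the theory" would also be reported as a repeat of "the". The helper name `find_adjacent_duplicates` is made up for the example.

```python
import re

# Adjacent duplicated words, as in \b(\w+)\b \1, plus a trailing \b
# (an addition, not in the forum pattern) to avoid partial-word hits.
ADJACENT_DUPLICATE = re.compile(r"\b(\w+)\b \1\b")

def find_adjacent_duplicates(text):
    """Return each word that is immediately repeated after one space."""
    return [m.group(1) for m in ADJACENT_DUPLICATE.finditer(text)]
```

The backreference `\1` is the same building block the sentence-level search needs; the difference is that a whole sentence, rather than a single space, has to sit between the captured word and its repeat.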

 

@FRIdNGE, thank you for your reply. Unfortunately your code didn't work for me, but I appreciate it.

 

Peter Kahrel
Community Expert
June 27, 2022

@Sotir25004881oiei  Michel's code does work, though only with periods as sentence-final markers.

Joel Cherney
Community Expert
June 25, 2022

I don't think you can do that with GREP, actually. I think you'd need to parse the whole document. I can imagine doing it in JavaScript, but not in a single GREP query. Is there a reason that you need to use regular expressions in particular? Or do you just need a tool, any tool that's not your eyeballs, to automate this editorial preference?
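
To make the "parse the whole document" idea concrete, here is a minimal sketch in Python (not the JavaScript mentioned above, and not using any InDesign scripting API). It assumes the paragraph text has already been exported as a plain string and that a sentence ends at ".", "!" or "?" followed by whitespace, which is a naive rule. Both function names are hypothetical.

```python
import re

def first_word(sentence):
    """Return the first word of a sentence, skipping leading quotes etc."""
    m = re.match(r"\W*(\w+)", sentence)
    return m.group(1) if m else ""

def repeated_openers(paragraph):
    """Return (word, sentence, next sentence) for neighbors sharing an opener."""
    # Naive split: break after . ! or ? when whitespace follows (assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    hits = []
    for prev, cur in zip(sentences, sentences[1:]):
        word = first_word(prev)
        if word and word == first_word(cur):
            hits.append((word, prev, cur))
    return hits
```

Because only neighboring sentences are compared, a repeat several sentences later is never reported. In Python 3, `\w` also matches Cyrillic letters, so the same sketch applies to Cyrillic text. Abbreviations such as "Dr." would still confuse the naive sentence split.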