Answered

GREP: Find consecutive sentences that begin with the same word

June 24, 2022
6 replies
1669 views

Please, can you tell me how to find consecutive sentences that begin with the same word in Adobe InDesign using a GREP search?

Let's say we are working on a biography: "Elvis was a famous musician. Elvis was born in..." and we need to change this to: "Elvis was a famous musician. He was born in...". I can make the edit manually; I just need to locate such cases, but I don't know how.

Thanks

This topic has been closed for replies.

pixxxelschubser

Community Expert

@Sotir25004881oiei

Have you tried it?

Sotir25004881oiei (Author)

Participating Frequently

Yes, thank you so much! I marked it as the correct answer!

I've been very busy lately and away from the forum, so I'm sorry for the late reply. This is fantastic, thank you so much!

Joel Cherney

Community Expert

I love being proven wrong when I say something foolish like "I don't think you can do that with GREP in InDesign." 🙂

pixxxelschubser

Correct answer

Community Expert

Another try:

(?<!\h)(\u[\l\u]+?)\h[^!?.]+?[!?.]\h\1

The grep could still be simplified if necessary.
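
For anyone who wants to test the logic outside InDesign, here is a rough Python approximation of the pattern above. It is only a sketch: Python's `re` has no `\u`, `\l` or `\h`, so the stand-ins are assumptions. `[A-ZА-Я]` covers only basic Latin and Cyrillic capitals, `[^\W\d_]` means "any letter", and `[ \t]` stands in for horizontal space. The function name `repeated_openers` is made up for this example.

```python
import re

# Piece-by-piece approximation of (?<!\h)(\u[\l\u]+?)\h[^!?.]+?[!?.]\h\1
OPENER_REPEAT = re.compile(
    r"(?<![ \t])"            # (?<!\h)      not preceded by horizontal space
    r"([A-ZА-Я][^\W\d_]+?)"  # (\u[\l\u]+?) capitalised word, letters only
    r"[ \t]"                 # \h           one horizontal space
    r"[^!?.]+?"              # [^!?.]+?     rest of the sentence, no terminator inside
    r"[!?.]"                 # [!?.]        the sentence terminator
    r"[ \t]"                 # \h           one horizontal space
    r"\1"                    # \1           the same word again
)

def repeated_openers(text):
    """Return the opening words that are repeated by the following sentence."""
    return [m.group(1) for m in OPENER_REPEAT.finditer(text)]

print(repeated_openers("Elvis was a famous musician. Elvis was born in..."))
print(repeated_openers("Elvis was a famous musician. He was born in..."))
```

The important part is `[^!?.]+?`: because it cannot contain a sentence terminator, the match cannot run past the end of the first sentence.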

pixxxelschubser

Community Expert

It may be a bit silly, but you could try:

(find from - to)

(\u[\l\u]+?)\h[^!?.]+?[!?.]\h\1

or find only consecutive (second) instance

(\u[\l\u]+?)\h[^!?.]+?[!?.]\h\K\1
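
A small Python sketch of the "second instance only" idea, for comparison. Python's `re` does not support `\K`, so this version (an assumption, not the InDesign behaviour itself) wraps the back-reference in its own group and reports only that group's position. The character classes are the same rough stand-ins for `\u`, `\l` and `\h`, and `second_instances` is a made-up name.

```python
import re

# Approximation of (\u[\l\u]+?)\h[^!?.]+?[!?.]\h\K\1
# Group 2 plays the role of "everything after \K".
SECOND_ONLY = re.compile(
    r"([A-ZА-Я][^\W\d_]+?)[ \t][^!?.]+?[!?.][ \t](\1)"
)

def second_instances(text):
    """Return (offset, word) for each repeated sentence opener, second copy only."""
    return [(m.start(2), m.group(2)) for m in SECOND_ONLY.finditer(text)]

text = "Elvis was a famous musician. Elvis was born in..."
for offset, word in second_instances(text):
    print(offset, word)
```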

Peter Kahrel

Community Expert

Ah -- I lost track of 'consecutive' 🙂

pixxxelschubser

Community Expert

@Sotir25004881oiei

Please post some real Cyrillic text here in the forum that does not work as desired.

Sotir25004881oiei (Author)

Participating Frequently

@pixxxelschubser, thanks for your reply. Here is an update:

This code by @FRIdNGE actually works with Cyrillic; I'm sorry, I was wrong. But it works only in simple examples such as this one (Cyrillic script; you can copy and paste it into InDesign and search within it):

"Физиката е естествена наука, която изучава общите и фундаментални закономерности, които определят изграждането и еволюцията на материалния свят. Физиката е точна наука, което означава, че се занимава с количественото описание на природните явления.".

But here's a more complicated example. You'll notice that the search will find the same word as the first one in the paragraph, but the duplicate is not located in the next sentence, so this is not what we want:

"Математиците търсят определени „образци на шаблони“, за да формулират нови теореми, аксиоми и типове математически доказателства. Когато откритите и изучени математически структури са базирани на добри (идеални, репетативни или количествено обозрими) модели, може да се използва математическо доказателство при създаването на научни прогнози и предвиждания за определени теми, области или обекти. Интересни дискусии и аргументи първо се появяват в древногръцката математика (а преди това, тоест още от предисторията се използва за изчисляване, измерване и за изучаване на формите и движенията на физическите обекти чрез дедуктивни разсъждения и абстракции), а по-късно математиката се развива в доста сложна и многостранна наука за абстрактни количествени и качествени връзки, форми и структури, с нейните аксиоматични системи от късния 19 век, като вече се приема за обичайно да се разглеждат математическите изследвания като установяване на математическата и научна истината чрез строги дедукции с използване на избрани аксиоми, научни дефиниции и определения. Математиката е от съществено значение в много области, включително естествени науки, инженерство, медицина, финанси и социални науки. Приложна математика доведе до изцяло нови математически дисциплини, като статистика и теория на игрите. Математиците се занимават с чиста математика (математика заради себе си), без да имат предвид каквото и да било приложение, но практическите приложения за това, което започна като чиста математика, често се откриват по-късно".

Conclusion: That's why when I try to search in the actual book, which is quite big, it catches some words that seem just random to me. Maybe they're identified as duplicates of something that appeared long before. Maybe several sentences before. And then, after a long search (find next, find next...), it actually finds the things that I'm really looking for, but this is too complicated. This code is not bad, it partially works, but it needs a little bit of improvement.
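
One plausible reading of this behaviour is that `.+?` in `(\u\H+).+?\.\h\K\1` is allowed to run past intermediate periods, so the repeated word can be several sentences away. The Python sketch below compares the two approaches on a short invented text. The patterns are rough approximations (Python has no `\u`, `\H`, `\h` or `\K`), and the names `LOOSE`, `STRICT` and `openers` are hypothetical.

```python
import re

# Approximation of (\u\H+).+?\.\h\K\1 : the middle part may cross periods.
LOOSE = re.compile(r"([A-ZА-Я][^ \t]+).+?\.[ \t](\1)")

# Same idea, but the middle part may not contain ! ? or .
STRICT = re.compile(r"([A-ZА-Я][^\W\d_]+?)[ \t][^!?.]+?[!?.][ \t](\1)")

def openers(pattern, text):
    """Return the words the pattern reports as repeated."""
    return [m.group(1) for m in pattern.finditer(text)]

# "Math" opens the first and third sentence, but not two sentences in a row.
text = "Math is old. It is useful. Math is hard."
print(openers(LOOSE, text))   # reports "Math", two sentences apart
print(openers(STRICT, text))  # reports nothing
```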

Peter Kahrel

Community Expert

That's probably because in the first instance the quotation mark is included.

FRIdNGE

Basically (this needs more thought, but I'm at the beach at the moment):

(\u\H+).+?\.\h\K\1

(^/) The Jedi

Sotir25004881oiei (Author)

Participating Frequently

@Joel Cherney, thanks for your reply.

I'm not an expert in these things, so I apologize that I don't understand everything you said. My English is not very good, either.

The simple reason why I need such a GREP code (if it can be arranged) is that it is kind of dull or boring when sentences located next to each other begin with the same word (unless it's poetry or a speech, where such repetition can be useful for rhetorical or poetic effect).

For example, imagine a biographical book about a musician or an actor, where every sentence begins with his or her name. That would be dull. That's why I wrote that simple example about Elvis, but it can be Lennon or anyone else.

What I need is a GREP code like this:

any word - some text in between - a punctuation mark that closes a sentence (a period, an exclamation mark or something similar) - that word again (the one from the very beginning)

It is possible with GREP to locate duplicated words written by mistake next to each other (which often happens), but I don't know how to modify that GREP code to suit my needs (if that's possible).

The code: \b(\w+)\b \1

@FRIdNGE, thank you for your reply. Unfortunately your code didn't work for me, but I appreciate it.
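
As a side note, the doubled-word pattern quoted above can be tried almost unchanged in Python, since `\b`, `\w` and `\1` behave similarly there. This is only a sketch. The trailing `\b` is an addition that is not in the original pattern; without it, "the then" would also count as a doubled word. The name `doubled_words` is made up.

```python
import re

# \b(\w+)\b \1 from the post, plus a closing \b (an addition, see above).
DOUBLED = re.compile(r"\b(\w+)\b \1\b")

def doubled_words(text):
    """Return words that are immediately repeated after a single space."""
    return [m.group(1) for m in DOUBLED.finditer(text)]

print(doubled_words("It happened in in Paris."))
print(doubled_words("the then"))
```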

Peter Kahrel

Community Expert

@Sotir25004881oiei Michel's code does work, though only with periods as sentence-final markers.

Joel Cherney

Community Expert

I don't think you can do that with GREP, actually. I think you'd need to parse the whole document. I can imagine doing it in Javascript, but not in a single GREP query. Is there a reason that you need to use regular expressions in particular? Or do you just need a tool, any tool that's not your eyeballs, to automate this editorial preference?
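
To make the "parse the whole document" idea concrete, here is a minimal sketch of the logic in Python (not InDesign JavaScript, and not tied to any InDesign API). It assumes sentences end with `.`, `!` or `?` followed by whitespace, which is a simplification: abbreviations and similar cases will confuse it. All names are hypothetical.

```python
import re

# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# First run of letters in a sentence; leading quotes or brackets are skipped.
FIRST_WORD = re.compile(r"[^\W\d_]+")

def first_word(sentence):
    """Return the first word of a sentence, or None if it has no letters."""
    m = FIRST_WORD.search(sentence)
    return m.group(0) if m else None

def consecutive_same_opener(paragraph):
    """Return (word, sentence) for each sentence that opens like the one before it."""
    sentences = [s for s in SENTENCE_END.split(paragraph) if s]
    hits = []
    for prev, cur in zip(sentences, sentences[1:]):
        word = first_word(cur)
        if word is not None and word == first_word(prev):
            hits.append((word, cur))
    return hits

text = "Elvis was a famous musician. Elvis was born in 1935. He sang."
for word, sentence in consecutive_same_opener(text):
    print(word, "->", sentence)
```

Because only neighbouring sentences are compared, a word that reappears several sentences later is not reported.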
