GREP for Duplicate Lines (and then replacing it)

Report · Nov 27, 2021

Dear Community,

I think this is probably a job that GREP can take care of very quickly if I could just get help with the code to do it.

I have been attempting to at least figure out what GREP I need to just find the words, but I'm failing miserably, and I know there are a bunch of experts on here. (?<=\t )(.+)\1 was my last attempt before I decided that trying to figure this out for an hour is ridiculous when someone on here will simply know how to do it. I'm at the tail end of this two-month-long project with the deadline approaching quickly (Dec 3).

Possible Solution?

My thought is that there must be a GREP which could find a duplication of the words (the book title) perhaps with use of that constant factor of the tab on every line, then replace the duplicate word with: remove paragraph break, remove tab, add a comma. This would have the effect of bumping up the number on the duplicate line to the line above it, with the numbers being listed in increasing order and separated by commas. I just don't know how to write the GREP itself. The more I've thought about it and come up with that logic the more convinced I am that GREP can do it.

Example of Problem (the words "tab", "non-breaking space" and "paragraph break" stand for invisible characters):

199 Animals tab 6 paragraph break
199 Animals tab 24 paragraph break
199 Animals tab 178 paragraph break
Big Book of non-breaking space Dinosaurs tab 13 paragraph break
Big Book of non-breaking space Dinosaurs tab 59 paragraph break
Big Book of non-breaking space Dinosaurs tab 211 paragraph break
Bugs (Little Lift and Look) tab 9 paragraph break
Bugs (Little Lift and Look) tab 105 paragraph break

I need a GREP which will make these turn into:

199 Animals 6, 24, 178
Big Book of Dinosaurs 13, 59, 211
Bugs (Little Lift and Look) 9, 105
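For reference, the target transformation can be sketched outside InDesign in a few lines of Python. This is only an illustration of the logic described above, not an InDesign script: the function name `merge_duplicate_titles` is hypothetical, and it assumes every entry is a single line of the form title, tab, page number, with duplicates already adjacent (as they are in a sorted list). The tab between title and page numbers is kept.

```python
from itertools import groupby

def merge_duplicate_titles(lines):
    """Collapse consecutive 'title<TAB>page' lines that share the same title."""
    # Split each line once at its last tab: title on the left, page number on the right.
    # Lines without a tab are skipped (assumption: every real entry has one).
    pairs = [line.rsplit("\t", 1) for line in lines if "\t" in line]
    merged = []
    # groupby only groups *adjacent* equal titles, which matches a sorted list.
    for title, group in groupby(pairs, key=lambda pair: pair[0]):
        pages = ", ".join(page for _, page in group)
        merged.append(f"{title}\t{pages}")
    return merged

sample = [
    "199 Animals\t6",
    "199 Animals\t24",
    "199 Animals\t178",
    "Big Book of\u00a0Dinosaurs\t13",   # \u00a0 is the non-breaking space
    "Big Book of\u00a0Dinosaurs\t59",
    "Big Book of\u00a0Dinosaurs\t211",
    "Bugs (Little Lift and Look)\t9",
    "Bugs (Little Lift and Look)\t105",
]

for line in merge_duplicate_titles(sample):
    print(line)
```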

Thank you very much for taking the time to read my plea for help,

Rachael

Report · Nov 27, 2021

It looks like your original text was created using the TOC style function. If so, have you thought about using the Index function?

David Creamer: Community Expert (ACI and ACE 1995-2023)

Report · Nov 27, 2021

Yes, when I first began I watched a bunch of videos on using the Index function, but it proved to be very cumbersome to try and use. It seemed to be more suitable for creating an index of a few specific keywords that might be found in a non-fiction book. I asked for advice on how to create an index for this catalogue project before I began and it was recommended to me that I should use the ToC function which could generate my index from a paragraph style. Perhaps this was the wrong advice, but it's the route I went with and I have now spent two months putting this catalogue together, including entering in several thousand book titles with the correct paragraph style for the index via the ToC function. Before printing I will make the book title paragraph style colour the text with no colour. The book titles will all magically disappear and each book will just have the ISBN, price, and author under it as my client wants. I have tested that this still works with the ToC function, it can still pull all the data even if the text is invisible. It is fabulous!

I need a fix for the problem at hand and just don't have time to create index entries for several thousand items. I'm convinced that a GREP find and replace will work. I've thought through the logic of it, but I am stuck on how to write the code I need; I lack that knowledge.

Report · Nov 27, 2021

I don't think it can be done by GREP.

You'll probably need a script to do it.

Report · Nov 28, 2021

Quickly and lazily written:

Find: (?-s)(^.+\t).+\K\r\1

Replace by: ,

… Click, Click, Click until 0 found!
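Why several clicks are needed can be seen in a rough Python emulation. This is a sketch under stated assumptions, not the InDesign behaviour itself: Python's `re` module has no \K, so the part of the line that \K would keep is captured and written back; paragraph breaks are "\n" here rather than \r; and the replacement is assumed to be a comma followed by a space, to match the desired output. The name `merge_by_repeated_passes` is made up for the example.

```python
import re

# Emulates (?-s)(^.+\t).+\K\r\1 without \K: group 1 is the whole first line
# (kept), group 2 is the "title + tab" prefix that must repeat on the next line.
# In Python "." does not match a newline by default, so (?-s) is not needed.
PATTERN = re.compile(r"^((.+\t).+)\n\2", re.MULTILINE)

def merge_by_repeated_passes(text):
    """Run the replace-all repeatedly until nothing is found; return (text, passes)."""
    passes = 0
    while True:
        text, count = PATTERN.subn(r"\1, ", text)
        if count == 0:          # the "0 found" alert
            return text, passes
        passes += 1

sample = (
    "199 Animals\t6\n199 Animals\t24\n199 Animals\t178\n"
    "Big Book of\u00a0Dinosaurs\t13\nBig Book of\u00a0Dinosaurs\t59\n"
    "Big Book of\u00a0Dinosaurs\t211\n"
    "Bugs (Little Lift and Look)\t9\nBugs (Little Lift and Look)\t105\n"
)

result, passes = merge_by_repeated_passes(sample)
print(result)
print("passes that changed something:", passes)
```

Each pass can only join a line to the one directly below it, because the match consumes the start of the following line. A title with three entries therefore needs two passes, and a title with n entries needs n - 1.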

(^/) The Jedi

Report · Nov 28, 2021

I tried that myself - it's not catching them all for me.

Report · Nov 28, 2021

Just 5 F/R clicks, which take just 10 seconds! …

[I don't take the 5 alert messages into account! =D]

(^/)

Report · Nov 28, 2021

Seems to work fine.

Thanks for showing it.

Report · Nov 28, 2021

2 regex ("all-replace"), only 2 clicks:

1/ Find: (^.+\t)\K(.+)(?=\r\1)

Replace by: $0, (a comma followed by a space)

2/ Find: , \K\r.+?\t

Replace by nothing

… So, yes! it can be simply and quickly done with simple Grep codes!
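The two steps can be tried outside InDesign with a small Python sketch. Assumptions to keep in mind: Python's `re` has no \K, so step 1 puts the whole matched line back with `\g<0>` and step 2 uses a fixed-width lookbehind instead; paragraph breaks are "\n" rather than \r; and the step 1 replacement ends with a space after the comma. The function name `merge_in_two_passes` is hypothetical.

```python
import re

def merge_in_two_passes(text):
    """Return (text after step 1, text after step 2)."""
    # Step 1, emulating (^.+\t)\K(.+)(?=\r\1) -> "$0, ":
    # append ", " to every line whose next line starts with the same "title + tab".
    step1 = re.sub(r"^(.+\t).+(?=\n\1)", r"\g<0>, ", text, flags=re.MULTILINE)
    # Step 2, emulating ", \K\r.+?\t" -> nothing:
    # after a ", " delete the paragraph break and the repeated "title + tab".
    step2 = re.sub(r"(?<=, )\n.+?\t", "", step1)
    return step1, step2

sample = (
    "199 Animals\t6\n199 Animals\t24\n199 Animals\t178\n"
    "Big Book of\u00a0Dinosaurs\t13\nBig Book of\u00a0Dinosaurs\t59\n"
    "Big Book of\u00a0Dinosaurs\t211\n"
    "Bugs (Little Lift and Look)\t9\nBugs (Little Lift and Look)\t105\n"
)

after_step1, after_step2 = merge_in_two_passes(sample)
print(after_step2)
```

Because step 1 only looks ahead at the next line without consuming it, every line in a run of duplicates is marked in a single pass, which is why two replace-all clicks are enough here.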

(^/)

Report · Nov 28, 2021

Thanks

Report · Dec 01, 2021

Hi FRIdNGE!

Thank you so much for taking the time to figure this out.

I regretfully write back to say that it isn't working for me. With the first find and replace I can see that it finds a duplicate title and then adds a comma to the end of the line after the page number, preparing for the next action.

However, the second time I go to find:
, \K\r.+?\t
I get a message "Cannot find match."

Sorry for the late reply. I've been working long days to push to get this project done, and I didn't realise that there was something messed up with my email, so I didn't get notifications about all the kind people who have responded to my post. I would dearly like this to work!

Report · Dec 01, 2021

I began to try and troubleshoot. I don't know what \K looks for, so I tried that first. It cannot find \K.

I then tried some of the bits I recognise that follow the K: \r.+?\t and it could find that just fine. So my best guess is that the \K is giving me issues. But a quick google tells me that \K is "a variable-length lookbehind" so I think that means that must be the bit which is finding the duplicate titles.

Report · Dec 01, 2021

What InDesign version do you use?

(^/)

Report · Dec 01, 2021

Not sure how to look up the precise version. I'm subbed to Adobe CC.

Report · Dec 01, 2021

Hmm, okay. So messing around with this a bit.

There are no commas involved with the page numbers until after the first magical find and replace has been done. First round of Find: (^.+\t)\K(.+)(?=\r\1) Change to: $0,

For the second bout, I think and hope that this much simpler GREP I've just figured out through trial and error doesn't have any gaping flaws in it?

Find:
,\r\s
Change to:
,

Report · Dec 01, 2021

No, never mind me. I got confused with the test text in front of me. I figured that out based on a piece of sample text which isn't even what gets produced. But I think it will still work. Need to figure out how to get it to pick up the title on the next line and the tab etc.

Report · Dec 01, 2021

Does the first F/R work for you? (add the comma + space)

(^/)

Report · Dec 01, 2021

Find:
(^.+\t)\K(.+)(?=\r\1)
Replace:
$0,

This works for me yes. It finds a duplicate title and adds a comma to the end of the line. It's beautiful!

Report · Dec 01, 2021

Ahhh hah!

I'm so sorry, I feel very stupid. There was a space at the end of the first replace $0,space which being an invisible character meant I didn't realise. But once you asked that question I twigged that the second find specifically looks for a comma and a space, therefore the first replace needed to be creating that same thing in order for the second find to work.
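The effect of the missing space is easy to reproduce in a short Python sketch. The same caveats apply as for any emulation outside InDesign: Python's `re` lacks \K, so a lookbehind stands in for it, and "\n" stands in for the paragraph break. The variable names are illustrative only.

```python
import re

# Step 1 and step 2 of the two-step approach, rewritten without \K.
STEP1 = re.compile(r"^(.+\t).+(?=\n\1)", re.MULTILINE)
STEP2 = re.compile(r"(?<=, )\n.+?\t")   # requires comma + space before the break

sample = "199 Animals\t6\n199 Animals\t24\n"

with_space = STEP1.sub(r"\g<0>, ", sample)     # replacement "$0, " (trailing space)
without_space = STEP1.sub(r"\g<0>,", sample)   # replacement "$0,"  (no space)

print(STEP2.search(with_space) is not None)     # True: step 2 has something to remove
print(STEP2.search(without_space) is not None)  # False: "Cannot find match"
```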

Report · Dec 01, 2021

Did you include the "space" after "$0," in the Replace of the first F/R?

(^/)

Report · Dec 01, 2021

Thank you, thank you, thank you!

I don't know how many hours you've just saved me, but let's just say it's a LOT.

Report · Dec 01, 2021

You're welcome! [It's simple for me.]

(^/)

Report · Dec 02, 2021

Great work - I'm surprised it didn't work for me - but it's working now. Glitch in the matrix.
