rachaelc29879128
Inspiring
November 28, 2021
Answered

GREP for Duplicate Lines (and then replacing it)


Dear Community,

 

I think this is probably a job that GREP can take care of very quickly if I could just get help with the code to do it.

 

I have been attempting to at least figure out what GREP I need just to find the words, but I'm failing miserably, and I know there are a bunch of experts on here. (?<=\t )(.+)\1 was my last attempt before I decided that trying to figure this out for an hour is ridiculous when someone on here will simply know how to do it. I'm at the tail end of this two-month-long project, with the deadline approaching quickly (Dec 3).

 

Possible Solution?

My thought is that there must be a GREP that can find a duplicated book title, perhaps by using the tab that appears on every line, and then replace the duplicate like this: remove the paragraph break, remove the repeated title and its tab, and add a comma. This would bump the number on the duplicate line up to the line above it, with the numbers listed in increasing order and separated by commas. I just don't know how to write the GREP itself. The more I've thought through that logic, the more convinced I am that GREP can do it.

 

Example of Problem (bracketed words are invisible characters):

199 Animals [tab] 6 [paragraph break]
199 Animals [tab] 24 [paragraph break]
199 Animals [tab] 178 [paragraph break]
Big Book of [non-breaking space] Dinosaurs [tab] 13 [paragraph break]
Big Book of [non-breaking space] Dinosaurs [tab] 59 [paragraph break]
Big Book of [non-breaking space] Dinosaurs [tab] 211 [paragraph break]
Bugs (Little Lift and Look) [tab] 9 [paragraph break]
Bugs (Little Lift and Look) [tab] 105 [paragraph break]

 

I need a GREP which will make these turn into:

199 Animals   6, 24, 178
Big Book of Dinosaurs   13, 59, 211
Bugs (Little Lift and Look)  9, 105

 

Thank you very much for taking the time to read my plea for help,

Rachael

Correct answer FRIdNGE

Seems to work fine. 

Thanks for showing it.


Two regexes (each run with "Change All"), only 2 clicks:

 

1/ Find: (^.+\t)\K(.+)(?=\r\1)

Replace by: $0, 

 

2/ Find: , \K\r.+?\t

Replace by nothing
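
For anyone who wants to try the two-pass idea outside InDesign, here is a rough Python sketch. It is not what InDesign runs: Python's `re` module has no `\K`, so the first pattern captures the title and puts it back, and the second uses a lookbehind instead. It also assumes the list was exported as plain text with `\n` line ends rather than InDesign's `\r` paragraph breaks, and that each line holds exactly one tab. The name `merge_two_pass` is made up for illustration.

```python
import re

def merge_two_pass(text: str) -> str:
    # Pass 1: append ", " to every page number whose next line
    # starts with the same "title<tab>" prefix.
    text = re.sub(r"^(.+\t)(.+)(?=\n\1)", r"\1\2, ", text, flags=re.MULTILINE)
    # Pass 2: after each ", " left at a line end, delete the line break
    # and the repeated "title<tab>" that follows it.
    return re.sub(r"(?<=, )\n.+?\t", "", text)

sample = (
    "199 Animals\t6\n"
    "199 Animals\t24\n"
    "199 Animals\t178\n"
    "Big Book of\u00a0Dinosaurs\t13\n"
    "Big Book of\u00a0Dinosaurs\t59\n"
    "Big Book of\u00a0Dinosaurs\t211\n"
    "Bugs (Little Lift and Look)\t9\n"
    "Bugs (Little Lift and Look)\t105"
)
print(merge_two_pass(sample))
```

Splitting the work in two is what lets a single "Change All" per pattern suffice: the first pass only marks lines and never consumes the following line, so no duplicate is skipped.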

 

… So, yes! It can be done simply and quickly with simple GREP codes!

 

(^/)

 

 

2 replies

FRIdNGE
Inspiring
November 28, 2021

Quickly and lazily written:

 

Find: (?-s)(^.+\t).+\K\r\1

Replace by: , 

 

… Click, Click, Click until 0 found!
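
Why the repeated clicks are needed can be seen in a small Python sketch of the same single-pattern idea. This is an approximation under assumptions: `\n` stands in for InDesign's `\r`, Python's `re` has no `\K` so the captured text is re-inserted, and `merge_until_stable` is a hypothetical name. Each match swallows the start of the next line, so one pass can only join lines in pairs, and titles with three or more entries need further passes.

```python
import re

# One line, its line break, and the identical "title<tab>" on the next line.
PAIR = re.compile(r"^(.+\t)(.+)\n\1", re.MULTILINE)

def merge_until_stable(text: str) -> tuple[str, int]:
    """Repeat the replacement until nothing is found; return text and pass count."""
    passes = 0
    while True:
        # Keep the title and the numbers so far, then add ", " in place of
        # the line break and the repeated title.
        text, found = PAIR.subn(r"\1\2, ", text)
        if found == 0:
            return text, passes
        passes += 1

merged, passes = merge_until_stable(
    "199 Animals\t6\n199 Animals\t24\n199 Animals\t178\nBugs\t9\nBugs\t105"
)
print(merged)
print(passes)
```

A single pass leaving some duplicates behind may be what the "not catching them all" report describes.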

 

(^/)  The Jedi

Community Expert
November 28, 2021

I tried that myself - it's not catching them all for me.

 

FRIdNGE
Inspiring
November 28, 2021

Just 5 Find/Replace clicks that take 10 seconds! …

[I'm not taking the 5 alert messages into account!  =D]

 

(^/)

 

 

 

Dave Creamer of IDEAS
Community Expert
November 28, 2021

It looks like your original text was created using the TOC style function. If so, have you thought about using the Index function?

David Creamer: Community Expert (ACI and ACE 1995-2023)
rachaelc29879128
Inspiring
November 28, 2021

Yes, when I first began I watched a bunch of videos on using the Index function, but it proved to be very cumbersome to use. It seemed more suitable for creating an index of a few specific keywords that might be found in a non-fiction book. I asked for advice on how to create an index for this catalogue project before I began, and it was recommended that I use the ToC function, which could generate my index from a paragraph style. Perhaps this was the wrong advice, but it's the route I went with, and I have now spent two months putting this catalogue together, including entering several thousand book titles with the correct paragraph style for the index via the ToC function. Before printing I will make the book title paragraph style colour the text with no colour. The book titles will all magically disappear and each book will just have the ISBN, price, and author under it, as my client wants. I have tested that this still works with the ToC function: it can still pull all the data even if the text is invisible. It is fabulous!

 

I need a fix for the problem at hand and just don't have time to create index entries for several thousand items. I'm convinced that a GREP find and replace will work. I've thought through the logic of it, but I am stuck on how to write the code I need; I lack that knowledge.

Community Expert
November 28, 2021

I don't think it can be done by GREP

You'll probably need a script to do it.
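
As a rough idea of what a script-based route could look like, here is a minimal Python sketch that groups consecutive lines by title. It is not an InDesign script; it assumes the list has been exported to plain text lines of the form `title<tab>page` and that duplicates sit next to each other, as in the example. The function name `merge_pages` is hypothetical.

```python
from itertools import groupby

def merge_pages(lines: list[str]) -> list[str]:
    # Split each line at its first tab into [title, page]; skip lines without a tab.
    rows = [line.split("\t", 1) for line in lines if "\t" in line]
    merged = []
    # groupby only joins neighbouring rows, so equal titles must be adjacent.
    for title, group in groupby(rows, key=lambda row: row[0]):
        pages = ", ".join(row[1] for row in group)
        merged.append(f"{title}\t{pages}")
    return merged

result = merge_pages([
    "199 Animals\t6",
    "199 Animals\t24",
    "199 Animals\t178",
    "Bugs (Little Lift and Look)\t9",
    "Bugs (Little Lift and Look)\t105",
])
print("\n".join(result))
```

The page numbers keep the order they had in the input, so an already sorted list stays sorted.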