GREP for Duplicate Lines (and then replacing it)

Explorer, Nov 27, 2021

Dear Community,

 

I think this is probably a job that GREP can take care of very quickly if I could just get help with the code to do it.

 

I have been attempting to at least figure out what GREP I need just to find the words, but I'm failing miserably, and I know there are a bunch of experts on here. (?<=\t )(.+)\1 was my last attempt before I decided that spending an hour trying to figure this out is ridiculous when someone on here will simply know how to do it. I'm at the tail end of this two-month-long project, with the deadline approaching quickly (Dec 3).

 

Possible Solution?

My thought is that there must be a GREP which could find a duplication of the words (the book title) perhaps with use of that constant factor of the tab on every line, then replace the duplicate word with: remove paragraph break, remove tab, add a comma. This would have the effect of bumping up the number on the duplicate line to the line above it, with the numbers being listed in increasing order and separated by commas. I just don't know how to write the GREP itself. The more I've thought about it and come up with that logic the more convinced I am that GREP can do it.

 

Example of Problem (bracketed words are invisible characters):

199 Animals [tab] 6 [paragraph break]
199 Animals [tab] 24 [paragraph break]
199 Animals [tab] 178 [paragraph break]
Big Book of [non-breaking space] Dinosaurs [tab] 13 [paragraph break]
Big Book of [non-breaking space] Dinosaurs [tab] 59 [paragraph break]
Big Book of [non-breaking space] Dinosaurs [tab] 211 [paragraph break]
Bugs (Little Lift and Look) [tab] 9 [paragraph break]
Bugs (Little Lift and Look) [tab] 105 [paragraph break]

 

I need a GREP which will make these turn into:

199 Animals   6, 24, 178
Big Book of Dinosaurs   13, 59, 211
Bugs (Little Lift and Look)  9, 105

 

Thank you very much for taking the time to read my plea for help,

Rachael

TOPICS
How to

Views

1.3K

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines

Community Expert, Nov 27, 2021

It looks like your original text was created using the TOC style function. If so, have you thought about using the Index function?

David Creamer: Community Expert (ACI and ACE 1995-2023)

Explorer, Nov 27, 2021

Yes, when I first began I watched a bunch of videos on using the Index function, but it proved very cumbersome to use. It seemed more suitable for creating an index of a few specific keywords that might be found in a non-fiction book. Before I began, I asked for advice on how to create an index for this catalogue project, and it was recommended that I use the ToC function, which could generate my index from a paragraph style. Perhaps this was the wrong advice, but it's the route I went with, and I have now spent two months putting this catalogue together, including entering several thousand book titles with the correct paragraph style for the index via the ToC function. Before printing, I will set the book title paragraph style to colour the text with no colour. The book titles will all magically disappear and each book will just have the ISBN, price, and author under it, as my client wants. I have tested that this still works with the ToC function: it can still pull all the data even if the text is invisible. It is fabulous!

 

I need a fix for the problem at hand and just don't have time to create index entries for several thousand items. I'm convinced that a GREP find and replace will work. I've thought through the logic of it, but I am stuck on how to write the code I need; I lack that knowledge.

Community Expert, Nov 27, 2021

I don't think it can be done by GREP.

You'll probably need a script to do it.
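
For comparison, the script route could look something like the minimal Python sketch below. It is not an InDesign script: it assumes the generated list has been exported as plain text with one title, a tab, and a page number per line, and the function name `merge_index_lines` is made up for illustration.

```python
def merge_index_lines(text: str) -> str:
    """Merge consecutive 'title<TAB>page' lines that share the same title."""
    merged = []  # list of [title, [pages]] kept in original order
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty paragraphs
        title, sep, page = line.partition("\t")
        if not sep:
            merged.append([line, []])  # no tab found: keep the line untouched
            continue
        if merged and merged[-1][0] == title and merged[-1][1]:
            merged[-1][1].append(page)  # same title as the line above
        else:
            merged.append([title, [page]])
    return "\n".join(
        title + ("\t" + ", ".join(pages) if pages else "")
        for title, pages in merged
    )


sample = (
    "199 Animals\t6\n"
    "199 Animals\t24\n"
    "199 Animals\t178\n"
    "Bugs (Little Lift and Look)\t9\n"
    "Bugs (Little Lift and Look)\t105"
)
print(merge_index_lines(sample))
```

Only neighbouring lines are merged, which matches the sorted list in the original question.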

 

Guide, Nov 28, 2021

Quickly and lazily written:

 

Find: (?-s)(^.+\t).+\K\r\1

Replace by: , 

 

… Click, Click, Click until 0 found!

 

(^/)  The Jedi
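
To see why this needs repeated clicks, here is a rough Python emulation of the single Find/Replace above. Assumptions: paragraph breaks are modelled as "\n" rather than InDesign's \r, and because Python's built-in `re` module has no \K, the kept part is captured and written back in the replacement. The function name is hypothetical.

```python
import re

# Title + tab, rest of the line, a paragraph break, then the same title + tab again.
DUPLICATE = re.compile(r"^(.+\t)(.+)\n\1", re.MULTILINE)


def merge_by_repeated_passes(text):
    """Run one 'Change All' pass after another until nothing is found."""
    passes = 0
    while True:
        # Re-emit the kept text (\1\2) and replace break + repeated title with ", ".
        text, count = DUPLICATE.subn(r"\1\2, ", text)
        if count == 0:
            return text, passes
        passes += 1


sample = "199 Animals\t6\n199 Animals\t24\n199 Animals\t178\nBugs\t9\nBugs\t105"
result, passes = merge_by_repeated_passes(sample)
print(result)
print("passes that changed something:", passes)
```

Each match swallows the start of the following line, so that line cannot begin another match in the same pass. A title that appears three or more times therefore needs more than one pass.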

Community Expert, Nov 28, 2021

I tried that myself - it's not catching them all for me.

 

Guide, Nov 28, 2021

Just 5 Find/Replace clicks that take about 10 seconds! …

[I'm not counting the 5 alert messages!  =D]

 

(^/)

 

[Six screenshots attached: "Capture d’écran 2021-11-28 à 13.24.54.png" through "Capture d’écran 2021-11-28 à 13.26.50.png"]

 

 

Community Expert, Nov 28, 2021

Seems to work fine. 

Thanks for showing it.

Guide, Nov 28, 2021 (marked as the correct answer)

Two regexes, each applied with "Change All", so only 2 clicks:

1/ Find: (^.+\t)\K(.+)(?=\r\1)

Replace by: $0,  (note the single space after the comma)

2/ Find: , \K\r.+?\t

Replace by: nothing (leave the field empty)

… So, yes! It can be done simply and quickly with simple GREP codes!

 

(^/)
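
For anyone who wants to try the logic outside InDesign, the two steps above can be approximated in Python. This is only a sketch under a few assumptions: paragraph breaks are modelled as "\n" instead of \r, and since Python's built-in `re` has no \K, the kept text is captured and re-inserted. The names `STEP_1`, `STEP_2`, and `merge_in_two_passes` are made up here.

```python
import re

# Step 1: a line whose title + tab is repeated at the start of the next line.
# The lookahead does not consume the next line, so every such line is marked.
STEP_1 = re.compile(r"^(.+\t)(.+)(?=\n\1)", re.MULTILINE)

# Step 2: the ", " marker, then the break and the repeated title up to its tab.
STEP_2 = re.compile(r", \n.+?\t")


def merge_in_two_passes(text):
    marked = STEP_1.sub(r"\1\2, ", text)  # append ", " to each marked line
    return STEP_2.sub(", ", marked)       # keep ", " and drop break + title + tab


sample = "199 Animals\t6\n199 Animals\t24\n199 Animals\t178\nBugs\t9\nBugs\t105"
print(merge_in_two_passes(sample))
```

One caveat of this approach: step 2 would also join any line that already ended in a comma and a space before step 1 ran, so it is worth checking the list for that first.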

 

 

Community Expert, Nov 28, 2021

Thanks

Explorer, Dec 01, 2021

Hi FRIdNGE!

 

Thank you so much for taking the time to figure this out.

 

I regretfully write back to say that it isn't working for me. With the first find and replace, I can see that it finds a duplicate title and then adds a comma to the end of the line, after the page number, preparing for the next action.

However, the second time I go to find:
, \K\r.+?\t
I get a message "Cannot find match."

 

Sorry for the late reply. I've been working long days to push to get this project done, and I didn't realise that there was something messed up with my email, so I didn't get notifications about all the kind people who have responded to my post. I would dearly like this to work!

Explorer, Dec 01, 2021

I began to try and troubleshoot. I don't know what \K looks for, so I tried that first. It cannot find \K on its own.

I then tried some of the bits I recognise that follow the \K:    \r.+?\t    and it could find that just fine. So my best guess is that the \K is giving me issues. But a quick Google search tells me that \K is "a variable-length lookbehind", so I think that means it must be the bit which is finding the duplicate titles.
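
A note on \K for later readers: in Perl-style regex engines it is not strictly a lookbehind. It tells the engine to drop everything matched so far from the reported match, so only the text after it gets replaced. Python's built-in `re` module does not support \K, so the sketch below shows two stand-ins for ", \K" instead, with paragraph breaks modelled as "\n"; the sample text is invented.

```python
import re

text = "199 Animals\t6, \n199 Animals\t24"

# Stand-in 1: capture what \K would keep and put it back in the replacement.
with_capture = re.sub(r"(, )\n.+?\t", r"\1", text)

# Stand-in 2: a fixed-width lookbehind, possible here because ", " has a known length.
with_lookbehind = re.sub(r"(?<=, )\n.+?\t", "", text)

print(with_capture == with_lookbehind)  # both remove the break, the title and the tab
```

In both cases the comma and space stay in place and only the paragraph break, the repeated title, and the tab are removed.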

Guide, Dec 01, 2021

What InDesign version do you use?

 

(^/)

Explorer, Dec 01, 2021

Not sure how to look up the precise version. I'm subbed to Adobe CC.

Explorer, Dec 01, 2021

Hmm, okay. So messing around with this a bit.

There are no commas involved with the page numbers until after the first magical find and replace has been done. First round of Find: (^.+\t)\K(.+)(?=\r\1) Change to: $0,

 

For the second bout, I think and hope that this much simpler GREP, which I've just figured out through trial and error, doesn't have any gaping flaws in it?

Find:
,\r\s
Change to:
,

Explorer, Dec 01, 2021

No, never mind me. I got confused with the test text in front of me. I figured that out based on a piece of sample text which isn't even what gets produced. But I think it will still work. Need to figure out how to get it to pick up the title on the next line and the tab etc.

Guide, Dec 01, 2021

Does the first F/R work for you? (add the comma + space)

 

(^/)

Explorer, Dec 01, 2021

Find:
(^.+\t)\K(.+)(?=\r\1)
Replace:
$0,

 

This works for me, yes. It finds a duplicate title and adds a comma to the end of the line. It's beautiful!

Explorer, Dec 01, 2021

Ahhh hah!

 

I'm so sorry, I feel very stupid. There was a space at the end of the first replace ($0, followed by a space), which, being an invisible character, I hadn't noticed. But once you asked that question I twigged that the second find specifically looks for a comma and a space, therefore the first replace needed to create that same thing in order for the second find to work.
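
The effect of that missing space is easy to reproduce. In this small Python sketch (an illustration only: paragraph breaks are modelled as "\n", \K is left out because it does not change whether the pattern is found, and the sample lines are invented), the second Find only matches when the first Replace wrote a comma and a space.

```python
import re

# Second Find: comma, SPACE, paragraph break, repeated title, tab.
SECOND_FIND = re.compile(r", \n.+?\t")

with_space = "Bugs\t9, \nBugs\t105"     # first Replace was "$0, " (with the space)
without_space = "Bugs\t9,\nBugs\t105"   # first Replace was "$0," (no space)

found_with_space = SECOND_FIND.search(with_space) is not None
found_without_space = SECOND_FIND.search(without_space) is not None

print(found_with_space, found_without_space)
```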

Guide, Dec 01, 2021

Did you include the "space" after "$0," in the Replace of the first F/R?

 

(^/)

Explorer, Dec 01, 2021

Thank you, thank you, thank you!

I don't know how many hours you've just saved me, but let's just say it's a LOT.

Guide, Dec 01, 2021

You're welcome! [It's simple for me.]

 

(^/)

Community Expert, Dec 02, 2021

Great work - I'm surprised it didn't work for me - but it's working now. Glitch in the matrix.

 
