Inspiring
January 18, 2019
Answered

Help with GREP code to find duplicate text


Hello:

I have an issue with movie times (something Google must have added to its code): when I copy and paste movie times from Google to format them for newspapers, the text includes a line that reads "Showtimes for" followed by the name of the movie, and a duplicate of the title sits right next to it.

If you search for movie times in your area and copy/paste into InDesign, you'll see what I mean.

I have a script that formats the titles and times, but the result still contains the "Showtimes for..." text.

I'm trying, so far unsuccessfully, to delete "Showtimes for" and one of the duplicate titles.

So far, I have tried this:

(\b\w+)( \1)+\b

It will find duplicates, but only one-word duplicates.
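The one-word limitation is easy to see by running the same pattern outside InDesign. This is a hedged sketch using Python's `re` module rather than InDesign's GREP engine (an assumption: both treat `\b`, `\w+` and `\1` the same way for this plain text). `SAMPLE` holds the sample listing quoted in this post, and `one_word_duplicates` is a hypothetical helper name.

```python
import re

# Listing as posted in the question (one line; the curly apostrophe is kept).
SAMPLE = (
    "Showtimes for A Dog’s Way Home A Dog’s Way Home 12:35pm 5:00pm 7:45pm "
    "Showtimes for Mary Poppins Returns Mary Poppins Returns 12:35pm 3:35pm 6:45pm "
    "Showtimes for Bumblebee Bumblebee 1:15pm 4:20pm 7:20pm "
    "Showtimes for Spider-Man: Into the Spider-Verse "
    "Spider-Man: Into the Spider-Verse 12:40pm 4:15pm 7:10pm"
)

# First attempt from the post: one word, then one or more " <same word>" repeats.
ONE_WORD_DUP = re.compile(r"(\b\w+)( \1)+\b")


def one_word_duplicates(text):
    """Hypothetical helper: list every run the pattern matches."""
    return [m.group(0) for m in ONE_WORD_DUP.finditer(text)]


print(one_word_duplicates(SAMPLE))  # ['Bumblebee Bumblebee']
```

Only "Bumblebee Bumblebee" comes back: `\w+` stops at spaces, apostrophes, hyphens and colons, so the captured group is always a single word and can never cover a multi-word title.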

I've also tried the pattern below, but it matches everything in between, and I can't figure out how to isolate the text I want to delete.

(?<=Showtimes for)(.+?)(?=\d\d+)
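To see exactly what this lookaround pattern captures, here is a sketch in Python's `re` (assumed to behave like InDesign GREP for this text); `captures` is a hypothetical helper. Each capture holds both copies of the title plus the surrounding spaces, which is why there is no separate piece to isolate.

```python
import re

# Listing as posted in the question (one line; the curly apostrophe is kept).
SAMPLE = (
    "Showtimes for A Dog’s Way Home A Dog’s Way Home 12:35pm 5:00pm 7:45pm "
    "Showtimes for Mary Poppins Returns Mary Poppins Returns 12:35pm 3:35pm 6:45pm "
    "Showtimes for Bumblebee Bumblebee 1:15pm 4:20pm 7:20pm "
    "Showtimes for Spider-Man: Into the Spider-Verse "
    "Spider-Man: Into the Spider-Verse 12:40pm 4:15pm 7:10pm"
)

# Second attempt from the post: start right after "Showtimes for", grow the
# capture lazily, and stop at the first spot followed by two or more digits.
BETWEEN = re.compile(r"(?<=Showtimes for)(.+?)(?=\d\d+)")


def captures(text):
    """Hypothetical helper: the text captured by group 1 for each match."""
    return [m.group(1) for m in BETWEEN.finditer(text)]


for piece in captures(SAMPLE):
    print(repr(piece))
```

Because `\d\d+` needs two digits in a row, the Bumblebee capture runs past the `1:` of 1:15pm and stops just before `15`.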

I need to search for entire titles that have different words and word counts.

Here is a sample:

Showtimes for A Dog’s Way Home A Dog’s Way Home 12:35pm 5:00pm 7:45pm Showtimes for Mary Poppins Returns Mary Poppins Returns 12:35pm 3:35pm 6:45pm Showtimes for Bumblebee Bumblebee 1:15pm 4:20pm 7:20pm Showtimes for Spider-Man: Into the Spider-Verse Spider-Man: Into the Spider-Verse 12:40pm 4:15pm 7:10pm

Thanks for any help.

Sorry for the long text.

Larry


    vinny38 (Correct answer)
    Legend
    January 18, 2019

    Hi

    This looks like a great job for back-references (\n).

    Try this:

    Find: (?<=Showtimes for )(.+)(\1)

    Replace with: $1

    Mind the space after "for".

    Regards

    Vinny
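The back-reference idea can be tried outside InDesign with Python's `re` module, whose `(?<=...)` lookbehind and `\1` back-reference should behave the same way for this text; Python writes the replacement as `\1` where InDesign's Change To field uses `$1`. The sketch assumes, as in the posted sample, that both copies of a title sit on the same line separated by one ordinary space (in Python, `.` does not match a newline, so copies split across lines would not match). `KEEP_PREFIX` is Vinny's pattern; `DROP_PREFIX` is a hypothetical variant, not from the thread, that matches "Showtimes for " instead of looking behind it, so the replacement removes that text too. `dedupe` is likewise a hypothetical helper.

```python
import re

# Listing as posted in the question (one line; the curly apostrophe is kept).
SAMPLE = (
    "Showtimes for A Dog’s Way Home A Dog’s Way Home 12:35pm 5:00pm 7:45pm "
    "Showtimes for Mary Poppins Returns Mary Poppins Returns 12:35pm 3:35pm 6:45pm "
    "Showtimes for Bumblebee Bumblebee 1:15pm 4:20pm 7:20pm "
    "Showtimes for Spider-Man: Into the Spider-Verse "
    "Spider-Man: Into the Spider-Verse 12:40pm 4:15pm 7:10pm"
)

# Vinny's pattern: the lookbehind pins the match right after "Showtimes for ",
# the greedy (.+) backtracks until it holds a run that is immediately repeated,
# and \1 must match that same run again.
KEEP_PREFIX = re.compile(r"(?<=Showtimes for )(.+)(\1)")

# Hypothetical variant (not from the thread): consume "Showtimes for " instead
# of looking behind it, so the replacement removes it along with one copy.
DROP_PREFIX = re.compile(r"Showtimes for (.+) \1")


def dedupe(pattern, text):
    """Hypothetical helper: replace each match with capture group 1."""
    # Python's r"\1" plays the role of $1 in InDesign's Change To field.
    return pattern.sub(r"\1", text)


print(dedupe(KEEP_PREFIX, SAMPLE))
print(dedupe(DROP_PREFIX, SAMPLE))
```

With `KEEP_PREFIX` a listing becomes "Showtimes for Bumblebee 1:15pm ..."; with `DROP_PREFIX` it becomes "Bumblebee 1:15pm ...". Since the repeated run is found by backtracking, the same pattern handles one-word and multi-word titles alike.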

    Inspiring
    January 18, 2019

    Hi, Vinny:

    I can't get (?<=Showtimes for )(.+)(\1) to work...

    I tried it several different ways, moving the space around, but I always get the "Cannot find match" message.

    How do I use back-references (\n)?

    I'm unfamiliar with them...

    Thanks for your help.

    Larry

    winterm
    Legend
    January 18, 2019

    For single-word duplicates like "Bumblebee Bumblebee", you could search for (\w+ )\1 and change to $1.

    However, I have a hard time imagining how this could be implemented for complex sequences like A Dog’s Way Home, Mary Poppins Returns, or Spider-Man: Into the Spider-Verse.

    How are you going to define this? Make a list of all existing films?
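winterm's single-word pattern can be checked the same way. This is a sketch in Python's `re` (assumed equivalent to InDesign GREP for this text, with `\1` standing in for `$1` in the replacement); `SAMPLE` holds the listing posted in the question.

```python
import re

# Listing as posted in the question (one line; the curly apostrophe is kept).
SAMPLE = (
    "Showtimes for A Dog’s Way Home A Dog’s Way Home 12:35pm 5:00pm 7:45pm "
    "Showtimes for Mary Poppins Returns Mary Poppins Returns 12:35pm 3:35pm 6:45pm "
    "Showtimes for Bumblebee Bumblebee 1:15pm 4:20pm 7:20pm "
    "Showtimes for Spider-Man: Into the Spider-Verse "
    "Spider-Man: Into the Spider-Verse 12:40pm 4:15pm 7:10pm"
)

# winterm's suggestion: one word plus its trailing space, immediately repeated.
WORD_SPACE_DUP = re.compile(r"(\w+ )\1")

# $1 in InDesign's Change To field is written \1 in Python.
fixed = WORD_SPACE_DUP.sub(r"\1", SAMPLE)
print(fixed)
```

Only "Bumblebee Bumblebee" collapses; the three multi-word titles come through unchanged because `(\w+ )` captures a single word and its space.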