Simple GREP deletion of lines

Report · Jun 30, 2022

I need a simple way to find and delete duplicate lines in a column of 16K items.

VARIABLE NUMBERS (\t) PAGE NUMBER (\p)

Thank you for your help.

Report · Jun 30, 2022

Hi @Gioyer07:

GREP is pattern-based. It looks like your pattern is:

a series of numbers, a tab, more numbers and a hard return.

If these are the only paragraphs in the file that follow that pattern, use:

Find what: \d+\t\d+\r

Change to: (leave empty)

Save the file first!

~Barb

Report · Jul 01, 2022

Thank you Barb

My attempts to use your syntax failed. That is probably my fault, because I did not describe the text I need to clean well enough.

It has this structure:

011101850 17
011101850 17
011102250 17

The same three lines appear in the find/replace window as shown in the picture:

I really appreciate your help. Thank you.

Report · Jul 01, 2022

Thank you Barb

My attempts to use your syntax didn't work, and that is due to my poor description.

The picture shows more clearly how the lines are structured.

As you can see, a "y" is used instead of "t", but that does not work either. Thank you for your help.

Report · Jul 01, 2022

Hi @Gioyer07,

To understand the issue properly: are the duplicate lines always adjacent, and do we need to detect the duplicates and keep only a single instance?

-Manan

Report · Jul 05, 2022

Sorry for the delay. No, the duplicate lines are not always adjacent. Thank you.

Report · Jul 02, 2022

To remove one immediately following duplicate, you can try something like this:

(\d+~y\d+\r)\K\1

This assumes each paragraph is: numbers, right indent tab (~y), numbers, end of paragraph.

If that does not work, please post a screenshot with the hidden characters shown.
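For readers who want to try the idea outside InDesign, here is a minimal Python sketch of the same backreference technique. It rests on assumptions that are not in the thread: the text has been exported as plain text, the right indent tab (`~y`) arrives as an ordinary tab, and each paragraph ends in `\n`. Python's `re` module has no `\K`, so the sketch captures the first paragraph and writes it back instead. The `^` anchor is an addition to keep a match from starting in the middle of a number.

```python
import re

# Assumed plain-text export: "\t" stands in for ~y, "\n" for \r.
ADJACENT_DUPLICATE = re.compile(r"^(\d+\t\d+\n)\1", re.MULTILINE)


def drop_one_following_duplicate(text: str) -> str:
    """Remove one immediately repeated 'digits, tab, digits' paragraph."""
    # \1 only matches a character-for-character copy of group 1.
    return ADJACENT_DUPLICATE.sub(r"\1", text)


sample = "011101850\t17\n011101850\t17\n011102250\t17\n"
cleaned = drop_one_following_duplicate(sample)

# Anything between the digits and the tab (here: spaces) prevents a match.
with_spaces = "011101850  \t17\n011101850  \t17\n"
untouched = drop_one_following_duplicate(with_spaces)
```

Because the pattern spells out digits, tab, digits exactly, stray characters in a paragraph mean it is simply skipped rather than deleted.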

Report · Jul 05, 2022

Thank you very much for the answer. Sorry for the delay; the office computer was switched off. The screenshot precisely shows the hidden characters in the white box. Your solution seems to be right; please allow me some time to double-check it. Thank you.

Report · Jul 05, 2022

… The screenshot precisely shows the hidden characters in the white box…

By @Gioyer07

Are you sure? Which screenshot do you mean?

Both of your screenshots show guides and layout lines, but the hidden characters are not visible.

That means: menu Type --> Show Hidden Characters

[Ctrl/Strg]+[Alt]+[I]

Report · Jul 05, 2022

Thank you... well, if you look at the guides and layout lines you certainly see them... but in the find/replace window you can clearly see the hidden characters... perhaps you need to click the image and view it full screen if you can't see it in the chat.

Report · Jul 05, 2022

No.

That is not what I mean!

Go to menu: Type --> Show hidden characters (or similar)

Then the hidden characters will be shown as blue characters/signs.

Then create a new screenshot for us.

Report · Jul 06, 2022

Thank you for your patience, here is the correct screenshot.

I applied

(\d+~y\d+\r)\K\1

in the Find line and nothing in the Replace line.

Pressing 'Replace All' gave one substitution (although there are hundreds of identical lines); pressing 'Replace All' again found and deleted nothing more.

Report · Jul 06, 2022

Strange that the GREP finds anything at all. There are spaces between the numbers and the tab, which you did not mention before.

Report · Jul 06, 2022

Hi all,

I would try this one:

Find GREP:

(\d{7}\h{2}~y\d{3}\r)\1

Change GREP:

$1

Do that two or more times until nothing is found.

And let's hope that the pattern for all paragraphs* is:

A run of 7 digits followed by two horizontal white spaces followed by a right indent tab followed by a run of 3 digits followed by an end of paragraph special character.

[*] I can see from the screenshot that we perhaps also have to deal with anchored objects.

Regards,
Uwe Laubender
( Adobe Community Professional )
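Why the find/change has to be run "two or more times" can be seen in a small Python sketch. The assumptions are mine, not the thread's: plain-text export, `\t` standing in for `~y`, `\n` for `\r`, and `[ \t]` for `\h`. Each pass only merges pairs, so a run of five identical paragraphs shrinks to three, then two, then one.

```python
import re

# Assumed translation of (\d{7}\h{2}~y\d{3}\r)\1 for Python's re module.
FIXED_WIDTH_DUPLICATE = re.compile(r"^(\d{7}[ \t]{2}\t\d{3}\n)\1", re.MULTILINE)


def replace_until_stable(text: str) -> tuple[str, int]:
    """Repeat the find/change until nothing is found.

    Returns the cleaned text and the number of passes that changed it.
    """
    passes = 0
    while True:
        text, count = FIXED_WIDTH_DUPLICATE.subn(r"\1", text)
        if count == 0:
            return text, passes
        passes += 1


paragraph = "0111018  \t017\n"  # 7 digits, 2 spaces, tab, 3 digits
result, passes = replace_until_stable(paragraph * 5)
```

In this run `passes` ends up as 3 and `result` is a single copy of the paragraph.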

Report · Jul 06, 2022

Hm.

Perhaps it's better, as a first step, to get rid of possible white space characters between the first range of digits and the right indent tab, and then tackle the issue with duplicate paragraphs. I'm not sure if the number of white spaces after the first range of digits is always the same. ( Just a guess of course. )

Regards,
Uwe Laubender
( Adobe Community Professional )
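The "clean up first" step might look like the following Python sketch. It assumes a plain-text export in which the right indent tab is an ordinary `\t`, and it removes any run of spaces that sits directly in front of that tab. The function name is hypothetical.

```python
import re

# A lookahead keeps the tab itself in place; only the spaces are removed.
SPACES_BEFORE_TAB = re.compile(r" +(?=\t)")


def strip_spaces_before_tab(text: str) -> str:
    """Delete spaces that directly precede a tab."""
    return SPACES_BEFORE_TAB.sub("", text)


sample = "011101850  \t17\n011101850 \t17\n011102250\t17\n"
normalized = strip_spaces_before_tab(sample)
```

After this step the first two paragraphs are character-for-character identical, which is what a backreference-based duplicate search needs.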

Report · Jul 06, 2022

Hello Uwe

Unfortunately not. It is not that simple. Please take another look at the first screenshots.

Digits (variable, between 7 and 10)
Letter (possibly one)
Spaces (possibly several)
Anchored object (possibly, position unknown)
Tab (line splitter)
Digits (variable count, perhaps up to 4??)
End of paragraph
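The looser structure listed above can be written down as one pattern. This is a Python sketch under several assumptions of mine: the text is a plain-text export, the tab is an ordinary `\t`, paragraphs end in `\n`, and an anchored object shows up as the object replacement character U+FFFC. Since the object's position is unknown, the sketch removes that marker before matching. `parse_entry` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

ANCHOR_MARKER = "\ufffc"  # assumption: how an anchored object is exported

# 7-10 digits, optional letter, optional spaces, tab, 1-4 digits.
ENTRY = re.compile(r"(\d{7,10})([A-Za-z]?) *\t(\d{1,4})")


def parse_entry(paragraph: str) -> Optional[Tuple[str, str, str]]:
    """Return (code, letter, page) if the paragraph fits the pattern, else None."""
    # Drop the anchored-object marker wherever it sits, then the line ending.
    stripped = paragraph.replace(ANCHOR_MARKER, "").rstrip("\n")
    match = ENTRY.fullmatch(stripped)
    return match.groups() if match else None


plain = parse_entry("011101850  \t17\n")
with_letter_and_object = parse_entry("0111018A\ufffc \t1234\n")
not_an_entry = parse_entry("011101850 17\n")  # space where the tab should be
```

Comparing the parsed tuples rather than the raw paragraphs makes two entries count as "the same" even when their spacing or anchored objects differ.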

Report · Jul 06, 2022

Thank you, @pixxxelschubser !

You are right! Then we have to talk about "sameness" and the flexible patterns with \K. And what that means for atomic groups. Or we have to take a shortcut and write a script. 🙂

Let's see what our OP thinks about all this…

Thanks,
Uwe Laubender
( Adobe Community Professional )
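As a rough illustration of the script shortcut, here is a Python sketch that works on exported plain text (it is not an InDesign script). It keeps the first occurrence of every paragraph, so the duplicates do not have to be adjacent. The notion of "sameness" used here, ignoring spaces in front of the tab, is only one possible choice and is an assumption of this sketch.

```python
import re


def remove_duplicate_paragraphs(text: str) -> str:
    """Keep the first occurrence of each paragraph, wherever its repeats sit."""
    seen = set()
    kept = []
    for paragraph in text.splitlines():
        # Comparison key: the paragraph without spaces before the tab.
        key = re.sub(r" +(?=\t)", "", paragraph)
        if key not in seen:
            seen.add(key)
            kept.append(paragraph)  # the original paragraph is kept unchanged
    return ("\n".join(kept) + "\n") if kept else ""


sample = (
    "011101850\t17\n"
    "011102250\t17\n"
    "011101850  \t17\n"  # non-adjacent duplicate with extra spaces
    "011102250\t18\n"    # same code, different page: not a duplicate
)
deduplicated = remove_duplicate_paragraphs(sample)
```

A set lookup per paragraph keeps this fast even for a column of many thousands of items, and the original order of the remaining paragraphs is preserved.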

Report · Jul 08, 2022

This is beyond my skills... The only thing I am sure of is that I also got the title wrong, because it's not "Simple" at all... Thank you to the experts who replied or proposed usable solutions. At the moment I just can't say that it is "solved".

Report · Jul 12, 2022

Excuse me if I insist... Would it be possible to tell GREP to ignore ANY object before checking for duplicates?

Would this reduce the complexity of the task? Thank you

Report · Jul 08, 2022

Hi @Gioyer07 ,

usually the issue of duplicate paragraphs should be solved in the data source, so that duplicate paragraphs do not show up in InDesign after placing the content.

Regards,
Uwe Laubender
( Adobe Community Professional )

Report · Jul 11, 2022

Thank you, Laubender.

That would be a solution if only the backend data knew the page numbers of all the codes. It does not and never will, but the page numbers are needed for printing.

Report · Jul 12, 2022

Please upload a one-page example IDML (with no confidential data, but enough samples) to a host of your choice and link it here.

Report · Jul 18, 2022

I uploaded it here using WeTransfer. The link is usually available for only a week, so I wonder whether it would be more useful to upload the 4 MB document somewhere else, with no expiry date, for other potentially interested users. Thank you so much.

Report · Mar 29, 2023

Solution (for newbies like me): follow these steps.

1) Clean the document of stray spaces, tabs, etc. (use Show Hidden Characters).

2) Apply this GREP:

Find:

^(.+\r)\1+
Replace:
$1
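To see what this find/replace pair does, here is a minimal Python sketch of the same technique. It assumes a plain-text export in which `\n` stands in for InDesign's end of paragraph (`\r`). The `\1+` collapses a whole run of identical paragraphs in a single pass, but only when the copies are directly adjacent.

```python
import re

# "^" needs MULTILINE so that it matches at the start of every paragraph.
REPEATED_RUN = re.compile(r"^(.+\n)\1+", re.MULTILINE)


def collapse_adjacent_duplicates(text: str) -> str:
    """Replace each run of identical adjacent paragraphs with one copy."""
    return REPEATED_RUN.sub(r"\1", text)


sample = (
    "011101850\t17\n"
    "011101850\t17\n"
    "011101850\t17\n"
    "011102250\t17\n"
    "011101850\t17\n"  # same as the first paragraph, but not adjacent to it
)
collapsed = collapse_adjacent_duplicates(sample)
```

In this sample the run of three becomes one paragraph, while the last paragraph survives because another paragraph sits between it and its earlier copies.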
