I need a simple way to find and delete duplicate lines in a column of about 16K items. Each line looks like this:
VARIABLE NUMBERS(\t)PAGE NUMBER(\p)
Thank you for your help.
Solution (for newbies like me): follow these steps.
1) Clean stray spaces, tabs, etc. out of the document (turn on Show Hidden Characters to see them).
2) Apply this GREP:
find
^(.+\r)\1+
replace
$1
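For anyone who wants to try the pattern outside InDesign, here is a minimal Python sketch of what the find/replace above does. The assumptions are mine, not from the thread: the text has been exported to plain text, paragraphs end in `\n` instead of InDesign's `\r`, and the function name `collapse_adjacent_duplicates` is only illustrative.

```python
import re

def collapse_adjacent_duplicates(text: str) -> str:
    """Collapse every run of identical consecutive lines into one line."""
    # ^(.+\n) captures one whole line including its line end;
    # \1+ then matches one or more exact repeats directly after it.
    return re.sub(r"^(.+\n)\1+", r"\1", text, flags=re.MULTILINE)

# Made-up sample: the first two lines are adjacent duplicates,
# the last line repeats the first but is NOT adjacent to it.
sample = "011101850\t17\n011101850\t17\n011102250\t17\n011101850\t17\n"
result = collapse_adjacent_duplicates(sample)
print(repr(result))
```

Note that the non-adjacent repeat in the last line survives. The pattern only merges duplicates that directly follow one another, so lines that are spread out need to be brought together first (for example by sorting) or handled another way.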
Hi @Gioyer07:
GREP is pattern-based. It looks like your pattern is:
a series of numbers, a tab, more numbers and a hard return.
If these are the only paragraphs in the file that follow that pattern, use:
Find what: \d+\t\d+\r
Change to:
Save the file first!
~Barb
Thank you Barb
My attempts to use your syntax failed. That is probably my fault, because I did not describe the text I need to clean well enough.
It has this structure:
011101850 17
011101850 17
011102250 17
In the find window, the same three lines look as shown in the picture:
I really appreciate your help. Thank you.
Thank you Barb
My attempts to use your syntax didn't work, and that is due to my poor description.
The picture shows more clearly how the lines are built.
As you can see, a "y" is used instead of a "t"... but that is not working either. Thank you for your help.
Hi @Gioyer07,
To understand the issue properly: are the duplicate lines always adjacent, and do we need to detect the duplicates and keep only a single instance of each?
-Manan
Sorry for the delay. No, the duplicate lines are not always adjacent. Thank you.
To remove a single duplicate that directly follows its original, you can try something like this:
(\d+~y\d+\r)\K\1
This assumes each paragraph is: numbers, tab (~y), numbers, end of paragraph.
If that does not work, please post a screenshot with hidden characters visible.
Thank you very much for the answer. Sorry for the delay, my office computer was off. The screenshot precisely shows the hidden characters in the white box. Your solution seems to be right; please allow me some time to double-check it. Thank you.
… The screenshot precisely shows the hidden characters in the white box…
By @Gioyer07
Are you sure? Which screenshot do you mean?
Both of your screenshots show guides and layout lines, but the hidden characters are not visible.
What I mean is the menu command: Type --> Show Hidden Characters
[Ctrl/Strg]+[Alt]+[I]
Thank you... well, if you look at the screenshots you certainly see the guides and layout lines... but in the find/replace window you can clearly see the hidden characters. Perhaps you need to click the image and view it full screen if you can't see them in the chat.
No.
That is not what I mean!
Go to the menu: Type --> Show Hidden Characters (or similar).
The hidden characters will then be shown as blue characters/signs.
Then please create a new screenshot for us.
Thank you for your patience, here is the correct screenshot.
I applied
(\d+~y\d+\r)\K\1
in the Find field and left the Replace field empty.
Pressing 'Replace All' made one substitution (although there are hundreds of exactly identical lines). Pressing 'Replace All' again found and deleted nothing more.
Strange that the Grep finds anything at all. There are spaces between the numbers and the tab, which you did not mention before.
Hi all,
I would try this one:
Find GREP:
(\d{7}\h{2}~y\d{3}\r)\1
Change GREP:
$1
Do that two or more times until nothing is found.
And let's hope that the pattern for all paragraphs* is:
A run of 7 digits, followed by two horizontal white spaces, followed by a right-aligned tab, followed by a run of 3 digits, followed by an end-of-paragraph character.
[*] I can see from the screenshot that we perhaps also have to deal with anchored objects.
Regards,
Uwe Laubender
( Adobe Community Professional )
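To illustrate why the find/change above has to be run "two or more times", here is a small Python sketch. It is only a rough translation and rests on my own assumptions: `\h{2}` is written as two literal spaces, `~y` as `\t`, and `\r` as `\n`, since Python's `re` module does not know the InDesign-specific codes. The sample line and the name `dedupe_in_passes` are made up.

```python
import re

# Rough Python stand-in for (\d{7}\h{2}~y\d{3}\r)\1  (assumed translation):
# 7 digits, two spaces, a tab, 3 digits, line end, then an exact repeat.
PAIR = re.compile(r"(\d{7} {2}\t\d{3}\n)\1")

def dedupe_in_passes(text):
    """Apply the pair pattern repeatedly; return (cleaned text, passes that changed something)."""
    passes = 0
    while True:
        # Each pass replaces every matched pair "XX" with a single "X".
        text, hits = PAIR.subn(r"\1", text)
        if hits == 0:
            return text, passes
        passes += 1

line = "0111018  \t017\n"  # made-up paragraph matching the assumed pattern
cleaned, passes = dedupe_in_passes(line * 5)
print(passes, repr(cleaned))
```

Each pass only merges pairs, so a run of five identical paragraphs shrinks to three, then two, then one. That is why a single "Change All" is not enough for longer runs.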
Hm.
Perhaps it's better, as a first step, to get rid of possible white space characters between the first run of digits and the right-aligned tab, and only then tackle the duplicate paragraphs. I'm not sure if the number of white spaces after the first run of digits is always the same. ( Just a guess of course. )
Regards,
Uwe Laubender
( Adobe Community Professional )
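A minimal Python sketch of that cleanup step, under my own assumptions: the text is exported plain text, the white space in question is ordinary spaces, and the right-aligned tab shows up as `\t`. The function name `strip_spaces_before_tab` is hypothetical.

```python
import re

def strip_spaces_before_tab(text: str) -> str:
    """Remove any run of spaces that sits directly in front of a tab."""
    return re.sub(r" +\t", "\t", text)

# Made-up lines: same digits, but a different number of spaces before the tab.
raw = "0111018  \t017\n0111018 \t017\n"
normalized = strip_spaces_before_tab(raw)
print(repr(normalized))
```

After this step, lines that differed only in the number of spaces become exact copies of each other, so a pattern that relies on an exact back-reference such as `\1` has a chance to match them.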
Hello Uwe,
Unfortunately not. It is not that simple. Please take another look at the first screenshots.
Thank you, @pixxxelschubser !
You are right! Then we have to talk about "sameness" and the flexible patterns with \K. And what that means for atomic groups. Or we have to take a shortcut and write a script. 🙂
Let's see what our OP thinks about all this…
Thanks,
Uwe Laubender
( Adobe Community Professional )
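As a rough idea of what the "script" shortcut could look like for duplicates that are not adjacent, here is a Python sketch. It is not an InDesign script and nobody in this thread proposed this exact code. It assumes the list has been exported to plain text, and the function name `remove_duplicate_lines` is made up.

```python
def remove_duplicate_lines(text: str) -> str:
    """Keep the first occurrence of every line and drop later repeats, preserving order."""
    seen = set()
    kept = []
    for line in text.splitlines(keepends=True):
        # Compare lines without their line ending, so a final line that
        # lacks a trailing newline still counts as a duplicate.
        key = line.rstrip("\r\n")
        if key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return "".join(kept)

# Made-up sample with a non-adjacent duplicate in the last line.
sample = "011101850\t17\n011102250\t17\n011101850\t17\n"
print(repr(remove_duplicate_lines(sample)))
```

Unlike the GREP approach, this needs neither sorting nor repeated passes, because every line is checked against all lines seen so far. Lines must still be truly identical, so stray spaces or anchored objects would have to be cleaned up first.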
This is beyond my skills... The only thing I am sure of is that I also got the title wrong, because this is not "simple" at all... Thank you to the experts who replied or proposed usable solutions. At the moment I just can't say that it is "solved".
Excuse me for insisting... Would it be possible to tell GREP to ignore ANY object before checking for duplicates?
Would that reduce the complexity of the task? Thank you.
Hi @Gioyer07 ,
Usually the issue with duplicate paragraphs should be solved in the data source,
so that duplicate paragraphs do not show up in InDesign after placing the content.
Regards,
Uwe Laubender
( Adobe Community Professional )
Thank you Laubender
That would be a solution if only the backend data knew the page number of every code. It does not and never will, yet the page numbers are needed for printing.
Please upload a one-page example IDML (with no confidential data, but enough samples) to a file host of your choice and link it here.
I uploaded it here using WeTransfer. The link is usually available for only a week, so I wonder whether the 4 MB document would be more useful uploaded somewhere else, with no expiry date, for other potentially interested users. Thank you so much.