Hi Knowledge People
What is the simplest syntax to find identical "values" (may be words or numbers) in a document?
Thanks
I need to find identical values in a long list of numbers, over 15,000 of them, spread across several pages but in the same text frame. I would then delete the duplicates (replace them with nothing) to reach the required page count for the book, which has to be a multiple of 16 pages for offset printing.
Well... to be more precise, I'd also need to single out the numbers that come before the separator (or have more than three digits).
Then GREP alone won't be able to do this.
You would need to first get a list of all values.
Then - it depends on the tool you'll use...
And you mean the same Story - not the same Text Frame.
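As a rough illustration of "first get a list of all values", here is a minimal Python sketch that works on the story exported as plain text. It assumes each line is "code, tab, page"; the function name `duplicated_codes` and the sample numbers are made up for the example.

```python
from collections import Counter

def duplicated_codes(text, sep="\t"):
    """Return the codes (text before the first separator) that occur more than once."""
    codes = [line.split(sep, 1)[0] for line in text.splitlines() if line.strip()]
    counts = Counter(codes)
    # Counter keeps first-seen order, so the report follows the document order.
    return [code for code in counts if counts[code] > 1]

# Hypothetical sample: code, tab, page number.
sample = "10023\t101\n10023\t104\n10057\t101\n10023\t230\n10099\t12\n"
print(duplicated_codes(sample))  # ['10023']
```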
Never mind any other value after the codes: they relate to the page numbers and can be told apart by their character styles. I just need a syntax that would find matching identical numbers. Thank you.
They are only partially sorted. Sorting is usually the last operation I do... there's a script I've installed for this purpose.
If you'll sort them anyway - you could copy everything to Excel and remove duplicates there.
The only "problem" - this "NEW" tag - it's anchored / inline, right?
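For comparison, removing duplicates outside InDesign boils down to keeping the first row for each code. This is a minimal Python sketch in the spirit of the Excel suggestion, not a replacement for it; it assumes tab-delimited rows, and `remove_duplicate_rows` is a hypothetical helper name.

```python
def remove_duplicate_rows(lines, sep="\t"):
    """Keep only the first row for each code (the text before the first separator)."""
    seen = set()
    kept = []
    for line in lines:
        code = line.split(sep, 1)[0]
        if code in seen:
            continue  # a later row with a code we already kept
        seen.add(code)
        kept.append(line)
    return kept

# Hypothetical rows: code, tab, page number.
rows = ["10023\t101", "10057\t101", "10023\t104"]
print(remove_duplicate_rows(rows))  # ['10023\t101', '10057\t101']
```

Note that this drops whole rows, page numbers included, so it only fits if the page numbers of the repeated codes are not needed.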
Yes. The NEW image would be lost...
I'm pretty sure you can search for the anchored / inline marker, replace it with some unique text, go to Excel and do what you need, then replace this unique "marker" with the contents of the Clipboard - this NEW tag.
Or you should rather do this:
1) replace "^t" -> "^t^t"
2) replace "^a^t" -> "^t#NEW#"
That way you'll have your "tag" in a separate column, so it won't affect sorting, and you'll have all your numbers as values - not text.
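To see what the two replacements do to a single line, here is a small Python sketch. It assumes the anchored object shows up in the exported text as U+FFFC (standing in for `^a`), which is an assumption on my part; `split_tag_column` is a hypothetical name.

```python
# Assumption: U+FFFC stands in for InDesign's anchored-object marker (^a).
ANCHOR = "\ufffc"

def split_tag_column(line):
    """Apply the two replacements in order so the tag lands in its own column."""
    line = line.replace("\t", "\t\t")               # 1) ^t   -> ^t^t
    return line.replace(ANCHOR + "\t", "\t#NEW#")   # 2) ^a^t -> ^t#NEW#

tagged = "10023" + ANCHOR + "\t101"   # hypothetical line carrying the NEW tag
plain = "10057\t101"                  # hypothetical line without it

print(split_tag_column(tagged).split("\t"))  # ['10023', '#NEW#', '101']
print(split_tag_column(plain).split("\t"))   # ['10057', '', '101']
```

Every line ends up with three columns (code, tag, page), so the code column holds nothing but the number.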
Then:
Of course, you would first have to copy your graphic representation of this NEW tag to the Clipboard.
There are GREP expressions to find identical values in a text, but in your case, numbers in a table, they won't work. But finding those duplicate values (and acting upon any found) is pretty easy to script. Excel isn't needed at all.
Can you indicate what should be done in the list in your screenshot? There are various duplicate numbers before the tab separator, followed by 3-digit numbers after the tab. What should be done about it?
It's not a table per se - it's tab-delimited text in text columns.
I think a script would be overkill - doing it in Excel should be much easier. I mean the OP would have full control over the text and wouldn't have to wait for a scripting person to make any extra changes... at least in this case - simple removal of duplicates.
Unless the OP wants to keep page numbers to build an Index - but still, it can be done in Excel...
...
Unless ... the OP wants to convert those page numbers into Hyperlinks / CrossReferences...
But in that case - InDesign can build an Index automatically - or a completely different script could be used:
https://creativepro.com/files/kahrel/indesign/lists_indexes.html
The expected result would be to replace with nothing all the repeated numbers after the first one (never mind any other value after the codes: they relate to the page numbers and can be told apart by their character styles).
Then as @Peter Kahrel said - GREP should be able to find duplicates - ignoring texts in between.
> It's not a table per se - it's TABdelimited text in text columns.
Well, his screenshot shows a table. But in the event that it's a plain (tab-delimited) column, a GREP expression can find those duplicate values.
Let's see what the OP replies.
Correct. It is not a table. Just plain text. Right now codes and pages have the same style, but I will rebuild the index and assign a different style to the pages, so the GREP search will apply to the codes only.
Those lines are just Underline.
A table would have "#" at the end of the "line":
You can do it with three queries. First mark all lines with identical first values, don't mark the first occurrence:
Find what: ^(.+?~y).+\r\K(\1.+\r)+
Change to: <Leave empty>
Find format: <Leave empty>
Change format: +Strikethrough
Then delete everything that has strikethrough applied up to the tab:
Find what: ^.+?(?=~y)
Change to: <Leave empty>
Find format: +Strikethrough
Change format: <Leave empty>
Finally, remove all strikethrough:
Find what: <Leave empty>
Change to: <Leave empty>
Find format: +Strikethrough
Change format: -Strikethrough
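For anyone who wants to check the net effect of the three queries on a plain-text copy first, here is a minimal Python sketch. It is not a translation of the GREP (Python's `re` has no `\K`, and there is no strikethrough to mark lines with); it assumes an ordinary tab as the separator, and `blank_repeated_codes` is a hypothetical name. As I read the first expression, it only catches repeats on consecutive lines, and the sketch mirrors that.

```python
def blank_repeated_codes(lines, sep="\t"):
    """Delete the code on every line whose code repeats the previous line's code.

    The first line of each run stays intact; repeats keep the separator and
    whatever follows it (the page number).
    """
    result = []
    previous_code = None
    for line in lines:
        code, found, rest = line.partition(sep)
        if found and code == previous_code:
            result.append(sep + rest)   # drop the code, keep tab and page
        else:
            result.append(line)
        previous_code = code if found else None
    return result

# Hypothetical lines: code, tab, page number.
lines = ["10023\t101", "10023\t104", "10057\t101", "10023\t230"]
print(blank_repeated_codes(lines))
# ['10023\t101', '\t104', '10057\t101', '10023\t230']
```

The last 10023 is left alone because it does not directly follow another 10023 line, which is why sorting the list first matters.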
Thank you Peter
I uploaded the IDD document here.
I should have done it straight at the beginning.
Great help and kindness.
You need to do "purge" / "garbage collection" - do SAVE AS with a new name from time to time.
OK, looks like you've done Save As earlier today:
Recovered MiniSave on Windows x64 10.0 in app version 17.4.1.67 (FS InDesign Roman) build 67 on 04 April 2023 at 11:15
Save As on Windows x64 10.0 in app version 17.4.1.67 (FS InDesign Roman) build 67 on 20 March 2024 at 09:05
Book - repaginate on Windows x64 10.0 in app version 17.4.1.67 (FS InDesign Roman) build 67 on 20 March 2024 at 09:08
Book - repaginate on Windows x64 10.0 in app version 17.4.1.67 (FS InDesign Roman) build 67 on 20 March 2024 at 09:17
Book - repaginate on Windows x64 10.0 in app version 17.4.1.67 (FS InDesign Roman) build 67 on 20 March 2024 at 09:41
Book - repaginate on Windows x64 10.0 in app version 17.4.1.67 (FS InDesign Roman) build 67 on 20 March 2024 at 15:06
Book - repaginate on Windows x64 10.0 in app version 17.4.1.67 (FS InDesign Roman) build 67 on 20 March 2024 at 15:06
Book - repaginate on Windows x64 10.0 in app version 17.4.1.67 (FS InDesign Roman) build 67 on 20 March 2024 at 15:07
Book - repaginate on Windows x64 10.0 in app version 17.4.1.67 (FS InDesign Roman) build 67 on 20 March 2024 at 15:07
Book - repaginate on Windows x64 10.0 in app version 17.4.1.67 (FS InDesign Roman) build 67 on 20 March 2024 at 15:09
Most recent Save on Windows x64 10.0 in app version 17.4.1.67 (FS InDesign Roman) build 67 on 20 March 2024 at 15:09
Open As Copy on Windows x64 10.0 in app version 19.3.0.58 (FS InDesign Roman) build 58 on 20 March 2024 at 15:41
Converted on Windows x64 10.0 in app version 19.3.0.58 (FS InDesign Roman) build 58 on 20 March 2024 at 15:41
Save As on Windows x64 10.0 in app version 19.3.0.58 (FS InDesign Roman) build 58 on 20 March 2024 at 15:42
But those repaginations added 70MB??
Thanks for the file. When I tried the GREP queries on your (big) file I noticed that those pesky inline graphics get in the way. They can't be accommodated in the queries I gave, so you need two additional queries.
This one goes first, before anything else. It places a tab before an inline:
Find what: (?=~a)
Change to: ~y
Find format: <Leave empty>
Change format: <Leave empty>
Then run this one at the very end to delete the tabs before the inlines:
Find what: ~y(?=~a)
Change to: <Leave empty>
Find format: <Leave empty>
Change format: <Leave empty>
So that's five queries altogether. If you use these more than once you can use this script to run them all together:
https://creativepro.com/files/kahrel/indesign/grep_query_runner.html
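A small Python sketch of why the extra first and last queries help, under two assumptions of mine: the inline graphic appears in the text as U+FFFC (standing in for `~a`), and the separator is an ordinary tab (standing in for `~y`). The helper names are hypothetical.

```python
import re

# Assumption: U+FFFC stands in for the inline graphic (~a), "\t" for the tab (~y).
ANCHOR = "\ufffc"

def add_tab_before_inline(text):
    """First query: put a tab in front of every inline graphic."""
    return re.sub("(?=" + ANCHOR + ")", "\t", text)

def remove_tab_before_inline(text):
    """Last query: delete the tab that sits directly before an inline graphic."""
    return re.sub("\t(?=" + ANCHOR + ")", "", text)

def code_of(line):
    """Everything up to the first tab, i.e. what the duplicate check compares."""
    return line.partition("\t")[0]

with_inline = "10023" + ANCHOR + "\t104"   # hypothetical line carrying an inline graphic
without_inline = "10023\t101"

# Before the first query the two codes differ because of the inline graphic.
print(code_of(with_inline) == code_of(without_inline))   # False
prepared = add_tab_before_inline(with_inline)
print(code_of(prepared) == code_of(without_inline))      # True
print(remove_tab_before_inline(prepared) == with_inline) # True
```

With the temporary tab in place, the comparison sees the bare number, and deleting "up to the tab" leaves the inline graphic untouched.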