Hi everyone,
I've reviewed a number of posts about this but haven't found a script that works for my situation.
I have an index of about 1600 authors for a book catalogue. The index generated as [firstname lastname pagenumber/s] but I need it to be [lastname, firstname pagenumber/s]. However, I have a lot of out of the ordinary names - initials, hyphenated, multiple authors for one book, three names, European glyphs, nicknames and aliases - and the page numbers add another level of complexity. Some examples are:
A. M. Homes 49
Andrew O’Connor 9
Andrew X. Pham 45
Anja Reich-Osang 62
Anne Tyler 20, 37, 38, 40, 41
Behrouz Boochani & Omid Tofighian 55
Catherine de Saint Phalle 24
Charlotte Brontë 12
Chimamanda Ngozi Adichie 8, 53
Claire Bidwell Smith 55
Colm Tóibín 45
D.B.C. Pierre 63
F. Scott Fitzgerald 33
Irène Némirovsky 56
Isabel ‘Spark’ Gill 29
Kenneth Cain, Heidi Postlewait & Andrew Thomson 59
Robert Galbraith (J.K. Rowling) 59
ZZ Packer
Ursula K. Le Guin 34, 35, 36
I've tried some of the GREP scripts I've found here in the discussions, with little success.
Find ^(.+)(, ?)(.+)
replace $3$2$1
Find ^(\w+)(\s)(\w+)$
replace $3$2$1
Any assistance would be appreciated. The full index is attached.
Cheers,
Di
I doubt there's much a GREP expression can do here, because there's too much variety and therefore not enough information in the list to construct rules to do this correctly.
For instance, until I look it up, I can't be quite sure whether it should be:
Adichie, Chimamanda Ngozi or
Ngozi Adichie, Chimamanda
Perhaps ChatGPT could help. (But you'll still need to go over the results with a toothcomb.)
What I would probably do is write a few scripts (or GREPs could work), that would move the last word before any digits to the front and add a comma, then another that would move the last 2 words before any digits to the front and add a comma, and you might even need versions for 3 or 4 words.
That would speed things up, but it would still be a manual job.
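For anyone who would rather prototype the "move the last N words" idea outside InDesign first, here is a minimal Python sketch. It is not an InDesign script. The function name `move_last_words` and the pattern `ENTRY` are made up for illustration, and it assumes each line is a name followed by whitespace and comma-separated page numbers.

```python
import re

# Assumed layout: name, then whitespace, then comma-separated page numbers.
# The page-number part is optional so lines without numbers still work.
ENTRY = re.compile(r"^(?P<name>.*?)(?P<pages>\s+\d[\d, ]*)?$")

def move_last_words(line: str, n: int) -> str:
    """Move the last n words of the name to the front, followed by a comma."""
    m = ENTRY.match(line)
    if m is None:
        return line
    name, pages = m.group("name"), m.group("pages") or ""
    words = name.split()
    if len(words) <= n:
        # Nothing would be left as a first name, so leave the line alone.
        return line
    surname = " ".join(words[-n:])
    given = " ".join(words[:-n])
    return f"{surname}, {given}{pages}"

print(move_last_words("Chimamanda Ngozi Adichie 8, 53", 1))
print(move_last_words("Chimamanda Ngozi Adichie 8, 53", 2))
```

The two calls produce the two candidate forms mentioned above, which is exactly why someone still has to decide per line which value of `n` is right.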
Another semi-automatic way would be to go over the list manually, and add non-breaking spaces (or some other formatting, i.e. red colour), between the first names, effectively turning them into a single word.
Then you could write a GREP to move all that after the last word.
... Or, more simply, add something like a pipe character ( | ) before each last name, then run a single GREP on the entire list. That's probably the option I would personally go for.
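To illustrate the pipe-marker option, here is a rough Python sketch of the single pass that would follow the manual marking. It is only a sketch under assumptions: `swap_at_pipe` and `PIPE_LINE` are hypothetical names, the pipe is assumed to sit directly before the first word of the last name, and page numbers are assumed to be comma-separated digits at the end of the line.

```python
import re

# first: everything before the pipe; last: everything after it up to the
# optional run of page numbers at the end of the line.
PIPE_LINE = re.compile(
    r"^(?P<first>[^|]*?)\s*\|(?P<last>.*?)(?P<pages>\s+\d+(?:,\s*\d+)*)?$"
)

def swap_at_pipe(line: str) -> str:
    """Turn 'First |Last 12, 13' into 'Last, First 12, 13'."""
    m = PIPE_LINE.match(line)
    if m is None:
        # No pipe on this line: it has not been marked, so leave it alone.
        return line
    first, last = m.group("first"), m.group("last")
    pages = m.group("pages") or ""
    if not first:
        return f"{last}{pages}"
    return f"{last}, {first}{pages}"

print(swap_at_pipe("Ursula K. |Le Guin 34, 35, 36"))
print(swap_at_pipe("Catherine |de Saint Phalle 24"))
```

Because the marker carries the decision, the pattern itself does not need to know anything about initials, particles or compound surnames.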
Thanks TaW,
I think the manipulating to make it possible to use a GREP script may be as much work as just doing the sort manually. But I might give ChatGPT a go and see if it can get me most of the way there.
Cheers,
Di
Ah, that makes sense. I shuddered to think of someone having to keep an eye on all of the page numbers manually after edits. I see you added the Scripting tag to your post, which will get the attention of additional folks with the skills you need, like @TᴀW.
While you are waiting, it may be worth taking a look at https://indiscripts.com/post/2023/01/indexmatic3-what-s-new. I don't know that it is a good match for your job, but it is well regarded for handling complex indexing tasks in InDesign, so it might be good to know about for your next project.
~Barb
Hi Di:
I'm just confirming that you are intending to drop that Word document into InDesign, edit that list, and then manually look up and edit the page numbers, as they appear in the InDesign document?
The normal InDesign workflow is to add index markers in the InDesign document so that you can generate the list from InDesign with automatic page numbers and in that workflow, you would edit the presentation of the names at that time. With 1600 entries, I can see why you might not want to do that, but just wanted to be sure that it is intentional. Either way, it's a lot of work, and I wish you the best.
~Barb
Thanks Barb,
I've generated the list from InDesign with the page numbers and copied them out to Word purely for the purposes of attaching it to my Adobe Community post. I'm hoping to be able to re-sort the authors list in InDesign using GREP rather than having to do it manually, which as you imply, is quite laborious.
Like others said, there are too many combinations to handle in one GREP.
You could probably start from "getting out of the way" simplest results - 2 and 3 words - and then manually process the rest.
Hi @D Pearse, I would approach this with a step-by-step process. Something along these lines:
1. I separate the index text from other text so you can do the following manipulations without messing anything else up (make sure you do the grep on "Story", not "Document"). I'm assuming the text is the same as your sample.
2. use grep to add a marker character at start of every line (I used a bullet character)
findWhat: ^
changeTo: •
This is a temporary marker.
3. use grep to swap around "easy" lines, e.g. those that have exactly two words before the page numbers:
findWhat: •(\b[[:word:]]+\b\s?)(\b[[:word:]]+\b\s?){1}\t
changeTo: $2,\s$1\t
This changed 596 lines in your sample file of 677 lines. Note that we also remove the • marker from the start because that line is finished.
4. now I make a guess: to treat every word after the first as a compound surname
findWhat: •(\b[[:word:]]+\b\s?)((\b[[:word:]]+\b\s?){2,})\t
changeTo: >$2,\s$1\t
This changed 38 lines in your sample file. NOTE: because this is a pretty rough guess, I changed the marker for these lines to a ">" to differentiate them.
So, now you have lines mostly "good" assuming a two word name is always firstname lastname. But you also have some lines which start with either • or >.
Here's a small sample, showing the good lines and the "not sure" lines marked with > and the "too hard" lines marked with •.
Costello, Tim 56
Flannery, Tim 16, 19, 49
Winton, Tim 5, 21, 24, 38, 42, 57
>Goreng Goreng, Tjanara 12
Wolff, Tobias 7
Birch, Tony 6, 8, 44
Chevalier, Tracy 16
Hay, Trevor 31
Capote, Truman 20, 22
•Ursula K. Le Guin 34, 35, 36
Duigan, Virginia 35
Woolf, Virginia 18
Stegner, Wallace 21
•Waris Dirie & Catherine Miller 10
Martel, Yann 34
Gooneratne, Yasmine 45
>Noah Harari, Yuval 15
Smith, Zadie 41, 44
Ghani, Zaheda 8
Heller, Zoë 34
Packer, ZZ 6
>de Botton, Alain 15, 18
•A.B. Facey 6
And lastly it is a matter of going through the marked lines manually (you can use find: cmd/ctl-alt-F to go from one to the next) and fixing them. In your sample file, I counted 81 marked lines out of 677.
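For readers who want to see the triage logic in one place, the steps above can be sketched in Python roughly as follows. This is not the InDesign GREP itself, just an approximation of the same idea: `triage`, `TWO_WORDS` and `MANY_WORDS` are hypothetical names, and it assumes a tab separates the name from the page numbers and a single space separates the words of the name.

```python
import re

# "Easy" lines: exactly two plain words before the tab.
TWO_WORDS = re.compile(r"^(\w+) (\w+)\t")
# "Not sure" lines: three or more plain words; everything after the
# first word is guessed to be a compound surname.
MANY_WORDS = re.compile(r"^(\w+) (\w+(?: \w+)+)\t")

def triage(line: str) -> str:
    """Swap the easy lines, mark guesses with '>' and leave the rest marked '•'."""
    m = TWO_WORDS.match(line)
    if m:
        return f"{m.group(2)}, {m.group(1)}\t{line[m.end():]}"
    m = MANY_WORDS.match(line)
    if m:
        return f">{m.group(2)}, {m.group(1)}\t{line[m.end():]}"
    # Initials, hyphens, apostrophes, "&" and so on end up here.
    return "•" + line

for entry in ["Tim Winton\t5, 21", "Yuval Noah Harari\t15", "Ursula K. Le Guin\t34, 35, 36"]:
    print(triage(entry))
```

The markers are the useful part of the design: anything that starts with ">" or "•" afterwards is a line that still needs a human decision.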
Does that help?
- Mark
Amazing. Thanks Mark. That worked really well.
Cheers,
Di
Hi Mark,
I got this to work on my test list. Now that I've gone to apply these scripts to my final list of 661 names, I can't get past the first step. Once I've applied the bullet character, none of the other scripts return any results.
I've double checked I'm in the GREP tab
The scope is correct (story or selection)
I've searched the web,
Tried all the other suggestions on this thread plus more and am getting goose eggs.
I'm at a loss. Any suggestions are welcome.
Di
… Well! I've imported the docx given by the op in a story: 681 paras to be treated.
Grep 1: 629 found
F: ^(\S+(\h\u\.)?)\h(((de|du|Le|St)\h)?\S+)(?=\t)
R: $3, $1 + "red" char style
… So, there're 52 left the op can treat manually with these 3 F/R:
Grep 2:
F: ^(\S+)\h(.+)(?=\t) + "none" char style
R: $2, $1 + "red" char style
Grep 3:
F: ^(.+)\h(\S+)(?=\t) + "none" char style
R: $2, $1 + "red" char style
Still 22 left! =D
Grep 4:
F: ^.+ + "none" char style
R: As you want! …
(^/) The Jedi
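As a way to try "Grep 1" on a plain-text copy of the list, here is a rough Python translation. It is an approximation, not the InDesign expression: Python's `re` has no `\h` or `\u` classes, so `\h` is narrowed to a plain space and `\u` to ASCII `[A-Z]`, and the "red" character style is replaced by a returned flag. The names `GREP_1` and `swap_simple` are made up for this sketch.

```python
import re

# Group 1 is the first name plus an optional middle initial ($1 in the post).
# Group 2 is an optional particle plus the last word ($3 in the post; the
# inner groups are non-capturing here, so the numbering differs).
GREP_1 = re.compile(
    r"^(\S+(?: [A-Z]\.)?) "
    r"((?:(?:de|du|Le|St) )?\S+)"
    r"(?=\t)"  # the name must end right at the tab before the page numbers
)

def swap_simple(line: str) -> tuple[str, bool]:
    """Return (line, changed); 'changed' stands in for the red character style."""
    m = GREP_1.match(line)
    if m is None:
        return line, False
    # line[m.end():] starts at the tab, because the lookahead consumes nothing.
    return f"{m.group(2)}, {m.group(1)}{line[m.end():]}", True

print(swap_simple("Ursula K. Le Guin\t34, 35, 36"))
print(swap_simple("Chimamanda Ngozi Adichie\t8, 53"))
```

Lines that come back with `False` are the ones left over for the later, looser patterns or for manual treatment.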
Given the list supplied I came up with this GREP that captures the first names
Eugene,
This kind of job really comes down to knowing and understanding GREP, and quickly writing regexes that follow one after the other … like a good thriller! 😉
In this case, any op will play this game in less than 2 minutes:
(^/)
… Of course, it's an "Index"!!!
What about the relevance of the page numbers if the layout moves?
I will always prefer to work with proper index entries and apply the regex from the video above, as in this other video:
(^/)