GREP to change firstname lastname to lastname, firstname with exceptions

Report · Aug 25, 2023

Hi everyone,

I've reviewed a number of posts about this but haven't found a script that works for my situation.

I have an index of about 1600 authors for a book catalogue. The index was generated as [firstname lastname pagenumber/s] but I need it to be [lastname, firstname pagenumber/s]. However, I have a lot of out-of-the-ordinary names - initials, hyphenated names, multiple authors for one book, three-part names, European glyphs, nicknames and aliases - and the page numbers add another level of complexity. Some examples are:

A. M. Homes 49
Andrew O’Connor 9
Andrew X. Pham 45
Anja Reich-Osang 62
Anne Tyler 20, 37, 38, 40, 41
Behrouz Boochani & Omid Tofighian 55
Catherine de Saint Phalle 24
Charlotte Brontë 12
Chimamanda Ngozi Adichie 8, 53
Claire Bidwell Smith 55
Colm Tóibín 45
D.B.C. Pierre 63
F. Scott Fitzgerald 33
Irène Némirovsky 56
Isabel ‘Spark’ Gill 29
Kenneth Cain, Heidi Postlewait & Andrew Thomson 59
Robert Galbraith (J.K. Rowling) 59
ZZ Packer

Ursula K. Le Guin 34, 35, 36

I've tried some of the GREP scripts I've found here in the discussions, with little success.

Find ^(.+)(, ?)(.+)
replace $3$2$1

Find ^(\w+)(\s)(\w+)$
replace $3$2$1
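
One way to see why these two patterns miss most of the list is to run them over a few of the sample lines outside InDesign. The sketch below uses Python's `re` module as a stand-in for InDesign's GREP engine. That is an assumption: the two dialects differ in places, and the samples assume a single space before the page numbers, as typed in the post. `ATTEMPTS`, `SAMPLES` and `apply_attempt` are illustrative names only.

```python
import re

# The two find/change pairs from the post, in Python syntax ($1 becomes \1).
# Python's re module only approximates InDesign's GREP engine here.
ATTEMPTS = [
    (r"^(.+)(, ?)(.+)", r"\3\2\1"),
    (r"^(\w+)(\s)(\w+)$", r"\3\2\1"),
]

SAMPLES = ["A. M. Homes 49", "Anne Tyler 20, 37, 38, 40, 41", "ZZ Packer"]


def apply_attempt(pattern: str, replacement: str, line: str) -> str:
    """Return the line after one find/change pass (unchanged if nothing matches)."""
    return re.sub(pattern, replacement, line)


if __name__ == "__main__":
    for pattern, replacement in ATTEMPTS:
        print(pattern)
        for line in SAMPLES:
            result = apply_attempt(pattern, replacement, line)
            print("   ", repr(line), "->", repr(result))
```

In this approximation, the first pattern needs a comma, so it only fires on lines with several page numbers and swaps the text around the last comma rather than around the name. The second only matches lines made of exactly two words with nothing after them, and it swaps them without adding a comma.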

Any assistance would be appreciated. The full index is attached.

Cheers,

Di

Report · Aug 25, 2023

I doubt there's much a GREP expression can do here, because there's too much variety and therefore not enough information in the list to construct rules to do this correctly.

For instance, until I look it up, I can't be quite sure whether it should be:

Adichie, Chimamanda Ngozi or

Ngozi Adichie, Chimamanda

Perhaps ChatGPT could help. (But you'll still need to go over the results with a toothcomb.)

What I would probably do is write a few scripts (or GREPs could work), that would move the last word before any digits to the front and add a comma, then another that would move the last 2 words before any digits to the front and add a comma, and you might even need versions for 3 or 4 words.
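
The "move the last N words before the digits to the front" idea can be sketched roughly as follows. This is Python rather than an InDesign script, and it assumes the page numbers start at the first word that begins with a digit and that a single space separates the words. The function name `move_surname_to_front` and its `surname_words` parameter are made up for illustration.

```python
import re

WORD = r"(?!\d)\S+"  # one name word; it must not start with a digit


def move_surname_to_front(line: str, surname_words: int = 1) -> str:
    """Move the last `surname_words` words before the page numbers to the front.

    Lines that do not fit the shape "given names, surname, page numbers"
    are returned unchanged so they can be handled by hand.
    """
    surname = WORD + (" " + WORD) * (surname_words - 1)
    pattern = rf"^(?P<given>.+?) (?P<surname>{surname})(?P<pages>(?: \d.*)?)$"
    match = re.match(pattern, line)
    if match is None:
        return line
    return f"{match['surname']}, {match['given']}{match['pages']}"


if __name__ == "__main__":
    print(move_surname_to_front("Anne Tyler 20, 37, 38, 40, 41"))
    print(move_surname_to_front("Ursula K. Le Guin 34, 35, 36", surname_words=2))
```

Someone still has to decide which lines get the one-word version and which get the two-word version, which is why this only speeds up the manual job.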

That would speed things up, but it would still be a manual job.

Another semi-automatic way would be to go over the list manually, and add non-breaking spaces (or some other formatting, i.e. red colour), between the first names, effectively turning them into a single word.

Then you could write a GREP to move all that after the last word.

... Or, more simply, add something like a pipe character ( | ) before each last name, then run a single GREP on the entire list. That's probably the option I would personally go for.
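
As a rough sketch of the pipe idea: once a | has been typed by hand in front of each surname, one pattern can do the swap for the whole list. The example is in Python rather than InDesign GREP, and it assumes one author per line with the page numbers at the end. `PIPE_PATTERN` and `swap_at_pipe` are hypothetical names.

```python
import re

# given names, optional space, pipe marker, surname, optional page numbers
PIPE_PATTERN = re.compile(
    r"^(?P<given>[^|]+?) ?\|(?P<surname>.+?)(?P<pages>(?:[ \t]+\d[\d, ]*)?)$"
)


def swap_at_pipe(line: str) -> str:
    """Turn 'Given |Surname pages' into 'Surname, Given pages'.

    Lines without a pipe marker are returned unchanged.
    """
    match = PIPE_PATTERN.match(line)
    if match is None:
        return line
    return f"{match['surname']}, {match['given']}{match['pages']}"


if __name__ == "__main__":
    for line in ["Ursula K. |Le Guin 34, 35, 36", "Chimamanda Ngozi |Adichie 8, 53"]:
        print(swap_at_pipe(line))
```

Lines with several authors (joined with & or commas) would still need separate treatment, because a single marker per line cannot describe them.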

Report · Aug 25, 2023

Thanks TaW,

I think the manipulating to make it possible to use a GREP script may be as much work as just doing the sort manually. But I might give ChatGPT a go and see if it can get me most of the way there.

Cheers,

Di

Report · Aug 25, 2023

Ah, that makes sense. I shuddered to think of someone having to keep an eye on all of the page numbers manually after edits. I see you added the Scripting tag to your post, which will get the attention of additional folks with the skills you need, like @TᴀW.

While you are waiting, it may be worth taking a look at https://indiscripts.com/post/2023/01/indexmatic3-what-s-new. I don't know that it is a good match for your job, but it is well-regarded for handling complex indexing tasks in InDesign, so it might just be good to know about for your next project.

~Barb

Report · Aug 25, 2023

Hi Di:

I'm just confirming that you are intending to drop that Word document into InDesign, edit that list, and then manually look up and edit the page numbers, as they appear in the InDesign document?

The normal InDesign workflow is to add index markers in the InDesign document so that you can generate the list from InDesign with automatic page numbers and in that workflow, you would edit the presentation of the names at that time. With 1600 entries, I can see why you might not want to do that, but just wanted to be sure that it is intentional. Either way, it's a lot of work, and I wish you the best.

~Barb

Report · Aug 25, 2023

Thanks Barb,

I've generated the list from InDesign with the page numbers and copied them out to Word purely for the purposes of attaching it to my Adobe Community post. I'm hoping to be able to re-sort the authors list in InDesign using GREP rather than having to do it manually, which as you imply, is quite laborious.

Report · Aug 25, 2023

Like others said, there are too many combinations to handle in one GREP.

You could probably start by getting the simplest cases (2 and 3 words) out of the way, and then process the rest manually.

Report · Aug 25, 2023

Hi @D Pearse, I would approach this with a step-by-step process. Something along these lines:

1. separate the index text from other text so you can do the following manipulations without messing anything else up (make sure you do the grep on "Story", not "Document"). I'm assuming the text is the same as your sample.

2. use grep to add a marker character at start of every line (I used a bullet character)

findWhat: ^
changeTo: •

This is a temporary marker.

3. use grep to swap around "easy" lines, eg. that have exactly two words before the page numbers:

findWhat: •(\b[[:word:]]+\b\s?)(\b[[:word:]]+\b\s?){1}\t
changeTo: $2,\s$1\t

This changed 596 lines in your sample file of 677 lines. Note that we also remove the • marker from the start because that line is finished.

4. now make a guess: treat every word after the first as part of a compound surname

findWhat: •(\b[[:word:]]+\b\s?)((\b[[:word:]]+\b\s?){2,})\t
changeTo: >$2,\s$1\t

This changed 38 lines in your sample file. NOTE: because this is a pretty rough guess, I changed the marker for these lines to a ">" to differentiate them.

So, now most of your lines are "good", assuming a two-word name is always firstname lastname. But you also have some lines which start with either • or >.

Here's a small sample, showing the good lines and the "not sure" lines marked with > and the "too hard" lines marked with •.

Costello, Tim 	56
Flannery, Tim 	16, 19, 49
Winton, Tim 	5, 21, 24, 38, 42, 57
>Goreng Goreng, Tjanara 	12
Wolff, Tobias 	7
Birch, Tony 	6, 8, 44
Chevalier, Tracy 	16
Hay, Trevor 	31
Capote, Truman 	20, 22
•Ursula K. Le Guin	34, 35, 36
Duigan, Virginia 	35
Woolf, Virginia 	18
Stegner, Wallace 	21
•Waris Dirie & Catherine Miller	10
Martel, Yann 	34
Gooneratne, Yasmine 	45
>Noah Harari, Yuval 	15
Smith, Zadie 	41, 44
Ghani, Zaheda 	8
Heller, Zoë 	34
Packer, ZZ 	6
>de Botton, Alain 	15, 18
•A.B. Facey	6
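
For anyone who wants to try the same marker logic outside InDesign, here is a rough Python approximation of steps 2 to 4. It is a sketch, not a translation of the exact GREP above. It assumes a tab between the name and the page numbers (as in the sample), uses `\w` in place of `[[:word:]]`, and keeps the separating space outside the capture groups, so it does not leave a space before the tab. `process_line` is a made-up helper name.

```python
import re

MARK = "•"   # temporary marker put on every line (step 2)
GUESS = ">"  # marker for lines changed by the rough compound-surname guess (step 4)

# Step 3: exactly two words before the tab.
TWO_WORDS = re.compile(rf"^{MARK}(\w+) (\w+)\t")
# Step 4: a first word followed by two or more further words before the tab.
COMPOUND = re.compile(rf"^{MARK}(\w+) (\w+(?: \w+)+)\t")


def process_line(line: str) -> str:
    """Apply the marker, the easy swap, then the compound-surname guess."""
    marked = MARK + line
    swapped, count = TWO_WORDS.subn(r"\2, \1\t", marked)
    if count:
        return swapped  # finished line, marker removed
    guessed, count = COMPOUND.subn(rf"{GUESS}\2, \1\t", marked)
    if count:
        return guessed  # rough guess, flagged with ">"
    return marked       # too hard, still flagged with the bullet


if __name__ == "__main__":
    sample = [
        "Tim Costello\t56",
        "Alain de Botton\t15, 18",
        "Ursula K. Le Guin\t34, 35, 36",
    ]
    for line in sample:
        print(process_line(line))
```

As in the GREP version, names containing punctuation (initials with full stops, apostrophes, hyphens, "&") fall through both patterns and keep the bullet marker for manual fixing.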

And lastly it is a matter of going through the marked lines manually (you can use find: cmd/ctl-alt-F to go from one to the next) and fixing them. In your sample file, I counted 81 marked lines out of 677.

Does that help?

- Mark

Report · Aug 27, 2023

Amazing. Thanks Mark. That worked really well.

Cheers,

Di

Report · Sep 03, 2023

Hi Mark,

I got this to work on my test list. Now that I've gone to apply these scripts to my final list of 661 names, I can't get past the first step. Once I've applied the bullet character, none of the other scripts return any results.

I've double checked I'm in the GREP tab

The scope is correct (story or selection)

I've searched the web,

Tried all the other suggestions on this thread plus more and am getting goose eggs.

I'm at a loss. Any suggestions are welcome.

Di

Report · Aug 26, 2023

… Well! I've imported the docx given by the op in a story: 681 paras to be treated.

Grep 1: 629 found

F: ^(\S+(\h\u\.)?)\h(((de|du|Le|St)\h)?\S+)(?=\t)

R: $3, $1 + "red" char style

… So, there're 52 left the op can treat manually with these 3 F/R:

Grep 2:

F: ^(\S+)\h(.+)(?=\t) + "none" char style

R: $2, $1 + "red" char style

Grep 3:

F: ^(.+)\h(\S+)(?=\t) + "none" char style

R: $2, $1 + "red" char style

Still 22 left! =D

Grep 4:

F: ^.+ + "none" char style

R: As you want! …

(^/) The Jedi
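
For readers who want to test the logic of Grep 1 outside InDesign, a rough Python equivalent might look like the sketch below. Several things are assumptions: `\h` is replaced by a plain space, `\u` by `[A-Z]`, and the inner groups are made non-capturing, so the surname is group 2 here rather than $3. The character-style conditions used by Greps 2 to 4 have no plain-regex equivalent and are left out. `GREP_1` and `apply_grep_1` are illustrative names.

```python
import re

# First name plus optional middle initial, then a surname with an optional
# particle (de, du, Le, St), followed by the tab before the page numbers.
GREP_1 = re.compile(
    r"^(\S+(?: [A-Z]\.)?) ((?:(?:de|du|Le|St) )?\S+)(?=\t)",
    re.MULTILINE,
)


def apply_grep_1(text: str):
    """Return (new_text, number_of_lines_changed) for a tab-separated index."""
    return GREP_1.subn(r"\2, \1", text)


if __name__ == "__main__":
    index = "\n".join([
        "Anne Tyler\t20, 37, 38, 40, 41",
        "Ursula K. Le Guin\t34, 35, 36",
        "Chimamanda Ngozi Adichie\t8, 53",
    ])
    new_text, changed = apply_grep_1(index)
    print(changed, "line(s) changed")
    print(new_text)
```

Lines the pattern cannot match, such as three-word names without an initial, come back untouched, which corresponds to the entries left without the "red" character style in the approach above.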

Report · Aug 27, 2023

Given the list supplied I came up with this GREP that captures the first names

((\u[[:punct:]]\h)+|^.+?(?=\h))

But I've run out of time and can't continue

If it helps it helps - if it doesn't then it doesn't.

But there it is.
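
To see what the first-name pattern above picks up, it can be approximated in Python as below. This is only a sketch: `\u` becomes `[A-Z]`, `\h` becomes `[ \t]`, and `[[:punct:]]` is approximated by `[^\w\s]`, which is somewhat broader. Only the first match on each line is returned, and `leading_names` is a hypothetical helper name.

```python
import re

# Either a run of "initial + punctuation + space", or everything from the
# start of the line up to the first horizontal space.
FIRST_NAMES = re.compile(r"(?:[A-Z][^\w\s][ \t])+|^.+?(?=[ \t])")


def leading_names(line: str) -> str:
    """Return the first chunk the pattern matches on the line, or ''."""
    match = FIRST_NAMES.search(line)
    return match.group(0) if match else ""


if __name__ == "__main__":
    for line in ["A. M. Homes 49", "Anne Tyler 20", "Andrew X. Pham 45"]:
        print(repr(line), "->", repr(leading_names(line)))
```

In this approximation a middle initial or second given name after a full first name (as in "Andrew X. Pham") is not part of the first match, so the pattern would need further work before it could drive a swap.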

Report · Aug 27, 2023

Eugene,

This kind of task really comes down to knowing and understanding GREP, and quickly writing regexes that follow one after the other … like a good thriller! 😉

In this case, any op will play this game in less than 2 minutes:

(^/)

Report · Aug 27, 2023

… Of course, it's an "Index"!!!

What about the relevance of the page numbers if the layout moves?

I will always prefer to work with proper Index entries and apply the regexes from the video above, as in this other video:

(^/)
