ronnag74426794
Participant
April 30, 2024
Question

GREP to subscript chemical compounds


Good afternoon,

 

I am looking for a simple grep script to find chemical compounds without subscripts and change them to subscripts.

 

They may appear with or without parentheses, e.g. (C2H4).

 

Many thanks in advance.


7 replies

Marc Autret
Legend
May 1, 2024

@ronnag74426794 

 

Our colleague Laurent Tournier worked on this GREP fifteen years ago 😉

I guess this compact form should sum it up:

 

 

((?<=Uu[bhopqst])|(?<=A[cglmrstu]|B[aehikr]|C[adeflmnorsu]|D[bsy]|E[rsu]|F[emr]|G[ade]|H[efgos]|I[nr]|Kr|L[airu]|M[dgnot]|N[abdeiop]|Os|P[abdmortu]|R[abefghnu]|S[bcegimnr]|T[abcehilm]|Xe|Yb|Z[nr])|(?<=[BCFHIKNOPSUVWY]))[1-9]\d{0,1}

 

 

(Edit: if you need more than two digits, change the ending \d{0,1} into \d{0,2} or more. I don't know whether that's chemically relevant, though.)
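
For anyone who wants to try the expression outside InDesign, here is a minimal Python sketch. It assumes Python's `re` module accepts the pattern as written (every lookbehind alternative has a fixed width), which is not a guarantee about how InDesign's own GREP engine behaves. The helper names `find_subscript_digits` and `to_unicode_subscripts` are hypothetical, and swapping in Unicode subscript characters is only a stand-in for applying a subscript character style.

```python
import re

# The expression from the post above, split across lines for readability.
# Only change: the outer group is non-capturing; the lookbehinds are untouched.
ELEMENT_DIGITS = re.compile(
    r"(?:(?<=Uu[bhopqst])"
    r"|(?<=A[cglmrstu]|B[aehikr]|C[adeflmnorsu]|D[bsy]|E[rsu]|F[emr]|G[ade]"
    r"|H[efgos]|I[nr]|Kr|L[airu]|M[dgnot]|N[abdeiop]|Os|P[abdmortu]"
    r"|R[abefghnu]|S[bcegimnr]|T[abcehilm]|Xe|Yb|Z[nr])"
    r"|(?<=[BCFHIKNOPSUVWY]))[1-9]\d{0,1}"
)

# Maps ASCII digits to their Unicode subscript counterparts.
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def find_subscript_digits(text):
    """Return the digit runs that directly follow an element symbol."""
    return [m.group() for m in ELEMENT_DIGITS.finditer(text)]


def to_unicode_subscripts(text):
    """Replace each matched digit run with Unicode subscript digits."""
    return ELEMENT_DIGITS.sub(lambda m: m.group().translate(SUBSCRIPTS), text)


print(find_subscript_digits("(C2H4) and C6H12O6"))
print(to_unicode_subscripts("H2SO4"))
```

Because the lookbehind only accepts real element symbols, a digit after a space or after another digit (as in "Room 42") is left alone, but a label such as "B12" would still be matched.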

 

Best,

Marc

Colin Flashman
Community Expert
May 1, 2024

The most comprehensive one I've seen was written over at CreativePro.com. It uses GREP styles to do this automatically, but the best GREP style is in the comments section, by Laurent Tournier: https://creativepro.com/auto-format-superscript-and-subscript-numbers-using-grep-styles/

 

If the answer wasn't in my post, perhaps it might be on my blog at colecandoo!

Rene Andritsch
Community Expert
May 1, 2024

A typographic tip: define the subscript as an OpenType subscript (in the Character Style) if the font has one available. This will look much nicer.

Community Expert
May 1, 2024

Just to add to the other suggestions (I tried them and they didn't quite work for me for some reason):

 

(?<=(\l|\u))\d+

 

 

 

The caveat is that it will also find other things that you might not want found, like C1 and C2 in the example.
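
To illustrate that caveat, here is a rough Python sketch. It assumes InDesign's `\l` and `\u` can be approximated by `[a-z]` and `[A-Z]`, and the helper name `digits_after_letter` is made up for this example. It shows that "any letter followed by digits" picks up labels as readily as formulas.

```python
import re

# Assumed translation of (?<=(\l|\u))\d+ : \l -> [a-z], \u -> [A-Z].
ANY_LETTER_DIGITS = re.compile(r"(?<=[A-Za-z])\d+")


def digits_after_letter(text):
    """Return every digit run that directly follows any ASCII letter."""
    return ANY_LETTER_DIGITS.findall(text)


print(digits_after_letter("C2H4"))                 # a formula
print(digits_after_letter("See cells C1 and C2"))  # labels match too
```

Scoping the GREP style to a paragraph or character style used only for formulas is one way to limit such false positives.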

 

 

 

Scott Falkner
Community Expert
April 30, 2024

Use a GREP Style and this expression:

 

 

(?<=\u\l)\d+|(?<=\u)\d+

 

 

 

 

For the nerds: I couldn’t get the OR to work within a single positive lookbehind (PLB), hence the two separate ones.
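
A quick way to check the two-lookbehind form outside InDesign is this small Python sketch. It assumes `\u` and `\l` map to `[A-Z]` and `[a-z]`, and `digits_after_symbol` is a hypothetical helper name. Python's `re` is a different engine from InDesign's, so treat it as an approximation only.

```python
import re

# Assumed translation of (?<=\u\l)\d+|(?<=\u)\d+ into Python syntax.
# Each lookbehind has a fixed width (two characters, then one character).
ELEMENT_LIKE_DIGITS = re.compile(r"(?<=[A-Z][a-z])\d+|(?<=[A-Z])\d+")


def digits_after_symbol(text):
    """Return digit runs preceded by 'Xx' or 'X' (an element-like symbol)."""
    return ELEMENT_LIKE_DIGITS.findall(text)


print(digits_after_symbol("Na2CO3"))
print(digits_after_symbol("Model x2"))  # lowercase-only prefix is skipped
```

Unlike an explicit list of element symbols, this form accepts any capital letter, optionally followed by one lowercase letter, so a label like "C1" is still matched.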

Robert at ID-Tasker
Legend
April 30, 2024

@Scott Falkner

 

How about:

 

(?<=\u\l?)\d+

 

I'm on my phone so I can't check, but "?" means "zero or one", so it should work?

 

You are right: it looks like "?" can't be used inside a positive lookbehind either.
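
For comparison, Python's `re` module has a similar restriction: a lookbehind must have a fixed width, so an optional `?` inside it is rejected at compile time. The sketch below only demonstrates Python's behaviour (InDesign's engine is separate and may differ in detail). The helper `compiles` is hypothetical, and `[A-Z]`/`[a-z]` are assumed stand-ins for `\u`/`\l`.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Variable-width lookbehind: the optional [a-z]? makes the width 1 or 2.
print(compiles(r"(?<=[A-Z][a-z]?)\d+"))

# Two fixed-width lookbehinds joined by alternation outside the lookbehind.
print(compiles(r"(?<=[A-Z][a-z])\d+|(?<=[A-Z])\d+"))
```

Splitting the optional part into two fixed-width lookbehinds, as in the previous reply, is the usual workaround.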

 

brian_p_dts
Community Expert
April 30, 2024