
Subscript basic chemical formulas

New Here, Jan 04, 2016

Hi

I work on a biweekly agricultural magazine in which basic chemical formulas (CO2, CH4, N2O, ...) regularly appear, so naturally I want to automate the subscript part of these formulas with a GREP style in my paragraph style.

I have found a very robust GREP expression (I only use the 'Down' part) that finds the numbers and changes them to subscript (credit to Vasco Elbrecht):

(?<=[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)])\d{1,3}(?=[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)|( )|(\()])

Now the problem is that in my case the formulas are used in sentences, so they can be followed immediately by a period, a comma, a closing parenthesis or some other punctuation mark. Elbrecht's expression doesn't account for this, so I tried to modify it to my needs (with my minimal GREP knowledge), which resulted in this:

(?<=[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)])\d{1,3}(?=[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)|( )|\)])

But now, when I type a number of up to three digits between parentheses, it gets subscripted too, which of course is not what I want...

Is there any way this can be solved? Or should I just keep changing the numbers manually?

Cheers,

Laurens

Bonus: I tried to write a small GREP expression that superscripts the 2 or 3 in m, mm, cm and km (squared or cubed), but only when preceded by a number and a space. I could not figure out how to put them all in one expression, so now I have these two GREP styles:

(?<=\d m)(2|3)

(?<=\d (c|m|k)m)(2|3)

Is there any way to combine these two into one expression?

TOPICS: Scripting
Views: 2.5K

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines

1 correct answer: Community Expert, Jan 04, 2016 (full reply below)

Community Expert, Jan 04, 2016

Are you sure you copied the expression correctly? Everything between parentheses gets subscripted because both the left ("lookbehind") and the right ("lookahead") expressions list the chemical elements separated by | ('OR'), but all inside square brackets ([..]). That notation is used only for a single-character set. So the entire expression

[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)]

actually checks for a single occurrence of one of the characters '(' 'N' 'a' ')' '|' 'C' 'l' 'H' .. and so on. Note that this includes the parentheses themselves. You can also see this when you type "a2a": the '2' will get subscripted.
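To see the single-character-set behavior outside InDesign, here's a rough Python sketch. Python's `re` is not InDesign's GREP engine, but it reads `[...]` the same way. It runs the modified expression from the question, copied verbatim, against a few sample strings; the helper name `subscripted` is made up for this illustration.

```python
import re

# The modified expression from the question, copied verbatim. Inside [...],
# '(' ')' '|' and the letters are just a list of single characters, so each
# lookaround is satisfied by ANY one of them, including a bare '(' or 'a'.
MODIFIED = r"(?<=[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)])\d{1,3}(?=[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)|( )|\)])"


def subscripted(pattern, text):
    """Return the digit runs that a GREP style using `pattern` would subscript."""
    return [m.group() for m in re.finditer(pattern, text)]


print(subscripted(MODIFIED, "CO2 emissions"))  # ['2']
print(subscripted(MODIFIED, "a2a"))            # ['2']   'a' is in the set
print(subscripted(MODIFIED, "see (12)"))       # ['12']  '(' and ')' are in the set
```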

You should remove the square brackets, but then you end up with a lookbehind whose alternatives have different lengths (one is "Na", another is "H"), and InDesign does not support such variable-length lookbehinds. To fix that, split the lookbehind into two parts: one that looks for 2 characters OR one that looks for 1 character.
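Python's `re` module happens to enforce the same fixed-width lookbehind rule, so it can serve as a quick sanity check before pasting an expression into a GREP style. A minimal sketch (the `compiles` helper is hypothetical, just for this demo):

```python
import re


def compiles(pattern):
    """True if Python's re accepts the pattern, False if it rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# One lookbehind whose alternatives differ in width ("Na" is 2 chars, "H" is 1).
print(compiles(r"(?<=Na|Cl|H|C|O|S|N)\d"))         # False
# The same alternatives split into a 2-character and a 1-character lookbehind.
print(compiles(r"((?<=Na|Cl)|(?<=H|C|O|S|N))\d"))  # True
```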

Due to the nature of chemical notation, I don't think you need the lookahead at all! The following regex will match the test compounds you mention:

((?<=Na|Cl)|(?<=H|C|O|S|N))\d{1,3}(?!\d)

and it will refuse to fire when one of these letters is followed by more than 3 digits; that's what the negative lookahead (?!\d) is for.
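A quick Python check of this expression against the compounds from the question, plus the cases that went wrong before. Again this is only a sketch: Python's `re` stands in for InDesign here, and `subscripted` is a made-up helper name.

```python
import re

# The expression from the answer, unchanged.
FORMULA_SUB = r"((?<=Na|Cl)|(?<=H|C|O|S|N))\d{1,3}(?!\d)"


def subscripted(text):
    """Return the digit runs the GREP style would set in subscript."""
    return [m.group() for m in re.finditer(FORMULA_SUB, text)]


print(subscripted("CO2, CH4 and N2O."))  # ['2', '4', '2']  trailing punctuation is fine
print(subscripted("NaCl and Na2SO4"))    # ['2', '4']
print(subscripted("see (12)"))           # []  '(' is no longer in the lookbehind
print(subscripted("H1234"))              # []  more than 3 digits: rejected
```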

Bonus challenge

It's tempting to write the regex "(?<=\d [cmk]?m)[23]\b" (which checks for 'm' with an optional 'c', 'm' or 'k' prefix) but, again, it will not work because of the variable length. This time it's the question mark ('?') that makes the length variable: after all, it means "zero or one time". But you can duplicate the lookbehinds again and end up with a regex that is longer and more cumbersome, but at least works:

((?<=\d [cmk]m)|(?<=\d m))[23]\b
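The same Python check works for the units expression: the `?` version is rejected, and the split version only fires after a number, a space and a unit. The helper names `compiles` and `superscripted` are, again, just for illustration.

```python
import re

# Variable width: '[cmk]?' makes the lookbehind 3 or 4 characters long.
VARIABLE = r"(?<=\d [cmk]?m)[23]\b"
# The working form: a 4-character lookbehind OR a 3-character one.
UNIT_SUP = r"((?<=\d [cmk]m)|(?<=\d m))[23]\b"


def compiles(pattern):
    """True if Python's re accepts the pattern, False if it rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def superscripted(text):
    """Return the digits the GREP style would set in superscript."""
    return [m.group() for m in re.finditer(UNIT_SUP, text)]


print(compiles(VARIABLE))  # False
print(superscripted("10 m2 and 5 km2, 3 cm3; but not 4 m23 or m2"))  # ['2', '2', '3']
```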

LEGEND, Jan 04, 2016

Don't forget the Ununtrium! 

New Here, Jan 05, 2016

Sorry for the late reply, but that is exactly right!

Many thanks for the explanations as well, since they clear up why the code I was using didn't work. I especially didn't know about the variable lengths... And the first expression you wrote for the bonus was the one I tried too, because it seemed so logical.

With this newfound knowledge I'll be able to tackle other problems that might come up.

Thanks again Jongware!

New Here, Sep 14, 2018

Is there a guide on how to apply this somewhere?

Community Expert, Sep 14, 2018

That would be the online help then: Drop caps and nested styles in InDesign: Create GREP styles.

New Here, Sep 16, 2018

Thank you! It seems I had to reapply the paragraph style for it to work.


Here it is with the whole periodic table included:

((?<=He|Li|Be|Ne|Na|Mg|Al|Si|Cl|Ar|Ca|Sc|Ti|Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|Rb|Sr|Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|Xe|Cs|Ba|La|Ce|Pr|Nd|Pm|Sm|Eu|Gd|Tb|Dy|Ho|Er|Tm|Yb|Lu|Hf|Ta|Re|Os|Ir|Pt|Au|Hg|Tl|Pb|Bi|Po|At|Rn|Fr|Ra|Ac|Th|Pa|Np|Pu|Am|Cm|Bk|Cf|Es|Fm|Md|No|Lr|Rf|Db|Sg|Bh|Hs|Mt|Ds|Rg|Cn|Nh|Fl|Mc|Lv|Ts|Og)|(?<=I|H|B|C|N|O|F|P|S|V|K|Y|W|U))\d{1,3}(?!\d)

And this site helps with testing expressions:
https://regexr.com/
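Typing 100-plus symbols by hand makes it easy to slip in a duplicate, a three-letter placeholder name, or a stray empty alternative (an empty `|)` inside a lookbehind would let it match anywhere). One hedged alternative is to generate the expression from a symbol list, grouping symbols by length so every lookbehind stays fixed-width, as the accepted answer describes. The Python sketch below assumes the current IUPAC symbols (including Nh, Mc, Ts and Og, named in 2016); `build_subscript_grep` is a hypothetical helper, and its printed output is what you would paste into the GREP style.

```python
import re

# All 118 element symbols, using the IUPAC names approved in 2016 (Nh, Mc, Ts, Og).
SYMBOLS = """
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn
Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce
Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At
Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn
Nh Fl Mc Lv Ts Og
""".split()


def build_subscript_grep(symbols):
    """Build one fixed-width lookbehind per symbol length, longest first, ORed together."""
    by_length = {}
    for sym in dict.fromkeys(symbols):  # drop duplicates, keep first-seen order
        by_length.setdefault(len(sym), []).append(sym)
    lookbehinds = [
        "(?<=" + "|".join(by_length[n]) + ")"
        for n in sorted(by_length, reverse=True)
    ]
    # Same tail as the accepted answer: 1 to 3 digits not followed by another digit.
    return "(" + "|".join(lookbehinds) + r")\d{1,3}(?!\d)"


pattern = build_subscript_grep(SYMBOLS)
print(pattern)  # paste this into the GREP style
print([m.group() for m in re.finditer(pattern, "Fe2O3 and UF6, in 2016")])  # ['2', '3', '6']
```

Because each group only ever holds symbols of one length, adding or removing a symbol cannot accidentally produce a variable-length lookbehind.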
