I'm trying to trick SDL into identifying words that are not approved by STE.
Under "Configure|Style and Linguistic Checks|User Defined Rules" the program allows regular expressions to create custom rules.
I have all other options in the Utility unchecked.
I am by no means a pro at regular expressions, but I was able to build a pretty solid expression at http://regexlib.com/RETester.aspx.
The idea is to create an expression that looks for any word other than those separated by vertical bars.
For the test text "this is not the way that should work. this is not the way that should work."
\b(?:(?!should|not|way|this|is|that).)+
returns: the work the work
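For anyone who wants to reproduce this outside the tester, here is a minimal sketch in Python. It assumes Python's `re` module treats `\b`, the negative lookahead, and the non-capturing group the same way the online tester does for plain ASCII text; it says nothing about how SDL itself applies the rule. The variable names and the second, word-level pattern are illustrative additions, not part of the original expression. The sketch shows that the original pattern tests every character position, so its raw matches carry spaces and punctuation, and an excluded word that appears inside a longer word (the "is" in "history") cuts the match short.

```python
import re

text = "this is not the way that should work. this is not the way that should work."

# The expression from the post, unchanged. The lookahead is checked before
# every single character, not once per word.
char_level = re.compile(r"\b(?:(?!should|not|way|this|is|that).)+")
raw_matches = char_level.findall(text)

# Many raw matches are a lone space; trimming leaves the visible words.
trimmed = [m.strip() for m in raw_matches if m.strip()]
print(trimmed)  # ['the', 'work.', 'the', 'work.']

# Illustrative alternative (an assumption, not from the post): check the
# exclusion list once at the start of each word and require the excluded
# word to end at a word boundary, then match only word characters.
word_level = re.compile(r"\b(?!(?:should|not|way|this|is|that)\b)\w+")
words = word_level.findall(text)
print(words)  # ['the', 'work', 'the', 'work']

# An excluded word inside a longer word stops the character-level pattern.
print(char_level.findall("history"))  # ['h']
print(word_level.findall("history"))  # ['history']
```

If SDL behaves like this, the trailing period inside the "work." match may be related to the end-of-sentence results described in the post, but that is a guess.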
At that website, I can change the excluded words and it works every time. Change the test text, same thing, still works.
Perfect! I ripped every approved word in STE into the formula, and it (SDL) only returns words at the end of a sentence that are followed by a period or question mark. So I added "\." to the exclusion list in the expression, and it only found words next to question marks. I excluded question marks and now it finds nothing. I don't understand this, as I wasn't aware that anything in the expression restricted it to the end of a sentence.
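With a word list as long as the STE dictionary, one way to test the idea away from SDL is to build the expression from a list instead of typing it by hand. The following is a rough sketch in Python, not SDL's rule syntax. The function name `build_unapproved_pattern` and the short `approved` list are hypothetical stand-ins, and the pattern uses a word-level lookahead (exclusion list checked once per word, followed by `\b`) rather than the character-by-character form above.

```python
import re

def build_unapproved_pattern(approved_words):
    """Return a compiled pattern that matches whole words NOT in approved_words."""
    # Escape each entry so characters such as "." are taken literally.
    alternatives = "|".join(re.escape(word) for word in approved_words)
    # The \b after the group keeps "is" from also excluding "island".
    return re.compile(r"\b(?!(?:" + alternatives + r")\b)\w+", re.IGNORECASE)

# Hypothetical stand-in for the full approved-word list.
approved = ["this", "is", "not", "the", "way", "that", "should"]
pattern = build_unapproved_pattern(approved)

unapproved = pattern.findall("This is not the way that should work. Is this history?")
print(unapproved)  # ['work', 'history']
```

Because the match is restricted to `\w+`, periods and question marks never become part of a reported word, so they do not need to go into the exclusion list.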
I have an O'Reilly book to refer to. If anyone can give me a shove in the right direction as to which set of rules to adhere to, I would appreciate it. Why did negative word matching have to be my introduction to this subject?
I tried your expression in a couple of regex tools and it seems to parse as you wanted it to. I suspect that the SDL implementation doesn't follow the Unix/Linux conventions. I haven't used the tool, and the usage documentation is nonexistent apart from the limited Flash-based demo.
The SDL knowledgebase states that their regex filter uses the .NET regex flavour, and I believe the differences between flavours are explained in the "Mastering Regular Expressions" book.