
REreplace[NoCase](): string only? Or can a function be the replacement?

LEGEND, Feb 04, 2016

Hello, all,

According to Adobe documentation, REreplace() (ergo, REreplaceNoCase(), too) requires three arguments: the target string; the regex match; and the value to replace that match with.  The docs also state that the third argument is a string.  Nothing is mentioned of using a function for the third argument.

But in JavaScript (e.g., str.replace(x,y)), while the prefix str is the target string, the two arguments that go inside the parentheses are the regex or substring match, and the replacement - which can be either a string or a function.

The reason I bring this up is that in the past I have needed (and now would like) to write something that will convert HTML entities into the characters they are supposed to represent.  But I have had no success by using just a string as the replacement.

For example:  Say I have a string, such as "This is a test&#59; this is only a test."  (Yeah, I know, VERY basic.)  Well, I don't always know that what I need to replace will be &#59;.  It could be &#XXXX;, for all I know.  But I do know that (at least most if not all the time) &#59; in CF is chr(59).

So I tried several variations of:

REreplaceNoCase( str, '&##(\d);', '#chr(\1)#', 'all' )  <!--- and more, but I'm kind of tired, so I'm only giving one example --->

None worked.  Then (today) it occurred to me - a function might be able to do it.  But the docs indicate that only a string can be the replacement for REreplace[NoCase]().

Frustrating.

Okay.. now I'm not sure where I'm going with this, coz it really isn't a question.  But I guess that I'm wondering if Adobe has any plans to update REreplace() so that a function can be used as the replacement.  Or, if anyone knows of an alternative.

V/r,

^_^

Advocate, Feb 04, 2016

I think that for a lot of functions like this, Adobe simply puts a thin wrapper over an underlying Java function. If Java has an equivalent, I'm sure CF won't be far behind. In the meantime - again, if Java has an equivalent to what you are asking for - you can call Java directly (it might take a little reverse engineering).

LEGEND, Feb 04, 2016

Hi, Steve Sommers‌,

Thanks for responding.  In most environments, that is possible.

In a USG DoD environment, we are not permitted to create or directly access Java objects.

MOST anything that can be done via CF TAG or CFSCRIPT, no problem.  But I can't do anything like:

<cfset thisVariable = createObject("java","foo") />

V/r,

^_^

Guide, Feb 04, 2016


@Wolfshade, since ColdFusion runs on top of Java (as a Java Servlet App), you have the ability to access native Java built-in functions and objects.  You are referring to loading additional Java (3rd party) objects (using CreateObject).

LEGEND, Feb 04, 2016

Hi, Carl Von Stetten‌,

SWEET!  I've never done that, though.  Do you know of any online tutorials for stuff like that?

V/r,

^_^

Community Expert, Feb 07, 2016

WolfShade wrote:

Okay.. now I'm not sure where I'm going with this, coz it really isn't a question.  But I guess that I'm wondering if Adobe has any plans to update REreplace() so that a function can be used as the replacement.  Or, if anyone knows of an alternative.

You could just write a simple ColdFusion function that does what you want. Something like:

<cfset testString = "This is a test&##59; this is only a test.">
<cfoutput>#replaceNoCaseCustom(testString)#</cfoutput>

<cffunction name="replaceNoCaseCustom" returntype="string">
    <cfargument name="inputString" type="string" required="true">
    <cfset var outputString = arguments.inputString>
    <cfset var indx = 0>
    <!--- Replace the numeric entities with codes 58 through 62 by their characters --->
    <cfloop from="58" to="62" index="indx">
        <!--- The entity contains the number itself (indx), not the character chr(indx) --->
        <cfset outputString = replaceNoCase(outputString, "&##" & indx & ";", chr(indx), "all")>
    </cfloop>
    <cfreturn outputString>
</cffunction>

LEGEND, Feb 08, 2016

Thanks, BKBK‌!  That is awesome.  However, I'm looking to replace ALL instances, not just chr(58) through chr(62).

I've seen some like &#174; (registered), &#169; (copyright), &#8211; (en dash), &#8220; and &#8221; (left and right MS "smart" quotes), and more.  (Sometimes, the users will enter text into MS Word, then copy/paste that into the field.)

If I were to loop that for everything, I think it would be pretty processor intensive.  The RegEx should be less strain on the server CPU (I think.)

V/r,

^_^

Community Expert, Feb 09, 2016

@Wolfshade

What did you make of my last suggestion? Is that what you're looking for? It does solve your problem in a single line of code.

LEGEND, Feb 10, 2016

Apologies, BKBK‌.  I have not had a chance, yet, to implement it (this is for my side project at home).  I'm blown away by how simple it is.  Could you walk me through parts of it?

What are <name1> and <name2>?  Are those backreferences?

\&##?<name1>\d+?<name2>;

I can see the escaped hashmark for CF.  I'm assuming that you have to backslash escape ampersands in Java.  If this is RegEx, the question mark means "zero or one of the preceding character".  How do name1 and name2 play into this?  \d is the character number, obviously.

V/r,

^_^

Community Expert, Feb 10, 2016

Hi WolfShade,

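The construct is actually (?<name>pattern), taken together: a named capturing group. The name can then be used in the replacement string as ${name}.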

See for example Regex Tutorial - Named Capturing Groups - Backreference Names
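
For anyone who cannot reach that link, here is a minimal sketch of a named capturing group. It assumes that ColdFusion hands the string over as a java.lang.String (so Java's replaceAll() can be called on it) and that the underlying JVM is Java 7 or later, which is where named groups appeared. The variable names and the sample date are made up for illustration.

```cfml
<!--- (?<year>...) captures into a group called "year"; ${year} inserts it into the replacement --->
<cfset isoDate = "2016-02-04">
<cfset usDate = isoDate.replaceAll("(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})", "${month}/${day}/${year}")>
<cfoutput>#usDate#</cfoutput>
```

Under those assumptions this should output 02/04/2016. Note that the replacement is still only a string template: ${name} pastes the captured text back in, but it cannot run a CF function such as chr() on it.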

LEGEND, Feb 11, 2016

Thanks for the link, BKBK.  Sadly, DoD blocks all sites with .info TLDs.  I will Google for Named Capturing Groups.

V/r,

^_^

Community Expert, Feb 11, 2016

WolfShade wrote:

Sadly, DoD blocks all sites with .info TLDs.  I will Google for Named Capturing Groups.

@ WolfShade.

Never mind. You should ignore my last (untested) attempt anyway. Here is a one-liner that works:

<cfset testString = "This is a test &##61; this is only a test. This is a test; &##60;this is only a test&##62;. This is a test &##64; this is only a test. &##40;This is a test; this is only a test.&##41;">
<cfset outputString = evaluate(de(testString.replaceAll("&##(?<name1>\d+);",'##chr(${name1})##')))>

<cfoutput>
<strong>testString:</strong> #testString#<br>
<strong>outputString:</strong> #outputString#
</cfoutput>

Output:

testString: This is a test &#61; this is only a test. This is a test; &#60;this is only a test&#62;. This is a test &#64; this is only a test. &#40;This is a test; this is only a test.&#41;<br>

outputString: This is a test = this is only a test. This is a test; <this is only a test>. This is a test @ this is only a test. (This is a test; this is only a test.)

LEGEND, Feb 11, 2016

Hi, c_wigginton and Pete_Freitag,

I think you can use canonicalize() without declaring the ESAPI, first.  I did try that, and for some reason it missed a few of my test strings (got most of them, but two or three slipped past.)  I also think that if you set the second and third arguments to true, it will not throw an exception when it finds nested HTML entities.  I'll give it another shot - maybe I missed something the first time.

BKBK, you had me up until I saw evaluate().  I never use eval(uate); not in CF, not in JavaScript.  As a tagline I once saw states:  "eval(x,y); - The Axis of Eval"

V/r,

^_^

Community Expert, Feb 11, 2016

WolfShade wrote:

BKBK, you had me up until I saw evaluate().  I never use eval(uate); not in CF, not in JavaScript.  As a tagline I once saw states:  "eval(x,y); - The Axis of Eval"

Fair enough. I remain with the satisfaction that it answers your original question in just one line of code.

LEGEND, Feb 11, 2016

BKBK wrote:

Fair enough. I remain with the satisfaction that it answers your original question in just one line of code.

True, dat!

Community Expert, Feb 11, 2016

Canonicalized some more. Handy function!

Community Expert, Feb 11, 2016

WolfShade wrote:

Hi, c_wigginton and Pete_Freitag,

I think you can use canonicalize() without declaring the ESAPI, first.  I did try that, and for some reason it missed a few of my test strings (got most of them, but two or three slipped past.)  I also think that if you set the second and third arguments to true, it will not throw an exception when it finds nested HTML entities.  I'll give it another shot - maybe I missed something the first time.

I also tried canonicalize (my opportunity to call it for the first time!). Using 2 'false' args solves your problem, too:

<cfset testString = "This is a test &##61; this is only a test. This is a test; &##60;this is only a test&##62;. This is a test &##64; this is only a test. &##40;This is a test; this is only a test.&##41;">
<cfset outputString = canonicalize(testString,false,false)>

<cfoutput>
<strong>testString:</strong> #testString#<br>
<strong>outputString:</strong> #outputString#
</cfoutput>

Advocate, Feb 08, 2016

Maybe something like this:

public string function myCustomReplace( required string str ){
  // One or more digits, so multi-digit entities such as 8211 (en dash) also match
  local.pattern = "&##(\d+);";
  local.items = reMatch(local.pattern, arguments.str);
  for(local.item in local.items){
    // Pull the digits out of the matched entity and convert them to a number
    local.replacementValue = val(reReplace(local.item, local.pattern, "\1"));
    arguments.str = replace(arguments.str, local.item, chr(local.replacementValue), "ALL");
  }
  return arguments.str;
}

Not tested; may have typos but you should be able to get the gist.
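
A variation on the same idea, offered as a sketch rather than a tested solution: walk the string once with reFind() and its returnsubexpressions flag, and build the result as you go. The reason a plain REreplace() call cannot do this is that CF evaluates anything like #chr(\1)# before the regex ever runs, so the function never sees the captured digits. The function name decodeNumericEntities and its variable names are hypothetical, and it only handles decimal entities whose value chr() accepts (up to 65535 on Adobe ColdFusion, as far as I know).

```cfml
<cfscript>
// Hypothetical helper; the name and structure are illustrative, not from the thread.
function decodeNumericEntities(required string input) {
    var pattern = "&##(\d+);";   // the doubled hash is CFML's escape for a literal hash
    var total = len(arguments.input);
    var result = "";
    var cursor = 1;
    var m = "";
    while (cursor LTE total) {
        // returnsubexpressions=true yields pos/len arrays: [1] whole match, [2] the digits
        m = reFind(pattern, arguments.input, cursor, true);
        if (m.pos[1] EQ 0) {
            break;   // no further entities
        }
        // Copy the plain text between the previous match and this one
        if (m.pos[1] GT cursor) {
            result &= mid(arguments.input, cursor, m.pos[1] - cursor);
        }
        // Convert the captured digits to the character they encode
        result &= chr(mid(arguments.input, m.pos[2], m.len[2]));
        cursor = m.pos[1] + m.len[1];
    }
    // Append whatever follows the last match
    if (cursor LTE total) {
        result &= mid(arguments.input, cursor, total - cursor + 1);
    }
    return result;
}

writeOutput(decodeNumericEntities("This is a test&##59; this is only a test."));
</cfscript>
```

With the sample string this should print "This is a test; this is only a test." It needs neither createObject() nor evaluate(), and each entity is visited once instead of triggering a full replace() pass over the string.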

Community Expert, Feb 08, 2016

WolfShade wrote:

So I tried several variations of:

REreplaceNoCase( str, '&##(\d);', '#chr(\1)#', 'all' )  <!--- and more, but I'm kind of tired, so I'm only giving one example --->

None worked.

A neat solution, borrowed from Java:

  <cfset transformedString = str.replaceAll("\&##?<name1>\d+?<name2>;", "${name1}")>

Engaged, Feb 11, 2016

<cfset esapi = createObject("java", "org.owasp.esapi.ESAPI") />
<cfset foo = "&ndash;,&mdash;,&iexcl;,&iquest;,&quot;,&ldquo;&##9744;" />

<cfoutput>
    #esapi.encoder().canonicalize(foo)#
</cfoutput>

Enthusiast, Feb 11, 2016

Use the canonicalize function built in to CF10+ (see "canonicalize Code Examples and CFML Documentation"), or use the method @c_wigginton posted for CF8-9, which works once the server is fully patched (the patches include the ESAPI jars).

The canonicalize function can reverse HTML entities, URL encoding, and JavaScript character encoding. It can also deal with nested or mixed encoding, either by throwing an exception (since such input usually signals an attack) or by attempting to handle it.
