Hello, all,
According to Adobe documentation, REreplace() (ergo, REreplaceNoCase(), too) requires three arguments: the target string; the regex match; and the value to replace that match with. The docs also state that the third argument is a string. Nothing is mentioned of using a function for the third argument.
But in JavaScript (ex: str.replace(x,y)), while the prefix str is the target string, the two arguments that go inside the parentheses are the regex or substring match, and the replacement - which can be either a string or a function.
The reason I bring this to attention is because in the past I have needed (and now would like) to write something that will convert HTML entities into the character said entity is supposed to represent. But I have had no success by using just a string as the replacement.
For example: Say I have a string, such as "This is a test&#59; this is only a test." (Yeah, I know, VERY basic.) Well, I don't always know that what I need to replace will be &#59;. It could be &#XXXX;, for all I know. But I do know that (at least most if not all the time) &#59; in CF is chr(59).
So I tried several variations of:
REreplaceNoCase( str, '&##(\d);', '#chr(\1)#', 'all' ) <!--- and more, but I'm kind of tired, so I'm only giving one example --->
None worked. Then (today) it occurred to me - a function might be able to do it. But the docs indicate that only a string can be the replacement for REreplace[NoCase]().
Frustrating.
Okay.. now I'm not sure where I'm going with this, coz it really isn't a question. But I guess that I'm wondering if Adobe has any plans to update REreplace() so that a function can be used as the replacement. Or, if anyone knows of an alternative.
V/r,
^_^
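For what it's worth, here is why the '#chr(\1)#' variation cannot work, as far as I can tell: CF evaluates anything between hash marks while it is building the replacement string, before REreplaceNoCase() ever runs, so chr() never gets to see the matched digits. The \1 backreference is only ever substituted as text by the regex engine. A minimal sketch, with made-up variable names and \d+ assumed so that multi-digit codes match:

```cfml
<cfset str = "This is a test&##59; this is only a test.">
<!--- \1 is plain text that the regex engine fills in with the captured digits --->
<cfset bracketed = REreplaceNoCase(str, "&##(\d+);", "[\1]", "all")>
<cfoutput>#bracketed#</cfoutput>
```

That should print the string with [59] where the entity was, which shows the backreference itself is fine; it just cannot be handed to a function from inside a replacement string.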
I think that for a lot of functions like this, Adobe simply puts a thin wrapper over an underlying Java function. If Java has an equivalent, I'm sure CF won't be long in following. In the meantime - again, if Java has an equivalent to what you are asking - you can call Java directly (it might take a little reverse engineering).
Hi, Steve Sommers‌,
Thanks for responding. In most environments, that is possible.
In a USG DoD environment, we are not allowed to create or directly access Java objects.
MOST anything that can be done via CF TAG or CFSCRIPT, no problem. But I can't do anything like:
<cfset thisVariable = createObject("java","foo") />
V/r,
^_^
@Wolfshade, since ColdFusion runs on top of Java (as a Java Servlet App), you have the ability to access native Java built-in functions and objects. You are referring to loading additional Java (3rd party) objects (using CreateObject).
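To make that concrete, here is a small sketch. It assumes (as on Adobe CF) that a CF string variable is a java.lang.String underneath, so String methods can be called on it without CreateObject. The variable names are made up for illustration, and whether a locked-down server permits this is something to check locally.

```cfml
<cfset greeting = "Hello, world">
<!--- toUpperCase() and replaceAll() are java.lang.String methods, not CF functions --->
<cfset shouted = greeting.toUpperCase()>
<cfset swapped = greeting.replaceAll("o", "0")>
<cfoutput>#shouted# / #swapped#</cfoutput>
```

Note that replaceAll() takes a Java regex as its first argument, so Java's regex syntax applies there rather than the syntax REreplace() uses.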
Hi, Carl Von Stetten‌,
SWEET! I've never done that, though. Do you know of any online tutorials for stuff like that?
V/r,
^_^
WolfShade wrote:
Okay.. now I'm not sure where I'm going with this, coz it really isn't a question. But I guess that I'm wondering if Adobe has any plans to update REreplace() so that a function can be used as the replacement. Or, if anyone knows of an alternative.
You could just write a simple ColdFusion function that does what you want. Something like:
<cfset testString = "This is a test&##59; this is only a test.">
<cfoutput>#replaceNoCaseCustom(testString)#</cfoutput>
<cffunction name="replaceNoCaseCustom" returntype="string">
<cfargument name="inputString" type="string">
<cfset var outputString = arguments.inputString>
<cfloop from="58" to="62" index="indx">
<cfset outputString = replaceNoCase(outputString,"&##" & indx & ";", chr(indx), "all")>
</cfloop>
<cfreturn outputString>
</cffunction>
Thanks, BKBK‌! That is awesome. However, I'm looking to replace ALL instances, not just chr(58) through chr(62).
I've seen some like ® (registered), © (copyright), – (en dash), “ and ” (left and right MS "smart" quotes), and more. (Sometimes, the users will enter text into MS Word, then copy/paste that into the field.)
If I were to loop that for everything, I think it would be pretty processor intensive. The RegEx should be less strain on the server CPU (I think.)
V/r,
^_^
@Wolfshade
What did you make of my last suggestion? Is that what you're looking for? It does solve your problem in a single line of code.
Apologies, BKBK‌. I have not had a chance, yet, to implement it (this is for my side project at home). I'm blown away by how simple it is. Could you walk me through parts of it?
What are <name1> and <name2>? Are those backreferences?
\&##?<name1>\d+?<name2>;
I can see the escaped hashmark for CF. I'm assuming that you have to backslash escape ampersands in Java. If this is RegEx, the question mark means "zero or one of the preceding character". How do name1 and name2 play into this? \d is the character number, obviously.
V/r,
^_^
Hi WolfShade,
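The regex syntax is actually ?<name>, taken together. It goes just inside the opening parenthesis of a capturing group, as in (?<name>...), and gives that group a name.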
See for example Regex Tutorial - Named Capturing Groups - Backreference Names
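A small sketch of how a named group behaves, assuming a Java 7 or later runtime under CF (java.util.regex gained named groups in Java 7) and using the Java replaceAll() method on the string. The group name "code" and the sample string are made up for illustration.

```cfml
<cfset testString = "Code &##59; here">
<!--- (?<code>\d+) captures the digits under the name "code"; ${code} puts them back --->
<cfset outputString = testString.replaceAll("&##(?<code>\d+);", "[${code}]")>
<cfoutput>#outputString#</cfoutput>
```

This should give "Code [59] here". The replacement is still only a string, though, so on its own it cannot call chr() on the captured digits.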
Thanks for the link, BKBK. Sadly, DoD blocks all sites with .info TLDs. I will Google for Named Capturing Groups.
V/r,
^_^
WolfShade wrote:
Sadly, DoD blocks all sites with .info TLDs. I will Google for Named Capturing Groups.
@ WolfShade.
Never mind. You have to ignore my last (untested) attempt anyway. Here is a one-liner that works:
<cfset testString = "This is a test &##61; this is only a test. This is a test; &##60;this is only a test&##62;. This is a test &##64; this is only a test. &##40;This is a test; this is only a test.&##41;">
<cfset outputString = evaluate(de(testString.replaceAll("&##(?<name1>\d+);",'##chr(${name1})##')))>
<cfoutput>
<strong>testString:</strong> #testString#<br>
<strong>outputString:</strong> #outputString#
</cfoutput>
Output:
testString: This is a test = this is only a test. This is a test; <this is only a test>. This is a test @ this is only a test. (This is a test; this is only a test.)
outputString: This is a test = this is only a test. This is a test; <this is only a test>. This is a test @ this is only a test. (This is a test; this is only a test.)
Hi, c_wigginton and Pete_Freitag,
I think you can use canonicalize() without declaring the ESAPI, first. I did try that, and for some reason it missed a few of my test strings (got most of them, but two or three slipped past.) I also think that if you set the second and third arguments to true, it will not throw an exception when it finds nested HTML entities. I'll give it another shot - maybe I missed something the first time.
BKBK‌, you had me up until I saw evaluate(). I never use eval(uate); not in CF, not in JavaScript. As a tagline I once saw states: "eval(x,y); - The Axis of Eval"
V/r,
^_^
WolfShade wrote:
BKBK, you had me up until I saw evaluate(). I never use eval(uate); not in CF, not in JavaScript. As a tagline I once saw states: "eval(x,y); - The Axis of Eval"
Fair enough. I remain with the satisfaction that it answers your original question in just one line of code.
BKBK wrote:
Fair enough. I remain with the satisfaction that it answers your original question in just one line of code.
True, dat!
Canonicalized some more. Handy function!
WolfShade wrote:
Hi, c_wigginton and Pete_Freitag,
I think you can use canonicalize() without declaring the ESAPI, first. I did try that, and for some reason it missed a few of my test strings (got most of them, but two or three slipped past.) I also think that if you set the second and third arguments to true, it will not throw an exception when it finds nested HTML entities. I'll give it another shot - maybe I missed something the first time.
I also tried canonicalize (my opportunity to call it for the first time!). Using 2 'false' args solves your problem, too:
<cfset testString = "This is a test &##61; this is only a test. This is a test; &##60;this is only a test&##62;. This is a test &##64; this is only a test. &##40;This is a test; this is only a test.&##41;">
<cfset outputString = canonicalize(testString,false,false)>
<cfoutput>
<strong>testString:</strong> #testString#<br>
<strong>outputString:</strong> #outputString#
</cfoutput>
Maybe something like this:
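public string function myCustomReplace( required string str ){
// \d+ (not \d) so that multi-digit codes such as 59 or 9744 match
local.pattern = "&##(\d+);";
// collect every numeric entity found in the input
local.items = reMatch(local.pattern,arguments.str);
for(local.item in local.items){
// pull the digits out of the matched entity via backreference \1
local.replacementValue = val(reReplace(local.item,local.pattern,"\1"));
// swap every occurrence of this entity for its character
arguments.str = replace(arguments.str,local.item,chr(local.replacementValue),"ALL");
}
return arguments.str;
}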
Not tested; may have typos but you should be able to get the gist.
WolfShade wrote:
So I tried several variations of:
- REreplaceNoCase( str, '&##(\d);', '#chr(\1)#', 'all' ) <!--- and more, but I'm kind of tired, so I'm only giving one example --->
None worked.
A neat solution, borrowed from Java:
<cfset transformedString = str.replaceAll("\&##?<name1>\d+?<name2>;", "${name1}")>
<cfset esapi = createObject("java", "org.owasp.esapi.ESAPI") />
<cfset foo = "&ndash;,&mdash;,&iexcl;,&iquest;,&quot;,&ldquo;&##9744;" />
<cfoutput>
#esapi.encoder().canonicalize(foo)#
</cfoutput>
Use the canonicalize function built into CF10+ (see "canonicalize Code Examples and CFML Documentation"), or you can use the method @c_wigginton posted for CF8-9 fully patched (it includes the ESAPI jars if fully patched).
The canonicalize function can reverse HTML entities, URL encoding, and JavaScript character encoding. It can also deal with nested or mixed encoding, either by throwing an exception (since such encoding usually signals an attack) or by attempting to handle it.