Inspiring
July 18, 2018
Question

Problem with Canonicalize: &ne becomes ≠ unexpectedly

I'm using Canonicalize to check URLs and have run into a problem. For some reason, &ne always seems to be translated to ≠, even in situations where that is neither expected nor appropriate.

For example:

<cfset varURL = "www.mySite.com/myPage.cfm?someVar=abc&newVar=1" />

<cfset varCheck = Canonicalize(varURL,true,true)/>

The value of varCheck will be "www.mySite.com/myPage.cfm?someVar=abc≠wVar=1", not "www.mySite.com/myPage.cfm?someVar=abc&newVar=1" as I'm expecting.

It seems to make this translation whenever a URL variable (other than the first one following the ?) starts with "ne".

Other than renaming the URL variables that start with "ne", is there a fix for this problem?

I'm running CF2016 Update 6.

2 replies

BKBK
Community Expert
July 18, 2018

Why are you using canonicalize on the URL anyway? There is apparently no reason for it. The function you need is encodeForHTML.

In any case, the behaviour you observe is not unexpected. With canonicalize, when ColdFusion sees &, it does its best to convert any HTML entity it finds. I would therefore expect it to convert substrings such as &gt, &lt, &amp and &nbsp to >, <, & and the non-breaking space character, respectively.

DeliK (Author)
Inspiring
July 18, 2018

@BKBK, I'm not outputting it. I know to use EncodeFor when appropriate. I was using it as a first check to see whether the URL was valid, but it was failing on what appeared to be valid URLs.

WolfShade
Legend
July 18, 2018

Have you reported it as a bug?

https://tracker.adobe.com

V/r,

^ _ ^

DeliK (Author)
Inspiring
July 18, 2018

Thanks, WolfShade, I had not reported it yet.  Will do so.

BKBK
Community Expert
July 18, 2018

Quite likely a bug indeed, the main reason being that canonicalize is evaluating the HTML entities even though they lack ";" at the end.

That said, I still cannot see the use-case for applying canonicalize to URLs. It implies the URL contains HTML entities.

With URLs, the functions to use usually go in the other direction, namely encodeForHTML, encodeForURL and URLEncodedFormat, which practically "uncanonicalize" the input.
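
If canonicalize is still wanted as a first check, one possible workaround (a rough sketch, not tested on CF2016 Update 6) is to split the query string on "&" first and canonicalize each name and value on its own, so the separators never reach canonicalize. The function name canonicalizeUrlParts is made up for this example and is not a built-in. It assumes a simple "base?name=value&name=value" URL with no fragment, and it assumes canonicalize returns an empty string for an empty value.

```cfm
<cfscript>
// Hypothetical helper (not a built-in): canonicalizes a URL piece by piece.
// Assumes a simple "base?name=value&name=value" shape with no ##fragment.
function canonicalizeUrlParts(required string rawUrl) {
    // Everything before the first "?" is treated as the base.
    var base       = listFirst(arguments.rawUrl, "?");
    var query      = listRest(arguments.rawUrl, "?");
    var cleanPairs = [];

    // Split on the literal "&" so canonicalize() never sees the separators.
    for (var pair in listToArray(query, "&")) {
        var name  = listFirst(pair, "=");
        var value = listRest(pair, "=");
        // Note: a bare flag such as "debug" comes back as "debug=".
        arrayAppend(
            cleanPairs,
            canonicalize(name, true, true) & "=" & canonicalize(value, true, true)
        );
    }

    var result = canonicalize(base, true, true);
    if (arrayLen(cleanPairs)) {
        result &= "?" & arrayToList(cleanPairs, "&");
    }
    return result;
}

varURL   = "www.mySite.com/myPage.cfm?someVar=abc&newVar=1";
varCheck = canonicalizeUrlParts(varURL);
</cfscript>
```

Because "newVar" is passed to canonicalize without the leading "&", there is no "&ne" left for it to match. An "&" that appears inside a value would still be treated as a separator by this sketch.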