Fun With IsDefined() vs KeyExists()

Report · Sep 07, 2021

I write a lot of apps that interact with external APIs. Oftentimes the data returned by third parties is deeply nested, and occasionally the third party isn't reliable enough to send consistent response structures, so I need to check for the existence of structure keys to avoid errors (frustrating, I know).

Here's an example of what several response objects might look like from the same API:

// Response sample 1:
apiResponse = {
  data: {
    errors: [ "surname is invalid" ]
  }
};

// Response sample 2:
apiResponse = {
  error: "The system is currently offline"
};

// Response sample 3:
apiResponse = {
  data: {
    name: [ "john smith" ]
  }
};

When I digest the data, I check the `apiResponse` for the existence of keys and sub-keys before processing. However, it can be pretty tedious to write out expressions that check for the existence of every single sub-key like this:

if ( 
 arguments.apiResponse.keyExists( "data" ) && 
 arguments.apiResponse.data.keyExists( "errors" )
) {
  // ... do something
}

I wanted to simplify the code, but unfortunately `structKeyExists()` does not support nested keys.
My first instinct was to use the `isDefined()` function to check for the existence of the full struct key path. However, `isDefined()` has been mostly vilified by the CFML community for performance and security reasons (see the related discussion and the related StackOverflow question).

Here's the same `if` statement from above using `isDefined()` instead:

if ( isDefined( "arguments.apiResponse.data.errors" ) ) {
  // ... do something
}

Much cleaner, right? However, what about the performance implications? By design, `isDefined()` checks various scopes to see if the variable exists, which can be slow.

I wrote a small UDF that I theorized would be more efficient than `isDefined()` and could allow me to dynamically check a struct for the existence of a key. Here's what I came up with:

boolean function structHasKey( required struct struct, required string key ) {

    var keyArray = listToArray( arguments.key, "." );
    var subStruct = arguments.struct; 
    
    for ( var item in keyArray ) {
        if ( !subStruct.keyExists( item ) ) {
            return false;
        }
        subStruct = subStruct[ item ];
    }
    
    return true;

}

With this new UDF, we can make the same check as above like this:

if ( structHasKey( arguments.apiResponse, "data.errors" ) ) {
  // ... do something
}
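
One edge case worth noting: if an intermediate value along the path is not a struct (for example the string under `error` in response sample 2), calling `keyExists()` on it would likely throw rather than return `false`. Below is a minimal sketch of a guarded variant, assuming that a non-struct intermediate value should simply count as "path not found". The names `structHasKeySafe`, `target`, `keyPath`, `sample1` and `sample2` are hypothetical and used only for illustration.

```cfml
<cfscript>
// Hypothetical variant of structHasKey(): same walk down the dotted path,
// but it returns false when an intermediate value is not a struct.
boolean function structHasKeySafe( required struct target, required string keyPath ) {
    var current = arguments.target;
    for ( var item in listToArray( arguments.keyPath, "." ) ) {
        // Stop as soon as we hit a non-struct or a missing key.
        if ( !isStruct( current ) || !structKeyExists( current, item ) ) {
            return false;
        }
        current = current[ item ];
    }
    return true;
}

// Trimmed-down versions of response samples 1 and 2 from above.
sample1 = { data: { errors: [ "surname is invalid" ] } };
sample2 = { error: "The system is currently offline" };

hasErrors   = structHasKeySafe( sample1, "data.errors" ); // true
noData      = structHasKeySafe( sample2, "data.errors" ); // false ("data" is missing)
notAStruct  = structHasKeySafe( sample2, "error.code" );  // false ("error" holds a string)
</cfscript>
```

The extra `isStruct()` call adds a little work per path segment, so this trades a small amount of speed for not having to wrap the check in error handling.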

I wrote a simple benchmark to see how this UDF stacks up against `isDefined()` and the traditional approach of using `structKeyExists()`. Now, I realize TryCF isn't the most scientifically appropriate benchmarking tool, but it's interesting to see how the various CFML engines (Adobe/Lucee) handle the different approaches. I also believe `isDefined()` must perform differently based on the number of variables present in the URL/FORM (and other) scopes that it checks.

After running the TryCF gist 20-30 times, the fastest method was the old-school `structKeyExists()`. The UDF and `isDefined()` traded 2nd and 3rd place quite often, especially depending on the CF engine, so I wasn't able to reach a firm conclusion. If I were a betting man, I would have thought my UDF was going to beat `isDefined()` every time, but the overhead of executing the UDF must outweigh any benefit of the approach.

If anyone has any tips on improving the UDF, or my test setup, let me know as I'd love to play with this concept more.

Report · Sep 08, 2021

Interesting piece, with sound judgement and instructive remarks.

However, I am a bit confused by the logic in the user-defined function, particularly with regard to how you apply the function in the "simple benchmark".

Take the code:

<cfscript>
myStruct = {
    apiResponse = {
        body = "i am the body",
        headers = {
            status_code = "405",
            status = {
                code: "200"
            }
        }
    }
};
keysToTest = [
    "apiResponse.headers.status_code", // true
    "apiResponse.headers.error", // false
    "apiResponse.headers.status.code", // true
    "apiResponse.headers.status.name" // false
];
for ( key in keysToTest ) {
	result = structHasKey( myStruct, key );
}
</cfscript>

The first structHasKey call is: structHasKey(myStruct, "apiResponse.headers.status_code")

Within the function, the key is split up into its constituent items, apiResponse, headers and status_code.

The function then tests whether each of these items is a key of myStruct.

Why?

The three items are so far down the line that they have no visibility at the level of the root struct.

Report · Sep 08, 2021

In any case, your results confirm what I think. StructKeyExists/keyExists is the way to go.

The task here is to identify existing, missing or invalid struct-key pairs. This implies that you must have a reference-definition to start with. Much like the Document Type Definition (DTD) or XML Schema Definition (XSD) needed to validate the elements within an XML.

You therefore have to find a general way to parse a tree structure, which usually involves recursion. Recursion in turn comes with loop upon loop upon loop, and hence higher execution time, not to mention code complexity.

One way to simplify the problem is to flatten the tree. The explanation follows.

Let's assume your original reference-definition for struct-key pairs is:

rootStruct[apiResponse]
apiResponse[body, headers]
body[]
headers[status, status_code, error]
status[code, name]
code[]
name[]
status_code[]
error[]

You can flatten the tree into the following list of struct-key pairs:

rootStruct-apiResponse
apiResponse-body
apiResponse-headers
headers-status
headers-status_code
headers-error
status-code
status-name

With such a reference-definition to start with, all you have to do is check for the existence of each, or of any, of 8 struct-key pairs. And you're done. 🙂

Report · Sep 08, 2021

I mentioned how recursion (tree traversal) is complex. As an alternative, you could just let ColdFusion do all the heavy-lifting for you. For example, by using structFindKey.

<cfscript>
void function checkWhetherKeyInStruct (required struct rootStructure, required string key) {
	var keyDetailsArray=structFindKey(arguments.rootStructure, arguments.key, "all");
	if (keyDetailsArray.len()==0) {
		writeoutput("Key " & "'#key#'" & " doesn't exist." & "<br>");
	} else {
		writedump(keyDetailsArray);
	}
}

myStruct = {
    apiResponse = {
        body = "i am the body",
        headers = {
            status_code = "405",
            status = {
                code: "200"
            }
        }
    }
};

keysToCheck=["code","error","status_code","body","name","headers","apiResponse"];

for (key in keysToCheck) {
	checkWhetherKeyInStruct(myStruct, key);
}

</cfscript>

Report · Sep 08, 2021

@Homestar9 You didn't indicate which version of Adobe ColdFusion you were using. I'm still using CF2016 and have been writing similar extraneous logic when attempting to determine if a deeply nested path is valid or not.

Here's my solution. I was wondering if you could review and test it with your existing internal demo to determine how well it performs. This UDF could be easily modified to return the resultant key's value if it exists and fall back to returning NULL or an empty string if it doesn't. Unfortunately, this solution only works with Adobe ColdFusion as Lucee CFML does not bundle the "org.apache.commons.beanutils" library. (I'd love to see a Lucee-compatible version.)

https://gist.github.com/JamoCA/25dc0b3133b0d9a890979ccf47e321e4
LEGAL NOTE: I'm not posting any CFML source code in this forum due to the Adobe Community Terms regarding copyright.

Report · Sep 09, 2021

@Homestar9 You didn't indicate which version of Adobe ColdFusion you were using. I'm still using CF2016 and have been writing similar extraneous logic when attempting to determine if a deeply nested path is valid or not.

Here's my solution. I was wondering if you could review and test it with your existing internal demo to determine how well it performs.

By @James Moberg

What if you used structFindKey or, perhaps better, the safe-navigation that @bradwood.com mentioned?

Report · Sep 09, 2021

@BKBK I believe that safe navigation could be used if the key path you are attempting to verify 1) remains static and 2) is explicitly coded in the CFML. I don't see any way of using that syntax with an arbitrary number of dynamic key names. I hope that I'm wrong and someone can provide me an example where a safe navigation operation can be dynamically generated. The only way I can think of is to write a generator script that writes a safe navigation operation, saves it as a CFML file, processes it and returns the result. By comparison, that approach is definitely less performant.

I continued to have some more fun (without losing any eyes) and came up with a Lucee-friendly and array-compatible solution that loops over the results of structFindKey() to determine whether a single-string path of keys (including arrays) exists or not (e.g., "data.errors[].errorcode").

https://gist.github.com/JamoCA/4fcb3a8c691f29199d23c02941bece11

You could choose to manually hard-code all of the logic in advance, but this UDF proof-of-concept provides the convenience of being able to dynamically accept a single string (representing a full key path w/optional array notation), compare it against a struct and validate whether the path exists or not. You could iterate over hundreds of different key paths to test without being required to manually pre-write any safe navigation operations in advance.

Report · Sep 08, 2021

You're creating a much too complex solution for a very simple problem. Just use isDefined(). Done. The security implications of that function only apply if you are passing untrusted dynamic variable names into it, which doesn't seem to be the case. And I think the performance overhead has been greatly overstated by the community. CF's scope hunting behavior still kicks in all the same for code like

structKeyExists( foo, "bar" )

when it resolves what scope the "foo" variable exists in. The rule of thumb is, unless you're seeing measurable slowness, you're optimizing prematurely. And you're unlikely to see any measurable slowness unless you've got code like this executing thousands of times in a loop. I would assume isDefined() optimizes scope lookup for "variables.foo" in the same manner that structKeyExists() does.

If you really want a method with less ambiguity, then this is precisely what safe navigation was added to the language for.

apiResponse = {
    data: {
      name: [ "john smith" ]
    }
  }

// access safely, returning null if not exists
writeDump( apiResponse?.data?.name )
// access safely, returning default value if not exists
writeDump( apiResponse?.data?.foobar ?: 'default' )
// safely check for existence of deep key
writeDump( isNull( apiResponse?.data?.foobar ) )

Again, this is all built in-- no need for complex custom UDFs to accomplish this.

Report · Sep 08, 2021

This is one of those areas where there's a performance advantage but it's so small it might as well not exist. I used to do a lot of code reviews, and it never turned out that this was what made code good or bad. There was ALWAYS some poorly-thought-out database interaction where all the time actually went. Meanwhile, people are trying these weird single-threaded tests with TryCF or whatever that are nowhere near what a real concurrent environment is like. Saying you used that for testing is saying that you didn't do any realistic testing. Not that there's anything wrong with noodling around, but that test isn't going to tell you anything useful and you shouldn't use it as if it were.

As for the original question, this kind of thing is why we have try/catch. What would you do in your case if any of those nested fields were missing? Do you have a way to send a very explicit message back to the remote API saying "this time, please include field X"? I'm guessing no, so just try that API call and fail if it brings back invalid data.
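
As a rough illustration of that try/catch approach, here is a minimal sketch that assumes any missing nested field should be treated as a failed call. The `apiResponse` value mirrors response sample 2 from the original post, and the output messages are placeholders rather than anything from a real app.

```cfml
<cfscript>
// Response sample 2 from the original post: there is no "data" key at all.
apiResponse = { error: "The system is currently offline" };

try {
    // Direct access; this throws if "data" or "errors" is missing.
    errors = apiResponse.data.errors;
    writeOutput( "Found " & arrayLen( errors ) & " error(s)" );
} catch ( any e ) {
    // Any unexpected response shape ends up here and is handled as a failure.
    writeOutput( "Unexpected API response: " & e.message );
}
</cfscript>
```

With this shape there is no per-key existence check at all; the cost is that every kind of malformed response is funneled through the same catch block.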

I'm not trying to be difficult here, I just see a lot of people chasing after the wrong things sometimes.

Dave Watts, Eidolon LLC

Report · Sep 08, 2021

The title of this post starts with "Fun with...".

With that in mind, I wanted to see what was possible and to determine if I could figure out how to handle multiple, unknown keys and return a true/false response without having to loop over each key. (I'm guessing that this is similar to what the ACF-added "beanutil" library does.) I initially researched using the safe navigation operators (which I've been using), but couldn't figure out how to dynamically generate the "?" statement using a list of keys.

While this approach may be considered complex, it's also "dynamic". If used in a front-end with user-submitted API data and the user specified the key paths to extract data, I don't believe that safe navigation operators could be used unless the logic was written as a static CFML statement. Am I right regarding this or is there a method to safely evaluate multiple, dynamically named keys?

Report · Sep 09, 2021

It's all fun and games until someone loses an eye!

More seriously, fun with CF shouldn't lead you to the conclusion that (a) the speed difference between any two built-in CF functions is significant in the real world, (b) using your own function will be faster than any built-in function, or (c) you should try to use conditional logic instead of try/catch to identify all possible states within a program when you can't repair a defective state.

Dave Watts, Eidolon LLC

Report · Sep 09, 2021

If you really want a method with less ambiguity, then this is precisely what safe navigation was added to the language for.

By @bradwood.com

Ah, of course!

Report · Sep 09, 2021

@Homestar9 , lots of food for thought there.

Nevertheless, a remark on your original topic: the performance of isDefined versus that of structKeyExists. In my opinion, it is a legitimate question to ask. I also consider your method OK (comparing function performance within an identical environment). At least, to get a rough statistical guide or rule-of-thumb.

For example, the following test yielded:

IsDefined execution time = 50022 ms

StructkeyExists execution time = 50024 ms

KeyExists execution time = 50038 ms

<cfscript>
isIt=false;
structure=structnew();

t1=getTickCount();
for (i=1; i lte 500000; i++) {
	structure={nestedStructure:{key:createUUID()}};
	isIt=isDefined("structure.nestedStructure.key");
}
t2=getTickCount();

for (i=1; i lte 500000; i++) {
	structure={nestedStructure:{key:createUUID()}};
	isIt=structkeyExists(structure.nestedStructure, "key");
}
t3=getTickCount();


for (i=1; i lte 500000; i++) {
	structure={nestedStructure:{key:createUUID()}};
	isIt=structure.nestedStructure.keyExists("key");
}
t4=getTickCount();

writeoutput("<p>IsDefined execution time = " & t2-t1 & " ms </p>");
writeoutput("<p>StructkeyExists execution time = " & t3-t2 & " ms </p>");
writeoutput("<p>KeyExists execution time = " & t4-t3 & " ms </p>");
</cfscript>

The execution times are roughly the same, confirming the view some have expressed that there is not much to choose from, in performance terms, between isDefined and structKeyExists. However, if there had been a difference of one or more orders of magnitude between the results, then that would have been something to write home about.
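
One thing to keep in mind with the test above: every iteration also builds a new struct and calls createUUID(), and that shared cost may account for much of each total. A minimal sketch of a variation that builds the struct once, so the loops time only the existence checks, is shown below. The `iterations` variable is a name introduced here for illustration, and no results are claimed for it.

```cfml
<cfscript>
// Hypothetical variation on the test above: the struct is created once,
// outside the timed loops, so only the three lookups are measured.
iterations = 500000;
structure = { nestedStructure: { key: createUUID() } };
isIt = false;

t1 = getTickCount();
for ( i = 1; i lte iterations; i++ ) {
    isIt = isDefined( "structure.nestedStructure.key" );
}
t2 = getTickCount();

for ( i = 1; i lte iterations; i++ ) {
    isIt = structKeyExists( structure.nestedStructure, "key" );
}
t3 = getTickCount();

for ( i = 1; i lte iterations; i++ ) {
    isIt = structure.nestedStructure.keyExists( "key" );
}
t4 = getTickCount();

writeOutput( "<p>IsDefined execution time = " & ( t2 - t1 ) & " ms </p>" );
writeOutput( "<p>StructKeyExists execution time = " & ( t3 - t2 ) & " ms </p>" );
writeOutput( "<p>KeyExists execution time = " & ( t4 - t3 ) & " ms </p>" );
</cfscript>
```

As others in the thread have pointed out, a single-threaded loop like this is still only a rough guide and says little about behavior under real concurrent load.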

That said, I would heed the useful advice everyone has given.

Report · Sep 15, 2021

Now see this is why I love the CFML community! I thoroughly enjoyed reading everyone's take on my code experiment and many of the responses gave me a lot to think about. Thanks to everyone who participated.

If you had presented me with this problem 2 years ago I would have gone with the more performant `keyExists()` method even if it meant creating `if` statements a mile long. However, nowadays I prioritize code readability (and simplicity) since we have to share our code with others so often - whether it's online or with colleagues. Therefore, in my actual app, I switched to the "isDefined()" solution because it does what I want in a single line and any developer will understand my intent immediately.

@bradwood.com introduced the safe navigation possibility which I hadn't considered before. I actually attempted to update the sample benchmark to see how it would stack up, but I must be implementing it incorrectly because I couldn't figure out the best way to use safe navigation to detect if a particular nested key exists or not. Here's the example I tested and couldn't get it to work:

// the root struct we will test against
myStruct = {
    apiResponse = {
        body = "i am the body",
        headers = {
            status_code = "405",
            status = {
                code: "200"
            }
        }
    }
};

writeDump( myStruct?.apiResponse?.headers?.status_code ); // 405
writeDump( isNull( myStruct?.apiResponse?.headers?.status_code ) ); // true???

If anyone has any tips regarding a way to use safe navigation in this scenario to test for the existence of a key, please feel free to chime in. Also, I have no idea why the `isNull()` method returns `true` when a key value actually exists.

Report · Sep 15, 2021

@Homestar9 , You're right to question this. It's a bug on CF2016 and CF2018, I'm afraid. I have reported it.

See

https://tracker.adobe.com/#/view/CF-4212382

https://tracker.adobe.com/#/view/CF-4204063

These bug reports also contain a workaround: store the value in a variable, then call isNull on it. Something like

x = myStruct?.apiResponse?.headers?.status_code; 

writeDump( myStruct?.apiResponse?.headers?.status_code ); // 405
writeDump( isNull( x ) );