Copy link to clipboard
Copied
We've all built forms that use common reference data like country codes/names, states, etc. Historically I've always stored this type of information in a database that I would query as needed. However, some types of reference data may change once in awhile (e.g. zip/postal codes) and maintaining that data across apps can be a bit cumbersome.
Now that I've started thinking about apps from a modular HMVC approach (Coldbox fanboy here), I played with the idea of moving reference data into an external module that could be more easily updated and shared, both between apps and with the CF community (e.g. Forgebox).
To play around with this concept further, I created a generic module called "DataLib" that would store US States. The first thing I did was eliminate the database dependency, and instead, I stored all of the reference data as a .json file residing in the module. I then created a component that, when instantiated, would read the reference data and deserialize it for CFML use.
Here's an example of my first experiment:
Structure
/modules_app/dataLib/ModuleConfig.cfc // Coldbox module file
/modules_app/dataLib/models/UsStates.cfc // The CFC that is responsible for providing data
/modules_app/dataLib/data/usStates.json // the raw JSON file containing the data
The UsStates.cfc file might look something like this:
component
hint="I am the the US State data library"
{
/**
* Constructor
*/
function init() {
// read the data file into the private scope
variables.states = deSerializeJson( fileRead( expandPath( "/dataLib/data/usstates.json" ) ) );
return this;
}
/**
* list
* Returns the data
*/
struct function list() {
return variables.states;
}
/**
* listAsArray
* returns the data as an array
* todo: force the array to be alphabetical.
*/
array function listAsArray() {
var stateArray = [];
variables.states.each( function( key, value ) {
stateArray.append( { "name": value, "initials": key } );
} );
return stateArray;
}
I had a few thoughts about my approach:
1. I love the modularity and ability to store the actual data separately. When the US finally decides to conquer Canada, updating the json file would be extremely easy and it could be pushed to Github, Forgebox, or wherever.
2. I am concerned about the performance implications of using fileRead() to consume the JSON file during instantation. I could make the component a singleton, but for larger reference data (e.g. ZipCodes), I don't like the idea of all that data sitting in memory during the lifespan of the app. For smaller data sets like countries/states, keeping things in memory doesn't smell bad to me. Perhaps this method of storing reference data is ideal only for smaller data sets. Larger data sets like zip codes likely belong in the database, especially since you typically query a subset of data instead of the whole thing at once and databases are good at that type of thing.
3. I also like the idea of adding data manipulation methods onto the component. For example, requesting the data in array format, sorted, etc would be pretty easy to implement and provide useful utilities for consumption.
What do you think? How do you handle storing commonly used reference data? Can you see any other flaws in modulizing small reference data sets like I've done in my example?
Copy link to clipboard
Copied
I think it's up in the air about whether you're better off storing things in the database or somewhere else. I can see advantages and disadvantages for each. For example, you're going to have a hard time using "WHERE STATE = 'FL'" etc if you're not actually using those values from the database. And if you actually have a fully-functional database, you probably already have lots of lookup tables for states, countries, etc. So why build another approach that stores these things elsewhere just because you can? SQL is going to be around a lot longer than ColdBox or CFSCRIPT or CFML (or C# or whatever other programming language you can think of). But all that's just my opinion, so feel free to ignore it.
Beyond that, I wasn't initially sure why your UsStates.cfc would read from the JSON every time you use it. I figured that out a bit later reading the rest of your message. Anyway, you could avoid that pretty easily by just putting a lookup in your init to see if you've already read the list of states:
function init() {
if (!someLookupOrOther)
variables.states = deSerializeJson( fileRead( expandPath( "/dataLib/data/usstates.json" ) ) );
return this;
}
(Note: I don't know if you can have an if without a code block in CFSCRIPT and am too tired to find out.)
Of course, maybe this could be handled in the ColdBox module config instead, I have no idea about that either.
Finally, out of curiosity I did a lookup to find out how many zip codes are in the US. There are 41,692 of them, according to USPS. That is not a lot. I mean, it's a lot bigger than the number of states, but still it's not a lot. So I'd go ahead and load all of them up once in RAM and worry about it later.
Dave Watts, Eidolon LLC
Copy link to clipboard
Copied
Some zipcodes are shared between multiple cities. We import the $39.95 CSV data from zip-codes.com and have access to a lot of additional data (lat/lng, timezone, county, etc) and count 80,124 ZIP-to-City entries (including PO boxes). The lat/lng data is converted to a geography data type in MSSQL during the import process.
Regarding states and countries, I'm still using custom tags that have hard-coded ISO 2-letter and 3-letter (& numeric) codes to generate HTML "select" elements. I manually curated the data from multiple JSON files & Wiki pages. I also use a StateAbbreviation UDF that converts strings back-and-forth from 2, 3 and full values.
The only data that we store independent of the function is the extended zip code data that we license separately.
Copy link to clipboard
Copied
SQL is going to be around a lot longer than ColdBox or CFSCRIPT or CFML (or C# or whatever other programming language you can think of).
Dave Watts, Eidolon LLC
By @Dave Watts
I consider that to be one of the most perceptive observations in software development.
Copy link to clipboard
Copied
@Homestar9 , you have eloquently expressed what I think of the matter:
Use case:
By @Homestar9Thinking About the Best Ways to Retrieve/Store Common Reference Data (States, Countries, Zips, etc)
Solution:
...Moving reference data into an external module that could be more easily updated and shared ... between apps
By @Homestar9
Copy link to clipboard
Copied
@Homestar9 , you have eloquently expressed what I think of the matter:
Use case:
By @Homestar9Thinking About the Best Ways to Retrieve/Store Common Reference Data (States, Countries, Zips, etc)
Solution:
...Moving reference data into an external module that could be more easily updated and shared ... between apps
By @Homestar9
By @BKBK
I have thought about this some more. I would suggest the following as the "external module that could be more easily updated and shared ... between apps" :
The tasks you have listed are:
A relational database is specifically designed to perform each and every one of them. As such, the database is the Information Expert, to borrow the name from the G.R.A.S.P. principles of object-oriented design.
GRASP provides us with a best-practice solution. The Information Expert principle says you should assign responsibility to the software, module or class that has the information required to fulfill it.
With this new awareness, you can begin to see the shortcomings of your current setup:
By @Homestar9...some types of reference data may change once in awhile (e.g. zip/postal codes) and maintaining that data across apps can be a bit cumbersome.
That is a design that needs to be improved. There should be no need to maintain data across apps. An app should be unaware of a change in data. Best-practice: separation of concerns.
By @Homestar9... the idea of moving reference data into an external module that could be more easily updated and shared, both between apps and with the CF community
Someone thought about the same use-case long ago. The solution was: