April 19, 2010
Question

Parsing regex from an ".ini" file

  • 2 replies
  • 2596 views

Hey folks,

My first post to the CF fora here - not sure if this is a) the appropriate spot or even b) an appropriate question.


I have a routine that works but it's awfully slow considering that it needs to be called at the beginning of every session - I'm parsing through a fairly large .ini file for browser detection from application.cfc - would appreciate any insights if anyone sees anything I'm just hosing up badly.  Obviously it may not be answerable due to idiosyncrasies of the browscap.ini file (which are tedious to go through).

<!--- 
  **********************************************************************
  Fetch user agent information
  **********************************************************************
  Important: This routine depends largely on keeping the browser
  capability file up-to-date.  Updates are currently available from:
  http://browsers.garykeith.com/stream.asp?BrowsCapINI
  ---------------------------------------------------------------------- --->
<cffunction name="getBrowserInfo">
  <cfscript>
    // Set location of browscap.ini ----------------------------
    browscap_ini = expandPath("./browscap.ini");
    // Read wildcard patterns from the INI file  ---------------
    browscap_list = getProfileSections(browscap_ini);
    // Seed some variables -------------------------------------
    browser_champion_pattern = "*";
    browser_champion_regex = "^.*$";
    default_id = "*";
  </cfscript>
  <cfloop list="#browscap_list[default_id]#" index="keyname">
    <cfscript>
      xvalue = getProfileString(browscap_ini, default_id, keyname);
      if (keyname neq "parent") { browscap['#keyname#'] = xvalue; }
    </cfscript>
  </cfloop>
  <cfscript>
    // Loop through the patterns to find the best match --------
    for (browscap.browser_name_pattern in browscap_list) {
      // Massage the wildcard into useable regex ---------------
      browscap.browser_name_regex = lCase(browscap.browser_name_pattern);
      browscap.browser_name_regex = replace(browscap.browser_name_regex, ".", "\.", "all");
      // Escape literal plus signs before the wildcards are expanded
      browscap.browser_name_regex = replace(browscap.browser_name_regex, "+", "\+", "all");
      browscap.browser_name_regex = replace(browscap.browser_name_regex, "*", ".*", "all");
      browscap.browser_name_regex = replace(browscap.browser_name_regex, "?", ".", "all");
      browscap.browser_name_regex = replace(browscap.browser_name_regex, "(", "\(", "all");
      browscap.browser_name_regex = replace(browscap.browser_name_regex, ")", "\)", "all");
      browscap.browser_name_regex = replace(browscap.browser_name_regex, "[", "\[", "all");
      browscap.browser_name_regex = replace(browscap.browser_name_regex, "]", "\]", "all");
      // Anchor both ends so the whole user agent has to match --
      browscap.browser_name_regex = "^" & browscap.browser_name_regex & "$";
      // Test the resulting regex against the user agent -------
      if (isValid("regular_expression", lCase(CGI.HTTP_USER_AGENT), browscap.browser_name_regex)) {
        // User agent matches regex so we got a challenger -----
        if (len(browscap.browser_name_pattern) ge len(browser_champion_pattern)) {
          // If challenger is at least as long as champ then we got a new champ ----
          browser_champion_pattern = browscap.browser_name_pattern;
          browser_champion_regex = browscap.browser_name_regex; } } }
    // Set the winning regex patterns --------------------------
    browscap.browser_name_pattern = browser_champion_pattern;
    browscap.browser_name_regex = browser_champion_regex;
  </cfscript>
  <!--- Check for a living parent record ----------------------- --->
  <cfif (len(getProfileString(browscap_ini, browscap.browser_name_pattern, "parent")) gt 0)>
    <cfset parent_id = getProfileString(browscap_ini, browscap.browser_name_pattern, "parent")>
    <!--- Fetch the parental info ------------------------------ --->
    <cfloop list="#browscap_list[parent_id]#" index="keyname">
      <cfscript>
        xvalue = getProfileString(browscap_ini, parent_id, keyname);
        if (keyname neq "parent") { Session['agent_#keyname#'] = xvalue; }
      </cfscript>
    </cfloop>
  </cfif>
  <!--- Fetch the winning info --------------------------------- --->
  <cfloop list="#browscap_list[browser_champion_pattern]#" index="keyname">
    <cfscript>
      xvalue = getProfileString(browscap_ini, browser_champion_pattern, keyname);
      if (keyname neq "parent") { Session['agent_#keyname#'] = xvalue; }
    </cfscript>
  </cfloop>
</cffunction>

    2 replies

    cfjedimaster
    Inspiring
    June 28, 2012

    There is a lot you could do to speed this up, specifically reading in the INI file once and parsing it into a structure. That would remove the file I/O and parsing time from each call.
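
A minimal sketch of that idea, assuming the file is read once (for example from onApplicationStart()) and the result is kept in the application scope. The names loadIniAsStruct and application.browscap are hypothetical and not from the original post; getProfileSections() and getProfileString() are the same built-ins the original routine already uses.

```cfml
<cfscript>
// Hypothetical helper: read every section of an INI file into a struct of
// structs so that later lookups never touch the file system.
function loadIniAsStruct(iniPath) {
  // getProfileSections() returns a struct: section name -> comma-delimited key list
  var sections = getProfileSections(iniPath);
  var result = structNew();
  var sectionName = "";
  var keyList = "";
  var keyName = "";
  var i = 0;
  for (sectionName in sections) {
    result[sectionName] = structNew();
    keyList = sections[sectionName];
    for (i = 1; i lte listLen(keyList); i = i + 1) {
      keyName = listGetAt(keyList, i);
      result[sectionName][keyName] = getProfileString(iniPath, sectionName, keyName);
    }
  }
  return result;
}
</cfscript>
```

With something like application.browscap = loadIniAsStruct(expandPath("./browscap.ini")) run once at application start, the per-session code could read application.browscap[sectionName][keyName] instead of calling getProfileString() each time. The one-time load may itself be slow for a large file, since it still makes one getProfileString() call per key.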

    ilssac
    Inspiring
    April 19, 2010

    I'm afraid I must say that this code is too dense for me to easily grok.

    But the first thing I note is that you only showed the custom getBrowserInfo() function.

    How is this function getting called?

    I would want to make sure it is not getting called more often than necessary.

    Also, are you reading and parsing the file every time it is called?  If so, is that necessary?  Could it be called less often and stored in memory in some manner?  That would lessen the number of file I/O calls being used, which usually is one of the more costly actions.

    April 19, 2010

    > I'm afraid I must say that this code is too dense for me to easily grok.

    I agree.  The programmer is a bit dense too, but I'm told (by the resident female) that's natural for men like me so I really can't help it.  At least I have a ready-made excuse.

    >But the first thing I note is that you only showed the custom getBrowserInfo() function.

    >How is this function getting called?

    It's a part of onRequestStart() in application.cfc and only gets called if the JSESSIONID cookie isn't set yet.


    Relevant code:

    <cffunction name="onRequestStart">

    [....]

    <cfif ((isDefined("Cookie.jsessionid") is False))>
        <!---
          ----------------------------------------------------------------
          Session Startup
          ----------------------------------------------------------------
          ---------------------------------------------------------------- --->
        <cfset StructDelete(Session, "jsessionid")>
        <cfinvoke method="getBrowserInfo">

    [....]

      </cfif>

    >I would want to make sure it is not getting called more often than necessary.

    I've minimized its use; it really only gets called if the user closes/re-opens the browser (which is what I want) or if the session times out (which I'd love to stop, but my hosting provider doesn't quite get the ColdFusion versus JRun session management issue and won't set it up, so I'm stuck using ColdFusion session mgmt.).  My other programming language (PHP) has this function built in and it's fast enough to be unnoticeable; I'm seriously considering d/l'ing the source code to see how they manage it - I'm still trying to figure out if there's some way to avoid all that regex massaging and processing at least.

    >Also, are you reading and parsing the file every time it is called?  If so, is that necessary?  Could it be called less often and stored in memory in some manner?  That would lessen the number of file I/O calls being used, which usually is one of the more costly actions.

    Yup, hafta parse on each call; the routine basically has a look at the CGI.HTTP_USER_AGENT variable and checks the browser's capabilities, along with giving me the actual browser software and version so I can determine what CSS, JavaScript, etc. are going to work.  Unfortunately the sheer number of available browser software agents (and versions) is pretty off-the-hook (especially when you include cell phone browsers and goodness knows how many other platforms), so I'm really considering just abbreviating the .ini file in some way so that users get IE/Firefox/Safari with "features" and everyone else is going to have to deal with a "vanilla" version.

    ilssac
    Inspiring
    April 19, 2010

    Mike Given wrote:

    Yup, hafta parse on each call; the routine basically has a look at the CGI.HTTP_USER_AGENT variable and checks the browser's capabilities

    Yes, but does the INI file need to be loaded and parsed for the regex every time the CGI.HTTP_USER_AGENT is checked?  Personally, I would probably look at a caching idea where the INI file is loaded and parsed into its set of regex tests, and the result is stored in memory.  Then the CGI.HTTP_USER_AGENT could be checked against the data in memory, instead of getting it from the file system.
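
As a rough sketch of that caching idea: the wildcard-to-regex conversion could be done once per pattern and the results kept in a struct, so each session start only runs the matching step. All three function names below are hypothetical, and the sketch assumes the same "longest matching pattern wins" rule as the routine in the original post. The sections argument is assumed to be the struct returned by getProfileSections().

```cfml
<cfscript>
// Hypothetical helper: convert one browscap wildcard pattern into an anchored regex.
function wildcardToRegex(pattern) {
  var specials = "\,.,+,^,$,|,(,),[,],{,}";
  var regex = lCase(pattern);
  var ch = "";
  var i = 0;
  // Escape regex metacharacters; backslash goes first so later escapes are not doubled.
  for (i = 1; i lte listLen(specials); i = i + 1) {
    ch = listGetAt(specials, i);
    regex = replace(regex, ch, "\" & ch, "all");
  }
  // Translate the two browscap wildcards.
  regex = replace(regex, "*", ".*", "all");
  regex = replace(regex, "?", ".", "all");
  return "^" & regex & "$";
}

// Hypothetical helper: build the pattern -> regex lookup once.
function buildRegexLookup(sections) {
  var lookup = structNew();
  var name = "";
  for (name in sections) {
    lookup[name] = wildcardToRegex(name);
  }
  return lookup;
}

// Hypothetical helper: return the longest pattern whose regex matches the user agent.
function findBestPattern(lookup, userAgent) {
  var ua = lCase(userAgent);
  var best = "*";
  var name = "";
  for (name in lookup) {
    // Only run the regex when the pattern could replace the current best.
    if (len(name) ge len(best) and reFind(lookup[name], ua) gt 0) {
      best = name;
    }
  }
  return best;
}
</cfscript>
```

The lookup could be built once, e.g. application.browscapRegex = buildRegexLookup(getProfileSections(expandPath("./browscap.ini"))), and each new session would then call findBestPattern(application.browscapRegex, CGI.HTTP_USER_AGENT). The application.browscapRegex name is likewise only an illustration.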

    Secondly, do you need to check against every regex every time?  Is there some place where you can short-circuit the loop once you have determined what you need from the checking?
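
One possible way to short-circuit, sketched under the assumption that the longest matching pattern is the winner (as in the original routine): order the patterns longest-first once, then stop at the first match. The function names are hypothetical, and regexByPattern is assumed to be a struct that maps each browscap pattern to an already-converted, anchored regex.

```cfml
<cfscript>
// Hypothetical helper: return the pattern names ordered longest-first.
// structSort() returns an array of keys sorted by their values.
function sortPatternsLongestFirst(regexByPattern) {
  var lengths = structNew();
  var name = "";
  for (name in regexByPattern) {
    lengths[name] = len(name);
  }
  return structSort(lengths, "numeric", "desc");
}

// Hypothetical helper: walk the sorted patterns and stop at the first match.
function findFirstMatch(sortedPatterns, regexByPattern, userAgent) {
  var ua = lCase(userAgent);
  var i = 0;
  for (i = 1; i lte arrayLen(sortedPatterns); i = i + 1) {
    if (reFind(regexByPattern[sortedPatterns[i]], ua) gt 0) {
      // Longest patterns come first, so the first hit is a longest match.
      return sortedPatterns[i];
    }
  }
  // Nothing matched: fall back to the default section.
  return "*";
}
</cfscript>
```

The sorted array only needs to be computed when the INI file changes, so it could be cached alongside the regex struct. When two matching patterns have the same length, this sketch may pick a different one than the original loop does.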