Skip to main content
Inspiring
November 17, 2009
Question

Pull Random Records, But No Duplicates

  • November 17, 2009
  • 3 replies
  • 675 views

Simple question.

I've got a query that pulls four random records, like so:

<CFQUERY NAME="my_query" DATASOURCE="#request.my_datasource#" DBTYPE=ODBC MAXROWS=4>
     SELECT  p.first_name, p.last_name

     FROM     people p, offices o, buildings b, cities c
     WHERE  p.people_id = o.people_id AND o.building_id = b.building_id AND b.city_id = c.city_id AND c.city_id = 12
     ORDER BY newid()
</CFQUERY>

Problem is, duplicates can occur if a the "offices" table has multiple instances of the same person, or "people_id".

"distinct" is out of the question, as it would have to be added to the SELECT list.

So, how can one generate random records, when "distinct" is not an option? I feel I got it to work with some GROUP BY trickery...

...but I'm not confident that's the best solution.

    This topic has been closed for replies.

    3 replies

    Participating Frequently
    November 17, 2009

    Possible query option:

    <CFQUERY NAME="my_query" DATASOURCE="#request.my_datasource#" DBTYPE=ODBC MAXROWS=4>
         SELECT DISTINCT TOP 4 first_name,last_name FROM (SELECT TOP 100 p.first_name, p.last_name

         FROM     people p, offices o, buildings b, cities c
         WHERE  p.people_id = o.people_id AND o.building_id = b.building_id AND b.city_id = c.city_id AND c.city_id = 12
         ORDER BY newid()) subquery
    </CFQUERY>

    The TOP 100 is going to minimize the subquery, but you can adjust this or remove it depending on the number of possible sub-offices, buildings, etc.

    Inspiring
    November 17, 2009

    Do a subquery on the people table to select the random four people, then join just those four to the other tables.

    --

    Adam

    Inspiring
    November 17, 2009

    Many db's have a way of selecting "only x" or "top x" records

    using only sql. That method is more efficient than what you are

    doing with the maxrows attribute.

    Yes. I cannot recall the behavior of the current drivers, but in some previous versions MAXROWS would retrieve all records. Then discard everything but the first X rows.

    Inspiring
    November 17, 2009

    -==cfSearching==- wrote:

    Many db's have a way of selecting "only x" or "top x" records

    using only sql. That method is more efficient than what you are

    doing with the maxrows attribute.

    Yes. I cannot recall the behavior of the current drivers, but in some previous versions MAXROWS would retrieve all records. Then discard everything but the first X rows.

    A more important consideration is that - irrespctive of what CF & the DB driver know to do about row-limiting - unless one tells the DB to only fetch n rows, it'll fetch the whole recordset.  Which is a large unnecessary overhead on the DB side of things.

    One should always row-limit one's queries in the SQL in addition to the MAXROWS attribute.

    --

    Adam

    Inspiring
    November 17, 2009

    Why is "distinct" out of the question?

    You can add min(b.building_id) to the select clause.  This will accomplish two things.  The group by clause will effectively be your "distinct" keyword, and you'll select fewer records.  How may records are you selecting anyhow to get the 4 you want?

    Many db's have a way of selecting "only x" or "top x" records using only sql.  That method is more efficient than what you are doing with the maxrows attribute.