Exit
  • Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
  • 한국 커뮤니티
0

Pull Random Records, But No Duplicates

Participant ,
Nov 16, 2009 Nov 16, 2009

Simple question.

I've got a query that pulls four random records, like so:

<CFQUERY NAME="my_query" DATASOURCE="#request.my_datasource#" DBTYPE=ODBC MAXROWS=4>
     SELECT  p.first_name, p.last_name

     FROM     people p, offices o, buildings b, cities c
     WHERE  p.people_id = o.people_id AND o.building_id = b.building_id AND b.city_id = c.city_id AND c.city_id = 12
     ORDER BY newid()
</CFQUERY>

Problem is, duplicates can occur if a the "offices" table has multiple instances of the same person, or "people_id".

"distinct" is out of the question, as it would have to be added to the SELECT list.

So, how can one generate random records, when "distinct" is not an option? I feel I got it to work with some GROUP BY trickery...

...but I'm not confident that's the best solution.

609
Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Nov 16, 2009 Nov 16, 2009

Why is "distinct" out of the question?

You can add min(b.building_id) to the select clause.  This will accomplish two things.  The group by clause will effectively be your "distinct" keyword, and you'll select fewer records.  How may records are you selecting anyhow to get the 4 you want?

Many db's have a way of selecting "only x" or "top x" records using only sql.  That method is more efficient than what you are doing with the maxrows attribute.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Nov 17, 2009 Nov 17, 2009

Do a subquery on the people table to select the random four people, then join just those four to the other tables.

--

Adam

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Nov 17, 2009 Nov 17, 2009

Many db's have a way of selecting "only x" or "top x" records

using only sql. That method is more efficient than what you are

doing with the maxrows attribute.

Yes. I cannot recall the behavior of the current drivers, but in some previous versions MAXROWS would retrieve all records. Then discard everything but the first X rows.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Nov 17, 2009 Nov 17, 2009

-==cfSearching==- wrote:

Many db's have a way of selecting "only x" or "top x" records

using only sql. That method is more efficient than what you are

doing with the maxrows attribute.

Yes. I cannot recall the behavior of the current drivers, but in some previous versions MAXROWS would retrieve all records. Then discard everything but the first X rows.

A more important consideration is that - irrespctive of what CF & the DB driver know to do about row-limiting - unless one tells the DB to only fetch n rows, it'll fetch the whole recordset.  Which is a large unnecessary overhead on the DB side of things.

One should always row-limit one's queries in the SQL in addition to the MAXROWS attribute.

--

Adam

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Valorous Hero ,
Nov 17, 2009 Nov 17, 2009

A more important consideration is that - irrespctive of

what CF & the DB driver know to do about row-limiting -

unless one tells the DB to only fetch n rows, it'll

fetch the whole recordset.  Which is a large unnecessary

overhead on the DB side of things.

Agreed. I was not suggesting one should rely on the driver behavior, but instead pointing out that it is not a reliable method. ie It does not necessarily limit the results in the way you might be thinking.  As both of you have said, it should always be done explicitly in sql.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
New Here ,
Nov 17, 2009 Nov 17, 2009
LATEST

Possible query option:

<CFQUERY NAME="my_query" DATASOURCE="#request.my_datasource#" DBTYPE=ODBC MAXROWS=4>
     SELECT DISTINCT TOP 4 first_name,last_name FROM (SELECT TOP 100 p.first_name, p.last_name

     FROM     people p, offices o, buildings b, cities c
     WHERE  p.people_id = o.people_id AND o.building_id = b.building_id AND b.city_id = c.city_id AND c.city_id = 12
     ORDER BY newid()) subquery
</CFQUERY>

The TOP 100 is going to minimize the subquery, but you can adjust this or remove it depending on the number of possible sub-offices, buildings, etc.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Resources