Skip to main content
nikos101
Inspiring
March 31, 2009
Question

I would like to return the top 50 most popular words

  • March 31, 2009
  • 11 replies
  • 1652 views
HI, I have a text column in mssql, I would like to return the top 50 most popular words and their count(for all rows) excluding the words

i
a
it
the

Any clue how to do this?
This topic has been closed for replies.

11 replies

nikos101
nikos101Author
Inspiring
April 16, 2009

I done it with code like this, but it took about 200 seconds to finish and generated 300000 rows in WORDS

it would be cool if it could be done in pure sql, mabye it could be done using ms sql cursors

<cfset datasource = application.datasource>

<cfset datasource = application.datasource>
<cfquery name="q" datasource="#datasource#">
      
     select   * from    staff
      
</cfquery>
<cfloop query="q">
  <cfloop list="#comment#" index="word" delimiters=" ,?,.,!,;#chr(10)##chr(13)#">
    <cfquery name="q2" datasource="#datasource#">
          
             insert into WORDS values(
              <cfqueryparam value="#word#"     cfsqltype="cf_sql_varchar">  
             )
            </cfquery>
  </cfloop>
</cfloop>

nikos101
nikos101Author
Inspiring
April 3, 2009
I'm gonna loop through every row and shove each word into a temporary table, could someone post some example code to do this?
nikos101
nikos101Author
Inspiring
April 14, 2009

Bumped up

Inspiring
April 2, 2009
And it would probably be possible to tap a data-mining engine for
results like this. But you would need to have the engine first.

If you want to build your own data-mining engine you may want to consult
people who work in that world.

Your database management system probable has some data mining tools
built into it that you could use. But the source to find out about
those features would be the documentation of your database management
system.
nikos101
nikos101Author
Inspiring
April 2, 2009
Stuff like this is pretty common in data-mining though, would love to be able to do it
April 2, 2009
Only way I could think to do this would be to use full text index and some of the specific functionality associated with them like CONTAINS clause in SQL.

I'm not sure, if I understand, how you would count the instances of a set of words in a particular data cell. I can't think of ever doing something like that.
nikos101
nikos101Author
Inspiring
April 1, 2009
It would be really cool though if you could do it in pure T-SQL
Inspiring
April 1, 2009
nikos101 wrote:
> The trouble is that each row contains a number of words (on average 100-200)

Then the table is *not* what I imagined, thus why showing what you are
working with would have saved some effort.

I do not know of anything that I would like to run in CFML that do this.
Anything that I can imagine will be sloooow.

I would guess you need something that digests the data into other tables
in the database to do stuff like that efficiently.
nikos101
nikos101Author
Inspiring
April 1, 2009
The trouble is that each row contains a number of words (on average 100-200)
Inspiring
April 1, 2009
nikos101 wrote:
> OK,lets ignore the list of words for now :)
>

Are you familiar with count() and GROUP BY functionality?

Are you familiar with TOP N functionality?

With those two things, if your table is what I am imagining it to be, it
should be easy.
nikos101
nikos101Author
Inspiring
April 1, 2009
OK,lets ignore the list of words for now :)

As far as database schema it doesn't really matter, I am only querying one table which has a column of type text where the all the information that I'm interested in is stored.

regards