Exit
  • Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
  • 한국 커뮤니티
0

Comparing fields

LEGEND ,
Sep 12, 2006 Sep 12, 2006
I have a job that will involve comparing data in an uploaded file to a
master data set, looking for indenties.

The incoming file will be a numeric only CSV, and may contain 20,000
records, each with a single large integer field. I want to a) eliminate
dupes from that incoming data, and then compare the de-duped list to the
master list to calculate % overlap, or identity.

I'm thinking I'd read the incoming data into a table, then select unique
from the table, and begin a brute force comparison, field by field, scanning
each field against the master until I reach a master value that is larger
than the test value, and then moving on to the next test record.

Is there a SQL way of doing this more efficiently?

--
Murray --- ICQ 71997575
Adobe Community Expert
(If you *MUST* email me, don't LAUGH when you do so!)
==================
http://www.dreamweavermx-templates.com - Template Triage!
http://www.projectseven.com/go - DW FAQs, Tutorials & Resources
http://www.dwfaq.com - DW FAQs, Tutorials & Resources
http://www.macromedia.com/support/search/ - Macromedia (MM) Technotes
==================



TOPICS
Server side applications
453
Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Explorer ,
Sep 13, 2006 Sep 13, 2006
You could certainly code sql to check for matching fields from both the tables and then do a calculation based on the results of this if that's any help.
Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Sep 13, 2006 Sep 13, 2006
Can you make any substantive recommendations or any comments regarding my
scheme outlined in the post?

I'm really worried about comparing a list of 20K records to a master of 20K
records in real time, especially when someone else may hit the server at the
same time with a similar query.

Any comments?

--
Murray --- ICQ 71997575
Adobe Community Expert
(If you *MUST* email me, don't LAUGH when you do so!)
==================
http://www.dreamweavermx-templates.com - Template Triage!
http://www.projectseven.com/go - DW FAQs, Tutorials & Resources
http://www.dwfaq.com - DW FAQs, Tutorials & Resources
http://www.macromedia.com/support/search/ - Macromedia (MM) Technotes
==================


"i_hitman" <webforumsuser@macromedia.com> wrote in message
news:ee8jca$mjj$1@forums.macromedia.com...
> You could certainly code sql to check for matching fields from both the
> tables and then do a calculation based on the results of this if that's
> any help.


Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Sep 13, 2006 Sep 13, 2006
Is this SQL Server? How many fields are you comparing to in the master
table? If only one, I would import the data, eliminate duplicates, then
compare to the data field in the master table. I usually eliminate dupes
using a temp table, although there are many other ways to do it:

select distinct yourfield into #holding from yourtable
truncate table yourtable
insert yourtable select yourfield from #holding
drop table #holding

Then get data that does not match the master:

SELECT yourfield FROM yourtable WHERE yourfield NOT IN (SELECT yourfield
from mastertable)

That gives you all data not in the master table. If you want it inserted,
write an insert statement:

INSERT master (yourfield)
SELECT yourfield FROM yourtable WHERE yourfield NOT IN (SELECT yourfield
from mastertable)

It could all be done in one query, but this gives you the inbetween steps
and tables. 20K records is nothing for SQL.

For percentage overlap, get counts of each:

SELECT COUNT (*) as NewRecordCount FROM yourtable
SELECT COUNT(*) as Matches FROM mastertable WHERE yourfield IN (select
yourfield from yourtable)
SELECT COUNT(*) as NonMatchesMaster FROM mastertable WHERE yourfield NOT IN
(select yourfield from yourtable)
SELECT COUNT(*) as NonMatchesNew FROM yourtable WHERE yourfield NOT IN
(select yourfield from mastertable)

Hope that helps.
--
Tom Muck
co-author Dreamweaver MX 2004: The Complete Reference
http://www.tom-muck.com/

Cartweaver Development Team
http://www.cartweaver.com

Extending Knowledge Daily
http://www.communitymx.com/
"Murray *ACE*" <forums@HAHAgreat-web-sights.com> wrote in message
news:ee7jud$gqh$1@forums.macromedia.com...
>I have a job that will involve comparing data in an uploaded file to a
>master data set, looking for indenties.
>
> The incoming file will be a numeric only CSV, and may contain 20,000
> records, each with a single large integer field. I want to a) eliminate
> dupes from that incoming data, and then compare the de-duped list to the
> master list to calculate % overlap, or identity.
>
> I'm thinking I'd read the incoming data into a table, then select unique
> from the table, and begin a brute force comparison, field by field,
> scanning each field against the master until I reach a master value that
> is larger than the test value, and then moving on to the next test record.
>
> Is there a SQL way of doing this more efficiently?
>
> --
> Murray --- ICQ 71997575
> Adobe Community Expert
> (If you *MUST* email me, don't LAUGH when you do so!)
> ==================
> http://www.dreamweavermx-templates.com - Template Triage!
> http://www.projectseven.com/go - DW FAQs, Tutorials & Resources
> http://www.dwfaq.com - DW FAQs, Tutorials & Resources
> http://www.macromedia.com/support/search/ - Macromedia (MM) Technotes
> ==================
>
>
>


Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Sep 13, 2006 Sep 13, 2006
> Is this SQL Server?

MySQL/PHP

> How many fields are you comparing to in the master table?

Only one field in both master and test tables. Potentially 22,000 records
in master and perhaps double that (not de-duped) in the test.

> Hope that helps.

I would say it has solved the problem! I wanted substantive, and I got a
solution. Thanks, Tom.

When you say 'nothing for SQL' do you mean milliseconds?

--
Murray --- ICQ 71997575
Adobe Community Expert
(If you *MUST* email me, don't LAUGH when you do so!)
==================
http://www.dreamweavermx-templates.com - Template Triage!
http://www.projectseven.com/go - DW FAQs, Tutorials & Resources
http://www.dwfaq.com - DW FAQs, Tutorials & Resources
http://www.macromedia.com/support/search/ - Macromedia (MM) Technotes
==================


"Tom Muck" <tommuck@NOSPAM.basic-drumbeat.com> wrote in message
news:ee97kg$hm5$1@forums.macromedia.com...
> Is this SQL Server? How many fields are you comparing to in the master
> table? If only one, I would import the data, eliminate duplicates, then
> compare to the data field in the master table. I usually eliminate dupes
> using a temp table, although there are many other ways to do it:
>
> select distinct yourfield into #holding from yourtable
> truncate table yourtable
> insert yourtable select yourfield from #holding
> drop table #holding
>
> Then get data that does not match the master:
>
> SELECT yourfield FROM yourtable WHERE yourfield NOT IN (SELECT yourfield
> from mastertable)
>
> That gives you all data not in the master table. If you want it inserted,
> write an insert statement:
>
> INSERT master (yourfield)
> SELECT yourfield FROM yourtable WHERE yourfield NOT IN (SELECT yourfield
> from mastertable)
>
> It could all be done in one query, but this gives you the inbetween steps
> and tables. 20K records is nothing for SQL.
>
> For percentage overlap, get counts of each:
>
> SELECT COUNT (*) as NewRecordCount FROM yourtable
> SELECT COUNT(*) as Matches FROM mastertable WHERE yourfield IN (select
> yourfield from yourtable)
> SELECT COUNT(*) as NonMatchesMaster FROM mastertable WHERE yourfield NOT
> IN (select yourfield from yourtable)
> SELECT COUNT(*) as NonMatchesNew FROM yourtable WHERE yourfield NOT IN
> (select yourfield from mastertable)
>
> Hope that helps.
> --
> Tom Muck
> co-author Dreamweaver MX 2004: The Complete Reference
> http://www.tom-muck.com/
>
> Cartweaver Development Team
> http://www.cartweaver.com
>
> Extending Knowledge Daily
> http://www.communitymx.com/
> "Murray *ACE*" <forums@HAHAgreat-web-sights.com> wrote in message
> news:ee7jud$gqh$1@forums.macromedia.com...
>>I have a job that will involve comparing data in an uploaded file to a
>>master data set, looking for indenties.
>>
>> The incoming file will be a numeric only CSV, and may contain 20,000
>> records, each with a single large integer field. I want to a) eliminate
>> dupes from that incoming data, and then compare the de-duped list to the
>> master list to calculate % overlap, or identity.
>>
>> I'm thinking I'd read the incoming data into a table, then select unique
>> from the table, and begin a brute force comparison, field by field,
>> scanning each field against the master until I reach a master value that
>> is larger than the test value, and then moving on to the next test
>> record.
>>
>> Is there a SQL way of doing this more efficiently?
>>
>> --
>> Murray --- ICQ 71997575
>> Adobe Community Expert
>> (If you *MUST* email me, don't LAUGH when you do so!)
>> ==================
>> http://www.dreamweavermx-templates.com - Template Triage!
>> http://www.projectseven.com/go - DW FAQs, Tutorials & Resources
>> http://www.dwfaq.com - DW FAQs, Tutorials & Resources
>> http://www.macromedia.com/support/search/ - Macromedia (MM) Technotes
>> ==================
>>
>>
>>
>
>


Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Sep 13, 2006 Sep 13, 2006

"Murray *ACE*" <forums@HAHAgreat-web-sights.com> wrote in message
news:ee9a6j$koa$1@forums.macromedia.com...
>> Is this SQL Server?
>
> MySQL/PHP

Mysql code may be a little different, but similar.

> I would say it has solved the problem! I wanted substantive, and I got a
> solution. Thanks, Tom.

No prob.

> When you say 'nothing for SQL' do you mean milliseconds?

Should be milliseconds. If you were talking millions of rows, I would be
concerned, but not tremendously.

Tom


Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Sep 13, 2006 Sep 13, 2006
LATEST
Thanks, Tom. This really sets my mind at ease....

--
Murray --- ICQ 71997575
Adobe Community Expert
(If you *MUST* email me, don't LAUGH when you do so!)
==================
http://www.dreamweavermx-templates.com - Template Triage!
http://www.projectseven.com/go - DW FAQs, Tutorials & Resources
http://www.dwfaq.com - DW FAQs, Tutorials & Resources
http://www.macromedia.com/support/search/ - Macromedia (MM) Technotes
==================


"Tom Muck" <tommuck@NOSPAM.basic-drumbeat.com> wrote in message
news:ee9b5k$lv5$1@forums.macromedia.com...
>
> "Murray *ACE*" <forums@HAHAgreat-web-sights.com> wrote in message
> news:ee9a6j$koa$1@forums.macromedia.com...
>>> Is this SQL Server?
>>
>> MySQL/PHP
>
> Mysql code may be a little different, but similar.
>
>> I would say it has solved the problem! I wanted substantive, and I got a
>> solution. Thanks, Tom.
>
> No prob.
>
>> When you say 'nothing for SQL' do you mean milliseconds?
>
> Should be milliseconds. If you were talking millions of rows, I would be
> concerned, but not tremendously.
>
> Tom
>


Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines