Skip to main content
Inspiring
January 31, 2023
Answered

Results change using SQL vs CF query?

  • January 31, 2023
  • 2 replies
  • 649 views

Hello

I'm using a query to match first & last names from a main table (ged_main-3543 records) on a lookup table (marier_main- 264 records).

 

When I use this query, I get 25 matches:

<cfquery name="match_ged_fam" datasource="#Request.BaseDSN#">
SELECT ged_lastname, ged_firstname, ged_datebirth, ged_placebirth, ged_datedeath, ged_placedeath, marier_lastname, marier_firstname
FROM ged_main, marier_main
WHERE ged_lastname LIKE marier_lastname
AND marier_firstname LIKE ged_firstname
</cfquery>

 

When I use this query, I get 67 matches:

 

<cfquery name="match_fam_name" datasource="#Request.BaseDSN#">
SELECT ged_lastname, ged_firstname, ged_datebirth, ged_placebirth, ged_datedeath, ged_placedeath
FROM ged_main t1
WHERE exists
(
SELECT marier_lastname, marier_firstname
FROM marier_main t2
WHERE CHARINDEX(t2.marier_lastname,t1.ged_lastname)>0
AND CHARINDEX(t2.marier_firstname,t1.ged_firstname)>0
)

</cfquery>

 

I am trying to figure out why the results are not the same, and which query is the most acurate?

 

Any help would be appreciated. Thanks,

 

Norman B.

 

    This topic has been closed for replies.
    Correct answer Dave Watts

    Dave:

    I'm sure the way the DB was originally structured needs help. As you can see, the list of women's (in english, their group literally translated means "marriageable girls") names table needs birthdate, birthplace, deathplace, datedeath.

     

    Because this group is well-documented, including their "dit" names at the time, this table would be indisputable data. They would have inherited their dit names from their fathers. The dit names should really be a separate column.

    Other flies in the ointment include that the main table containing 3500 names (soon to be 4500++) does not hold all the data for a large number of indiveduals.

    There is no way around this fact. The best additional key element for this group is that they mostly share a birth date range. They must have come over at marriageable age (12 thru 45).

    I know, but this was 1634.

    Yet another issue is that there are probably 20 or more "Marie Hebert"s in the main table, a very popular name. If they had no dit names but were born in that date range, it would not be possible ditinguish one from the other (they very well have not been in that group at all).

    Additionally, one could have more than one dit name - sometimes 4!

    If these were all formated the same as comma or slash delimted, this could work.

    Marie Pesselet dit Pesseley/Petitpas is an example.

    I have a wys to go on this project.

    Thanks again,

     

     


    Well given that the data is from 1634, we should probably be happy it's as good as it is! That said, I would try to restructure the data by breaking dit names into a different table, using any potential separator that's not actually part of the dit name itself. I'm actually not sure how important the dit names are for your project, so maybe that'll just clean up the main dataset a bit. If you can get an ETL tool like (ugh) Talend Data Studio, you can probably automate this without writing a lot of code. There's a free version of that, but I warn you it's pretty complex by itself and throws the worst error messages. THE WORST. Otherwise, you can either automate it in T-SQL or just do it manually - three or four thousand rows isn't all that much.

     

    Dave Watts, Eidolon LLC

    2 replies

    Community Expert
    January 31, 2023

    The first thing I'd recommend is just running these queries from other SQL clients just to make sure they work the same there. (They will.) That way, you can focus on SQL instead of CF.

     

    The second thing is, these queries seem like odd ways to compare what (presumably?) are identical strings. I've never seen the LIKE operator used without wildcard operators. Can't you do something like this?

     

    WHERE UCASE(LCASE(TRIM(ged_lastname))) = UCASE(LCASE(TRIM(marier_lastname)))

     

    And why are these last name values different at all, out of curiosity?

     

    I'm sorry, your question raised more questions on my part than answers!

     

    Dave Watts, Eidolon LLC

    Dave Watts, Eidolon LLC
    Inspiring
    January 31, 2023

    Dave:

    Thanks for your reply and help with this.

    Between 1634-1662, there were 264 women who were shipped to New France (now Quebec) to marry the pioneers (TMI!). Their names are on ship manifests etc.

     

    There are 3500+ ancestors in the main table.

    Just looking to match which ancestors were on that list.

    Also, the French had "dit" names - e.g. Jacquette Borde dit Desbordes is on that list.

    I'm not looking to match dit names, just first & last.

    Hope that clears it up?

    Norman B.

    Inspiring
    January 31, 2023

    Sorry- I know they are both CF queries, just differently structured.