Exit
  • Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
  • 한국 커뮤니티
0

Dynamically search for same digits

Guest
Aug 23, 2011 Aug 23, 2011

I need help with the most efficient way to search for same digits in ids dynamically.

After I queried my table: select MyId from Mytable then I need to check if none of these ids consist of the same repeating digits such as 0000,1111,2222,......9999

To make it more complicated, MyId doesn't have a fix length, it can be 4 digits length or 5, 6 all the way to 9. So the search of the same digits it this ids also should also correspond to the length of the ids.

If MyId is 4 digit long, I should make sure none of the MyId contains 0000,1111,2222,33333,...all the way to 9999

Can anyone give me example on how to do this efficienly since I may deal with large records.

TOPICS
Getting started
2.0K
Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Aug 23, 2011 Aug 23, 2011

Unless I misunderstand the question,  just do this:

where MyID <> 9999

or whatever number you don't want to match.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Aug 23, 2011 Aug 23, 2011

Dan,

I can't do that since I don't know the length of MyId. If I know MyId is always 4 digits length, then your suggestion is working

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Aug 23, 2011 Aug 23, 2011

Questions:

1. What database server, vendor and version, are you using?

2. What is the data type of the MyId field?

If your database server supports using regular expressions in the WHERE clause you might use a pattern like (\d)\1+ to search for repeating digits.

See: http://www.regular-expressions.info/charclass.html

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Aug 24, 2011 Aug 24, 2011

It's Sybase, I think version 4.0 (?) and the datatype for MyId is varchar because it can accept alpha numeric. But we don't bother checking MyId other than 1111,2222,3333, all the way to 9999 if MyId happen to be 4 char length. Or 11111111,22222222,33333333, etc if it happens to be 8 char length.

Within 1 batch, when MyId is 4 char length, all will be 4 char length.

In tomorrow's batch, MyId can be 6 char length and next month it may be 4, 6 char length or 7 char lenght. Max char length for MyId is 9 char length.

So, within 1 batch there will be no mixing of MyId's char length.

I'm thinking of doing Len(MyId) first to find out the lenght of MyId for this batch, then looping to create 111,222,333,etc for the length determined by Len(MyId) and before moving to the next same digit, I do select MyId where MyId Like '%1111....%', then if recordcount is not 0 do something, then move to the next same digit and so on.

But I'm not sure how to write that efficiently when processing 5000 or more records for example.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Aug 24, 2011 Aug 24, 2011

Regarding

But I'm not sure how to write that efficiently when processing 5000 or more records for example.

Consider loading every record into a working table and processing them with sql.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Enthusiast ,
Aug 24, 2011 Aug 24, 2011

I advise that you verify the version of Sybase you are using.  Version 4 sounds very old and is likely incorrect.

You might follow Dan Bracuk's advise.  Load your data into a SQL table and query the data there.  It looks like Sybase (at least version 11) supports regular expressions in queries which could be used to query for patterns in your data.

Sybase Regular expressions:
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.11.0.0/dbreference_en11/rf-sqllanguage-s-3888756.html

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Sep 04, 2011 Sep 04, 2011
LATEST
It looks like Sybase (at least version 11) supports regular expressions in queries which could be used to query for patterns in your data.

Sybase Regular expressions:
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sql anywhere.11.0.0/dbreference_e...

It doesn't seem like Sybase regexes support back references, which is annoying.  This is not based on any specific knowledge, but on no mention in the docs, and nothing on Google for search strings like "sybase regex backreference".

However this regex would work in CF if you wanted to pull the IDs back to CF and analyse them:

^(.)(?:\1{3,8})$

This matches strings of 4-9 length containing only the same character, but will match chars other than digits. If you only want to find "1111" but not "AAAA". then change it to this:

^(\d)(?:\1{3,8})$

That said, I'd try to process this all on the DB if poss, which might mean using something along the lines of what BKBK has suggested.

--

Adam

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Sep 03, 2011 Sep 03, 2011

I know little about Sybase. I'm just guessing with:

select myID from Mytable
where NOT REGEXP_SEARCH(MyTable.myID, "[1]{4,9}|[2]{4,9}|[3]{4,9}|[4]{4,9}|[5]{4,9}|[6]{4,9}|[7]{4,9}|[8]{4,9}|[9]{4,9}")

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Sep 04, 2011 Sep 04, 2011

If all else fails, you've always got the obvious, 'dumb' approach,

<cfset digitList = "'0000','1111','2222','3333','4444','5555','6666','7777','8888','9999','00000','11111','22222','33333','44444','55555','66666','77777','88888','99999','000000','111111','222222','333333','444444','555555','666666','777777','888888','999999','0000000','1111111','2222222','3333333','4444444','5555555','6666666','7777777','8888888','9999999','00000000','11111111','22222222','33333333','44444444','55555555','66666666','77777777','88888888','99999999','000000000','111111111','222222222','333333333','444444444','555555555','666666666','777777777','888888888','999999999'">

<cfquery>
SELECT myID FROM Mytable
WHERE myID NOT IN (#preserveSingleQuotes(digitList)#)
</cfquery>

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Aug 23, 2011 Aug 23, 2011

Care to elaborate on "So the search of the same digits it this ids also should also correspond to the length of the ids"?

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guest
Aug 24, 2011 Aug 24, 2011

Dan, I just posted my explanation

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Resources