Skip to main content
Inspiring
April 16, 2007
Answered

Database Design Question

  • April 16, 2007
  • 2 replies
  • 1609 views
Currently we have a subscription service website which manages time
and attendance for companies using a MySQL database. Currently all
companies and users are in the same database. Every company has a
companyid and every user has a userid and is joined to a company by
the companyid. We do all standard select, insert, update, delete
queries you would expect using the companyid and userid as keys for
joining and selecting records across tables. The companyid and userid
is looked up and stored in the session when a user logs in.

This has worked well until recently where there have been a few
concerns. One is the potential security risks of running all queries
on the companyid condition. While we take every precaution to not pass
the companyid in any forms or urls for sql injection if someone was
able to alter their session companyid in some freak way they could get
to the sensitive data of any other customer. Also, any coding errors
as some of our quries are quite complex have the potential for
altering other users data with careless bugs (and there have been a
few). More importantly though is the issue of locking data and
preventing race conditions. Currently there is a single admin per
company who can run a report for a record set, review those records
and mark the ones to be updated, and then submit it for processing. We
need to add the ability to have more than one admin with this ability.
Obviously, if two users then try to do this same this there a
potential race condition which may occur. On top of that, employees
have the ability to modify their records as well. If an admin runs a
report and is reviewing it and an employee updates a record obviously
the admin will not see it and someones data will be overwritten.
Obviously this is undesirable. We considered using read locks on the
table so that while an admin is reviewing records no one else could
update until they submit their action, obtain a write lock on the
table, update their data, and release the lock. The problem is that
locking the table will lock ALL users from the table not just the
users of the specific company. I looked into creating a view on the
specific data and locking the view but as far as I can tell from the
documentation, a lock on a view requires a lock on all the base tables
involved and now were back to square one.

On top of all this, recently we had a corruption in a MySQL table
which affected ALL users records. It was fixed easily enough with
MySQL's repair utilities but it affected service for all users for
almost 20 minutes.

So, we thought about changing our approach to create a new database
with every new company. This would ensure that anyone can ever access
only one company's data and prevent and possibility of cross data
contamination or security breaches. This would also allow table locks
to be employed as an admin could perform their actions with the
knowledge that they are essentially locking the system from their
employees until they are done, which is an acceptable compromise. Any
table corruption would affect only one company as well without
affecting the entire system. Also, as we grow tables, are growing
exponentially. We're in the 10's of mb for some tables so it's not an
issue at the moment but MySQL can only handle up to 4GB per table.
This would ensure plenty of growth per company before it ever became
an issue. And lastly, a single company's data can easily be backed up
and restored in it's entiretly with affecting the rest of the system.
We currently run hourly backups and in the event a database or table
needs to be restored a single company will lose at most one hour of
data instead of all companies in the system. In one case we are
already doing this of sorts. We have a customer on a dedicated machine
just for them in which we customized the software to fit their
specific needs with their own MySQL server and database.

The downsides include having to keep a master database to manage a
single signon solution which I think is there the real problem with
this plan is. All new users and companies would have to be inserted
into a master database table to generate unique ids and to store what
datasource to use when authenticating the user. Then there is the
overhead of creating an entire new database everytime a user signs up
with our system and updating the database structure across hundreds of
companies, potentially thousands, as we add features. It would mean
adding a new datasource for every new company. It would also prevent
the ability for us to track and monitor system statistics easily, like
the total number of users since it would be the the aggregate value of
one table in hundreds of databases.

Maybe this is a completely ridiculous solution but it seems like it
might have it's benefits (as well as trade offs). We're gearing up for
a potential ground up rewrite of our system and this would be the
place to start.
Any comments or suggestions for or against would be greatly
appreciated.

Anthony
    This topic has been closed for replies.
    Correct answer Steve Sommers
    quote:

    Originally posted by: tclaremont
    You don't really think that credit card companies and financial institutions keep every customer in a different database, do you?

    I think the headache of doing this the RIGHT way is going to be FAR less painful than the headache of doing things the way you are proposing here.
    Depends on your definition of "credit card company." As a payment gateway provider, we are required and we do keep customers' databases separate. It is a pain, but we are required to do so and we go through a yearly third-party audit.

    I don't think the same requirements apply to financial institutions because the financial institution is the company and it is their own database (if that makes sense?). If they were hosting their customer's databases, then they would most likely have the same requirements.

    Now as to the original post, I think it depends on what "sensitive" information you are holding and how much? If you are hosting financial information of any kind or customer lists for each of your companies you are hosting, then yes, separate databases are safer.

    It is A LOT work and I cannot stress “A LOT” enough. But there are a few additional advantages you may or may not realize:


    1. Easier to isolate individual database backups and allow the customer to maintain and download his/her own backups (many customers don't trust the marketing brochures stating "daily offsite backups")
    2. Less prone to deadlock conditions if you have high database volume (you are isolating record locks down to the individual company instead of across all your hosted companies)
    3. Easier to roll-out database upgrades - instead of being down across the board during upgrades and data massages, you can upgrade individual databases
    4. Better scalability in that you are not as dependent on the database server technology. It's a no brainer to put customer databases on multiple database servers without having to rely on spanning a single database across multiple servers.


    There are others, these are just a few that were on the top of my pointy little head.

    Hope this helps. Good luck.

    2 replies

    WebPexDevAuthor
    Inspiring
    May 2, 2007
    Thanks for clearing that up! To know that payment provides are required to keep separate databases is extremely helpful. Knowing that this setup can and does work is a tremendous help. What are some of the drawbacks or difficulties you experience besides the obvious and those already covered if any?
    Legend
    May 2, 2007
    Again, just off the top of my head, three difficulties that come to mind:
    1. We use MS-SQL and with it, the log management is a real pain. We had to write services that continually cycle through all the databases performing automated log management and backups. Using the built in scheduler is just too cumbersome when dealing with 1000's of databases on each server. I'm not nearly as experienced with MySQL so I'm not sure if log management is an issue.
    2. Replication is difficult with a single database, doing it with thousands is next to impossible, at least with how MS-SQL implements it. We had to write our own replication agents. I'm not sure if MySQL has the same difficulties or if you even need replication.
    3. With our application we have the ability to have parent companies that want access to multiple properties that they own or manage. In this mode, consolidated reporting becomes a bear.
    tclaremont
    Inspiring
    April 17, 2007
    You don't really think that credit card compaies and financial institutions keep every customer in a different database, do you?

    I think the headache of doing this the RIGHT way is going to be FAR less painful than the headache of doing things the way you are proposing here.
    Steve SommersCorrect answer
    Legend
    May 2, 2007
    quote:

    Originally posted by: tclaremont
    You don't really think that credit card companies and financial institutions keep every customer in a different database, do you?

    I think the headache of doing this the RIGHT way is going to be FAR less painful than the headache of doing things the way you are proposing here.
    Depends on your definition of "credit card company." As a payment gateway provider, we are required and we do keep customers' databases separate. It is a pain, but we are required to do so and we go through a yearly third-party audit.

    I don't think the same requirements apply to financial institutions because the financial institution is the company and it is their own database (if that makes sense?). If they were hosting their customer's databases, then they would most likely have the same requirements.

    Now as to the original post, I think it depends on what "sensitive" information you are holding and how much? If you are hosting financial information of any kind or customer lists for each of your companies you are hosting, then yes, separate databases are safer.

    It is A LOT work and I cannot stress “A LOT” enough. But there are a few additional advantages you may or may not realize:


    1. Easier to isolate individual database backups and allow the customer to maintain and download his/her own backups (many customers don't trust the marketing brochures stating "daily offsite backups")
    2. Less prone to deadlock conditions if you have high database volume (you are isolating record locks down to the individual company instead of across all your hosted companies)
    3. Easier to roll-out database upgrades - instead of being down across the board during upgrades and data massages, you can upgrade individual databases
    4. Better scalability in that you are not as dependent on the database server technology. It's a no brainer to put customer databases on multiple database servers without having to rely on spanning a single database across multiple servers.


    There are others, these are just a few that were on the top of my pointy little head.

    Hope this helps. Good luck.