10
Feb
10

Chapter 2 – SQL Server tools for maintaining data integrity

“Now ideally you’ll test the consistency of the data every time that you use the data, to verify that it’s correct.  It doesn’t take a genius to determine how long before that get’s old”.  That quote comes from paragraph two in chapter two. 

Most companies will deal with constraint violation errors, projects around de-duplication, re-design of applications, including the database layer to correctly persist the data that is the life-blood of the organization.

If you are a DBA, a database developer, a .Net guru or an Access fanatic, there is NOTHING more important than maintaining the integrity of your data.  If you find yourself in a situation where you are having a hard time justifying the cost to resolve these issues at your company, then ensure that you have your backside covered with plenty of documentation around why it’s an issue and the potential problems that can occur with inaccurate data.  If even then you cannot justify the effort to your company it means that you have not presented a strong enough case.  As each scenario is unique, one common theme that I’ve found to work throughout my career is to dig into the COST (in real dollars) associated with “fixing” or “resolving” the differences in reports or the dollars spent in resolving the one – off issues.

The tools that Louis mentions in the chapter are spot on and I’m glad to see that he included data types and NULL specification because data integrity isn’t limited to just Primary Keys, Foreign Keys and Unique constraints.  Some of these specific items point right back to chapter 1.  This brings me to a conversation that I’ve had with my mom, neighbors and friends when they ask me “what do you do”?  I explain it at a very high level and if they are still with me I try to explain how a database is a nebulous blob of bits and bytes that all co-exist in harmony and that each bit and byte are independent in and of themselves but each bit and byte are dependent on each other for the entire picture to come into focus.  If the bits and bytes are unable to know how they relate to the bigger picture, then they are probably going to grow and become a form of virus that queries such as this one will uncover:

SELECT employee, isactive, salary

FROM employee

GROUP BY employee, isactive, salary

HAVING count (*) > 1

If that returns a result set, just say uh-oh and start packing your box.

On a serious note, elements of that query are asked in EVERY interview I’ve ever conducted.  If someone doesn’t know the elements of that query it tells me a few things about the candidate. 

1.)  They have always worked with perfectly designed databases and never needed it.  This tells me that they didn’t design that database; for if they had, they would understand that query, it’s usage and the elements there in.

2.)  They simply don’t know it. 

Neither answer is acceptable in my interview and the worst answer is “select distinct”.  That will find an interview cut short.

I’ve worked in more than one environment where I was told by the architect that constraints would be handled in the application.  This is a design principle that has a great deal of merit from an application standpoint.  However, from a database standpoint, it has absolutely no merit what-so-ever.  An application that truly validates the integrity of your data would be so incredibly intrusive that it’s immediately rendered infeasible.  Additionally, data constraints at the application level are no constraints at all.  They are merely how the application responds.  This means that NO ONE and NO OTHER process and NO OTHER login to the database has the ability to change one element of data.  Let’s say that is the case, then there’s the issue of an interrupted transaction due to a network glitch, due to a timeout, due to a deadlock etc, etc, etc….   Keep the data constraints at the closest level possible to the database… meaning, in the database.

In this chapter, Louis states “none of this is that hard to do”.  Indeed, none of it is hard to do and while it’s more challenging in an environment that’s “in production”; it’s still feasible, warranted and easy to justify.

I’ve had the opportunity to mentor developers and DBA’s alike over the years and this concept isn’t always an intuitive one.  One of the tools that SQL Server provides is the diagram feature is SSMS.  This tool is a double-edged sword in my opinion.  On one hand, it’s a very easy to illustrate a foreign key constraint and show how objects relate to each other but on the flip side you can SAVE the changes to the database.  I’m not a fan of that piece of the functionality but I understand that not all organizations can afford a data modeling tool like ErWin.  That said, Visio has come along nicely in recent releases and you can do full-fledged data modeling in visio.

In wrapping up this post, I’m going to pose a question….

Would it be fair to say that the integrity of the database you design, develop against or are ultimately responsible for directly correlates to your own integrity and the integrity of your organization?

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Chapters

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 13 other followers