Chapter 9 – Avoiding three common query mistakes

In the editorial of chapter two, I stated “…there is nothing more important than maintaining the integrity of your data” and while I believe that to be a true statement, it goes nearly hand in hand with Kathi Kellenberger’s statement “The query must return the expected results…”. 

Kathi’s (interview | twitter) chapter deals with a very interesting set of issues that aren’t intuitive for most folks because of the “false positive” nature of some of the results.  Imagine writing that first stored procedure that has inner joins, outer joins, a derived table and the nearly infamous NOT IN clause and learning that the result set is… wrong.  This would not be a big deal if you were writing the query for your own stamp collection database but in the real world, when data professionals are asked for data, the expectation is that the result set will accurately depict what’s persisted in the database.  

This is a very interesting book in how well it interleaved the same concept, by different authors, in different chapters.  In this case I’m reaching all of the way back to chapter 2 where Louis Davidson discussed the importance of defining your NULL specifications correctly in the data model.  In the example provided by Kathi, if the table design had more normal form then the Color column in the Product table would have been a reference key to another table which contained all of the available colors.  In the case of an unknown color there could have been an entry in the referenced table that looked something like “1, ‘Unknown Color'”.  Had the data model and usage of potential NULLS been thought about upfront, there wouldn’t have been the issue that’s described. 

The intent of me bringing that up ties right back to Chapter 2’s emphasis and importance on thinking through your data model to help avoid scenarios such as the color example above.  It’s also to draw out the larger point that reality <> perfect data model.  Fair enough.  Anyone who knows me will likely tell you that while I’ve had some “duhhhh” moments, I’m not likely going to skimp on communicating the value of upfront design.

All of that to say that since we aren’t in a world of perfect data models, nor should we be for reasons that I’m not going to get into here, it’s imperative to have an absolute grasp on NULL handling.  I’ve always explained the NULL value in a row of data as “the absence of any value”.  This is an important distinction because of how SQL Server must evaluate criteria you are asking it to evaluate.  Yeah, I’m making this more confusing… perhaps you should read the chapter because Kathi does a great job explaining it.

Multiple outer joins.  This was incredibly confusing to me early on in my career.  I had a t-sql statement that I was trying to “fix”.  It was doing a left outer join on a derived table that joined back into the larger context of the query.  Needless to say, I learned this lesson the hard way and I hope that you will learn from the mistakes that others have made before you.  One of the most helpful elements in keeping your joins correct is formatting, formatting, formatting.  Back in the day, I learned join syntax in this manner:  Select emp.name, empatt.age, c.city from employee emp, employee_attribute empatt, city c where emp.id = empatt.employeeid and empatt.cityid = c.id.  If you just looked at that and thought, what the …. ??  That’s a good response.  Utilize ANSI Standard join statements.  It’s much more readable and easy to understand… 

SELECT EMP.name, EMPATT.age, C.city

FROM Employee EMP

    INNER JOIN employee_attribute EMPAPP ON EMP.id = EMPAPP.employeeid

    INNER JOIN city C on EMPAPP.cityid = C.id

Wow, what a difference.  It’s very evident which tables exist in the query and how they are joined.

The third common query mistake that Kathi covers is the GROUP BY statement.  While it’s very straight forward in a simple query, it can tame even the most savage t-sql programmer when it’s introduced as a subset of an overall result set.  Much like other things in life, there can be different levels of GROUP BY when using t-sql.  Think about driving for a moment.  Most of us would consider ourselves very proficient at driving a car.  Most of us would also consider many others as not being proficient at driving and NASCAR drivers snicker when they overhear guys talking about how they blew the doors off of the shiny Mustang at the red light on the way to the restaurant.  My point here is that it’s simply the perspective you are coming from.  At the end of the day, does a NASCAR driver get you to the store more efficiently than you can yourself (providing they abide by all of the laws)?  Nope.  So don’t be intimidated by t-sql.  Roll up your sleeves and just jump in (on a dev or local instance, of course).


0 Responses to “Chapter 9 – Avoiding three common query mistakes”

  1. Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 13 other followers