Chapter 5 — Gaps and Islands

I’ve been reading articles written by Itzik Ben-Gan for years.  I’ve also had the opportunity to see him present at a conference in the past.  With that history, I knew that this chapter would provide compelling fodder for this blog and my imagination.  I’ve read up about gaps and islands in the past but I’ve always associated them with ID fields (typically the auto-identity variety) that need to be re-seeded when massive delete operations occur.  The concept of temporal gaps and islands is something that I’ve worked with for quite some time; yet I never thought to employ the similar mindset of gaps and islands as they refer to numerical problems.  It seems rather silly now that I’ve typed it because clearly it’s something that we do on a fairly regular basis without thinking about it. 

A common scenario that I’ve seen for this is around an order entry system.  A question of “How many orders were received after our ad was televised during the Super Bowl” versus “How many orders were received during the State of the Union address”?  Clearly, one will provide an island of data (Super Bowl) and the other will provide a signficant gap… That is unless the sales being discussed are earplugs, in which case, the opposite would be true.**

** My wife was convinced that I couldn’t make a joke about politics, bring up the Superbowl and make a technical point all in the same example.   HA! 

On a serious note, I believe that this chapter covers some very important fundamentals and aspects within the internals of SQL Server that are also very, very relevant.  In the opening paragraph, Itzik writes 

Besides varying in terms of the data type of the values (numeric and temporal), sequences can also vary in terms of the uniqueness of values. For example, the sequence can have unique values, that is, unique keys, or non-unique values, that is, order dates.

Why is this important with respect to SQL Server internals you might ask?  Well, I’m going to attempt to make that case, here and now.  I doubt that it will be with the finesse of many experts out there; but bear with me and I’ll demonstrate why this concept is critical to achieving an ideal data model and also help explain why SQL Server works the way that it does.  In my first post on this blog (Chapter 1), I discussed the value of meta-data as it pertains to the databases that you develop against, support or are otherwise responsible for.  This meta-data and the foundation behind its value and importance to the query optimizer can only be truly grasped if the concept of gaps and islands is truly understood.  I’ve often referred to this as the data distribution in a table or a set of tables.

One of the fundamental steps in tuning a query or developing a data model is to understand what the cardinality of the data is going to be.  In a database that has many gaps and signficant (non-unique) islands, there exists a significant potential for performance degradation based on which data you selecting, updating etc…  When you begin to troubleshoot a poor performing query, Books on-line talks about cardinality estimation and defines it as “The first factor, cardinality, is used as an input parameter of the second factor, the cost model.  Therefore, improved cardinality leads to better estimated costs and, in turn, faster execution plans.”  Ah-Ha!  So, this leads us to the olden days of having to generate “good plans” in SQL Server when the server was re-started by executing queries or stored procedures that we knew would generate the optimal plan.  For instance, If we have a query that get’s all orders based on a date and the dates of the orders in question have signficant gaps and islands in them, we learned that if the first query against the table was on the first set of dates and those happened to be a tightly grouped set of islands in the early 1900’s; then the query plan for 2008 might not be the ideal plan because there are significant gaps in the data over the course of the 50’s and 60’s.  This is a very real scenario and while your data probably doesn’t date from the early 1900’s to today; the problem exists even with different data distribution over a period of days! 

The main emphasis of the chapter, I think, is to also illustrate that there are many methods to achieve a similar result.  In the case of identifying gaps and islands, Itzik provides the same answer to the question utilizing 4 distinct methods for islands and 4 distinct methods for gaps.  Interestingly, the best solution for each is not the same.  There are actually third-party tools out there that can be purchased which will look at the query you have written and auto-magically re-write the query many, many different ways and find the “best” query.  While I see a great deal of value in tools such as those, there can be no better method of writing good queries than understanding the fundamentals of why, how, where and when SQL Server fetches the data based on how you asked it to fetch the data.  The reason for brining this up is that any tool that can find the best way to retrieve data is based only on a point in time.  And as time goes on and your data characteristics change, so also will your query plan and when your query plan changes, it’s quite possible that modification to your code will also provide additional performance gains. 

Clearly, to fully understand and appreciate his code, you must have the book which can be ordered here: www.SQLServerMVPDeepDives.com.  If you are following this blog and you are not following along in the book, you are getting about 1/1000th of the value that can be had by ordering the book.

We are also still soliciting guest bloggers.  Pick a chapter, send us an e-mail and we will put your name in lights… https://sqlperspectives.wordpress.com/guest-blogger-of-the-week-info/


0 Responses to “Chapter 5 — Gaps and Islands”

  1. Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s


Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 11 other followers