Archive for the 'Chapter 8' Category

26 Mar 10

An interview with Denis Gobo

As part of our continuing series going through the book SQL Server MVP Deep Dives, we have had the opportunity to ask some of the authors a few questions. This week we are chatting with Denis Gobo (LessThanDot – blog, sqlblog, Blogspot blog, @DenisGobo – Twitter).

1.) In the book SQL Server MVP Deep Dives, why did you choose the two subjects that you did? (Chapter 8, What makes a bulk insert a minimally logged operation? Chapter 20, Why every SQL developer needs a tools database)

I picked these two subjects for a couple of reasons. In regards to bulk loading, just using bcp or BULK INSERT is not enough; there are more things needed to make the load truly minimally logged. People in general know about the SIMPLE or BULK_LOGGED recovery models, but there is still a large percentage of people who don’t know that you also need to lock the table. In the chapter I wrote, I show the difference in the size of the log when you lock and when you don’t lock the table.
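
To make that concrete, here is a minimal sketch of the locking piece; the table and file names are made up, and it assumes the database is already in the SIMPLE or BULK_LOGGED recovery model and the target table is empty:

-- TABLOCK is the piece people miss; without it the load is fully logged
BULK INSERT dbo.SalesStaging
FROM 'C:\Data\sales.dat'
WITH (TABLOCK);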

In regards to the Tools database, it is because I struggled myself with maintaining code that should live in one place and one place only. I have stumbled upon databases that had the same identical code, doing some calculation, in several stored procedures. So of course, as you already know, sometimes the change was made in 4 instead of 5 procedures, and you might not notice for a couple of months, especially if it was an end-of-quarter calculation or, worse, year end. Having a Tools database simplifies coding, testing and debugging; it really is as simple as that.
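
As a quick sketch of the idea (the Tools database name follows the chapter, but the function and its logic here are hypothetical):

USE Tools;
GO
-- Hypothetical shared calculation: the last day of the quarter @d falls in
CREATE FUNCTION dbo.EndOfQuarter (@d DATE)
RETURNS DATE
AS
BEGIN
    RETURN DATEADD(DAY, -1, DATEADD(QUARTER, DATEDIFF(QUARTER, 0, @d) + 1, 0));
END;
GO
-- Every application database calls the single copy via a three-part name
SELECT Tools.dbo.EndOfQuarter('20100326');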

 

2.)    Do you have any tips on minimally logged operations for partitioned tables and the switching of partitions therein?

You know, I can tell you a small percentage of what you need to know here, or I can point you to the We Loaded 1TB in 30 Minutes with SSIS, and So Can You whitepaper by Microsoft, which covers way more than I could ever cover in this interview. It is a must-read for everyone importing a lot of data.

 

3.) Has the Tools database that you reference in Chapter 20 ever been a challenge for management or the business to understand / support? If so, what are a couple of compelling reasons to offer those that don’t just “get it”?

In general the business doesn’t get involved with my databases, so I am lucky in that regard. Management definitely sees the value proposition here; think of the reasons why companies purchase or install a framework or a component. They do it because it is cheaper than writing it yourself. A framework will be used by lots of people and tested by lots of people; there is no reason to write, debug and test these things yourself and reinvent the wheel. The same thing is true for a Tools database: it is better that 10 developers use 1 business function than each developer having their own flavor of the same function. It is also nice to be able to tell your manager that the code the other developer needs to write has already existed in the Tools database for 2 years and has been tested thoroughly.

 

4.)    To date, what has been your favorite feature / functional aspect of SQL Server, regardless of version?

Partition functions, the date data type and data compression are three of my favorite things.

Up until last year I was still on SQL Server 2000 on some servers, and I had partitioned views on those servers. They require a lot more maintenance and careful setup (think CHECK constraints on the partitioning column) than partition functions to get the optimizer to touch only the base tables affected by the WHERE clause. Compression is just a no-brainer, especially if you are I/O bound; it also saves money since your data needs less SAN space. Dates are a biggie for me because I save 5 bytes per row, and if you have multi-billion row tables this adds up.
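
To make the partitioned-view point concrete, a rough sketch with made-up table names; the CHECK constraints on the partitioning column are what let the optimizer skip the base tables a query cannot touch:

CREATE TABLE dbo.Sales2009
    (SaleDate DATETIME NOT NULL CHECK (SaleDate < '20100101'), Amount MONEY);
CREATE TABLE dbo.Sales2010
    (SaleDate DATETIME NOT NULL CHECK (SaleDate >= '20100101'), Amount MONEY);
GO
CREATE VIEW dbo.Sales AS
SELECT SaleDate, Amount FROM dbo.Sales2009
UNION ALL
SELECT SaleDate, Amount FROM dbo.Sales2010;
GO
-- With the constraints in place, this only touches dbo.Sales2010
SELECT SUM(Amount) FROM dbo.Sales WHERE SaleDate >= '20100301';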

 

5.)    What’s the number one technical deficit that you believe SQL Server professionals need to address?

The biggest problem with SQL Server, in my opinion, is that there is just too much stuff in the product these days to become proficient in all of it. Just take a look at what happened when SQL Server 2000 came out compared to SQL Server 7: we got indexed views, user-defined functions and maybe a couple of other things. Now take a look at when SQL Server 2005 came out; wow, that was quite a learning curve. There were new tools (SSMS and BIDS), a new way of getting metadata with DMVs, Service Broker, CLR, SSIS, SSAS and so on. These days you see more and more specialists; people focus on performance or BI, or do only SSIS. Of course, now that they added C# to SSIS 2008, there is yet another thing to learn if you need to maintain other people’s work.

In terms of databases themselves, I think that data about the data (metadata) is virtually nonexistent. How many times have you opened up a database and looked at a table where the names just did not make any sense to you, and there was no documentation except for some comments in some stored proc?
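
For what it’s worth, SQL Server does give you a place to hang that kind of documentation: extended properties. A quick sketch with a made-up table and description:

-- Attach a description to a cryptically named table
EXEC sys.sp_addextendedproperty
    @name = N'MS_Description',
    @value = N'One row per posted invoice; InvDt is the posting date.',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE',  @level1name = N'Inv';
-- Read it back later
SELECT value FROM sys.extended_properties
WHERE major_id = OBJECT_ID('dbo.Inv') AND name = N'MS_Description';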

This of course brings up naming conventions, but that is a whole different can of worms and I won’t even go into that here.

 

6.)    What’s the number one piece of functionality or change that you would like to see in the next version of SQL Server?

Okay, there are things I can and cannot say, because I know some of the things that are coming in SQL 11 and I am under NDA for those. One thing I can tell you I would like to see is the ability to store a table in columns instead of rows. This would be especially handy if you could do it against read-only partitions, where only the active partition was stored in rows and the rest of the partitions in columns. Here is some made-up syntax:

ALTER PARTITION Bla
MARK RANGE (…) Active, STORAGE = ROWS
ALL_OTHER_RANGES, STORAGE = COLUMNS

Something like that. Gemini on the BI side is using columnar storage already, so I can’t see why this couldn’t make it into a future version. Of course, if you do SELECT * then you won’t benefit from columnar storage; ideally you should think of it as one big index with included columns for all the columns in the table.

 

7.)    Where does the SQL Server community at large find you when you’re not at your job?

There are a couple of places I can be found. I participate in the SQL Server programming forum at LessThanDot, a website I co-founded with a bunch of other people I met in cyberspace. I also blog on LessThanDot here: http://blogs.lessthandot.com/index.php/All/?disp=authdir&author=4. You can also find me on Twitter: @denisgobo. I answer questions on Stack Overflow; here is my profile: http://stackoverflow.com/users/740/sqlmenace. There are some more places online where I am active, but the ones listed are where I am most active.

Finally, I will also be at SQL Saturday 39 in New York City on April 24th but I won’t be speaking.

 

8.)    I’ve saved the most important question for last… Does the Query Optimizer respond better to cursing in one language over another?  If so, can you PLEASE share the knowledge?

Ha, this depends on the default collation of course :-)

26 Mar 10

Minimally logged bulk insert

What makes a bulk insert a minimally logged operation? The answer to this question lies within this chapter. I know I have mentioned this before, but this is another great chapter in the Deep Dives book: a lot of detail and a lot of examples. Actually, the best way to experience this chapter is to run the examples provided by the author. I have used the bulk insert process before, and let me tell you, if you have tons of data to insert, this is the way to go. Just remember that when you do use this process, you will need to have certain settings in place so that it does not take forever or take up all of your drive space.
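
One of those settings is the recovery model; a quick sketch (the database name here is made up):

-- BULK_LOGGED keeps a big load from filling the log the way FULL recovery would
ALTER DATABASE SalesDW SET RECOVERY BULK_LOGGED;
-- ... run the bulk load (with a table lock) here ...
ALTER DATABASE SalesDW SET RECOVERY FULL;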

If you are a new reader of this blog, welcome. We hope you come back. Please leave us comments. Have a great week.

26 Mar 10

Chapter 8 – What makes a bulk insert a minimally logged operation?

Ahhh, the bulk insert. It’s like a big glass of ice-cold water in the Sahara. And the minimally logged bulk insert is like a swimming pool near a grapevine in the middle of the Sahara.

For years I’ve used these methods and learned some of these lessons the hard way. Unlike many activities in SQL Server, there is only ONE way to minimally log bulk inserts into SQL Server, and Denis (LessThanDot – blog, sqlblog, Blogspot blog, @DenisGobo – Twitter) nailed it in this chapter in a concise, articulate and very easy-to-understand manner. Before beginning this blog post, I spent about ten minutes on MSDN / BOL trying to find one source with all of the rules that Denis neatly lays out in this chapter, and I found most of them scattered across four different references. That’s one of the values this book provides: it distills a very important topic and then goes into depth on it, all in one place.

My experience with bulk inserting data spans many years, and there are many items to take into account from a performance perspective. In my experience, several elements allow bulk insert to perform optimally. While your mileage may vary depending on your unique circumstances, ensuring that the data is loaded in the order it will be persisted is huge.
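
As a sketch of that last point (the table, file and key column are made up), BULK INSERT’s ORDER hint tells the engine the file already matches the clustered index order, so it can skip the sort:

-- The ORDER hint only helps when the file really is sorted by the clustered key
BULK INSERT dbo.Orders
FROM 'C:\Data\orders.dat'
WITH (TABLOCK, ORDER (OrderID ASC));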

Other important items are choosing the optimal batch size, running the bulk insert process in parallel, ensuring that there’s plenty of tempdb I/O, placing a table lock on the object you are importing into, disabling or dropping indexes, disabling auto create and auto update statistics, and using the native file format. Each of those items has constraints, rules and guidelines around it that can be found in BOL.
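
A sketch of two of those knobs, with made-up database, table and index names:

-- Disable a nonclustered index and the automatic statistics options for the load
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders DISABLE;
ALTER DATABASE SalesDW SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE SalesDW SET AUTO_UPDATE_STATISTICS OFF;
-- ... bulk load here ...
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD;
ALTER DATABASE SalesDW SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE SalesDW SET AUTO_UPDATE_STATISTICS ON;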

I’d strongly recommend that, if you are interested in making your bulk insert processes scream, you take the time to research all of the items I just mentioned, along with following the rules that Denis went over in the chapter.

Another thing to consider is partitioning in SQL Server 2005, but more so in SQL Server 2008, as partition parallelism was introduced in the 2008 version. As always: test, test and test some more. A lot of the fine-tuning with bulk inserting is based on your specific file structure, your server hardware and your I/O subsystem. If this is something that you are really interested in, or something that you’ve already implemented on your databases, take a look at this Connect item: https://connect.microsoft.com/SQLServer/feedback/details/468517/update-statistics-at-the-partition-level and then vote for it!

24 Mar 10

Dealing with Bulk Operations

I think most DBAs understand that to really see the differences between options, you need to see them side by side. For example, if you were looking at getting a hybrid car and you read the label, you may not understand exactly how many miles per gallon the car really gets; is it good, is it bad, or is it really the same? How do you test this? But if you had two cars that were identical in every way, except one was a hybrid and one was not, then you could see…

  • Differences in Power
  • Differences in Mileage or distance
  • Differences in cost to fill the car up after 100 miles

The same is true for most SQL Server activities. If you want to see whether an index impacts a query, run the query without the index, clear the cache, build the index and run the query again. Denis Gobo does an excellent job of using this scientific method to show you how to take advantage of minimally logged operations. Denis is one of the founders of lessthandot.com and is a great resource when looking for direction with SQL Server.
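
A sketch of that before-and-after loop, with a made-up query and index (DBCC DROPCLEANBUFFERS belongs on a test server, not production):

SET STATISTICS IO ON;          -- report logical and physical reads
DBCC DROPCLEANBUFFERS;         -- cold cache for the "before" run
SELECT * FROM dbo.Orders WHERE CustomerID = 42;
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);
DBCC DROPCLEANBUFFERS;         -- cold cache again for the "after" run
SELECT * FROM dbo.Orders WHERE CustomerID = 42;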

To get the most out of this chapter, do yourself a favor and run the code the chapter talks about; that way you not only read the information but see the impact first-hand.



