Author Archive for Jeremy Lowell

14
Sep
10

Chapter 32 – Partitioning for manageability (and maybe performance)

The second chapter in a row on partitioning and it’s another excellent chapter.  Dan Guzman – Blog - takes us on an adventure with this chapter in looking at execution plans between partitioning in 2005 and 2008.  In addition he takes the time to spell out what some good use cases are for obtaining additional performance gains from partitioning.

I’m glad that he took the time to do that and it’s one of the most mis-understood elements of partitioning.  Partitioning will not always increase your performance.  As a matter of fact, it could really, really degrade your performance if your partition key, storage and research are well planned. 

In my experience, there is almost always a good partition key which can be found in the data you are looking to partition.  Often it’s a datetime field such as CreatedDate, TransactionDate or InsertedDate etc…  you get the picture.  It’s a date that has a natural order to it over time.  It’s not to say that all records will always land in perfect order but it is to say that more often than not, they will.  Imagine using a column like “ModifiedDate” as your partition key…  every time someone or some process updated a record, it would potentially end up in a different partition; could create havoc.  Or even worse, a column like “birth-date”.  Chances are that very, very few of the Select statements against the employee table contain a where clause with a small range of birthdays…. i.e., I want to see all employees who were born from march – june of 1975.   What good would that do except for the monthly query that H.R. runs to send out Happy Birthday cards?

In closing, I’d recommend that you read this set of blog posts in conjunction with the chapters on partitioning to get started and learn a few things along the way.  http://sqlperspectives.wordpress.com/category/chapter-31/

01
Sep
10

Chapter 31 – Some practical issues in table partitioning

This chapter is incredibly relevant in my world at the moment.  I’ve helped several clients implement table partitioning and just recently I was on-site at a client at 2:00AM to lay down the new partition scheme and migrate / move all of the data into the partition.  A relatively simple task with a lot of planning and testing… the most interesting part of this implementation was to make this all occur with zero time-outs on their system.  In fact, not only were time-outs not allowable but transactions never took more than three seconds during the operation. 

In this chapter, Ron Talmage provides an excellent baseline introduction to partitioning.  He begins the chapter by stating “Table partitioning is an involved topic and can quickly become complex…”.  At the end of the chapter he states “Table partitioning is quite an involved topic…”.  I appreciate that Ron pointed out that this isn’t some old-hat, do it in your sleep type of feature or implementation in SQL Server.  

In my experience, partitioning is an excellent tool for obtaining database scalability and manageability.  It can also be implemented incorrectly and that can actually de-grade your performance and cause a management nightmare.  I’ve been involved in both scenarios and this chapter begins to lay down the knowledge you need in order to correctly implement table partitioning. 

One of the coolest things about partitions, in my opinion is the ability to have very specific control over the size of your data files.  This can be very, very helpful when you anticipate a need for significant growth or when you first implement table partitioning.  At the beginning of this post, I mentioned that I was just involved in a partitioning excercise that was accomplished against a table while the system was fully on-line, available and being used.  There are a number of steps associated with successfully doing this and the information below will show how this was accomplished (at a high level). 

First, choosing the correct partition scheme is absolutely critical.  There are additional sources that Ron pointed out in his chapter that I’d recommend you read up on before implementing partitioning.  

Once we had identified the correct partition key (in this case a datetime field); the partition function was created as a sliding window (more info on that next week).  Now, we have to lay down the files (by partition function) and populate them.  This is where it gets fun.  The t-sql below is where I begin with this effort.  In short, this script will figure out what size the partition files should be, based on your partitionkey (in this case it’s the datetime_column from the Your_Table table, create the filegroups and then lay down the files you need; sized accordingly.

  USE <YourDBName>

GO

SET NOCOUNT ON

select COUNT(*) as numrows,

            'Y'+ CAST(DATEPART(yyyy,datetime_column) as varchar(4)) + 'Q' + CAST(DATEPART(q,datetime_column) as varchar(2))+ 'FG' as datetime,

            CAST(null as decimal(9,2)) as numrowdate, CAST(null as decimal(9,2)) as percentage,

            CAST(null as decimal(9,2)) as currentsize, CAST(null as decimal(9,2)) as partitionsize

into #tmp_Transaction

FROM [Your_Table]

group by

            'Y'+ CAST(DATEPART(yyyy,datetime_column) as varchar(4)) + 'Q' + CAST(DATEPART(q,datetime_column) as varchar(2))+ 'FG'

go

update #tmp_Transaction set numrowdate = (select SUM(numrows)     from #tmp_Transaction)

go

update #tmp_Transaction set percentage = b.total

      from

       (select numrows / numrowdate as total, datetime_column from #tmp_Transaction) b where #tmp_Transaction.datetime_column = b.datetime_column

go

create table #space_Transaction

      (name varchar(255), rows int, reserved varchar(55), data varchar(55), index_size varchar(55), unused varchar(55))

insert #space_Transaction

exec sp_spaceused [transaction]

go

update #space_Transaction set reserved = REPLACE(reserved,' kb', '')

go

update #tmp_Transaction set currentsize = (select CAST(reserved as decimal(9,2)) from #space_Transaction)

go

update #tmp_Transaction set currentsize = isnull(currentsize,0) , partitionsize = isnull(partitionsize,0)

GO

 -- ** NOTE **  I've multiplied the size by 10 percent; change this to 1.0 if you want it to be exact.

select      datetime as infodatetime_Transaction,

            cast((isnull(currentsize, 150) * 1.1 / 1024) * percentage  as int) as partitionsize_Transaction,

            cast((isnull(currentsize, 150) * 1.1 / 1024) * percentage *.1 as int) as partitionsize_Transaction_growth,

            numrows

INTO #tmp_Transaction2

FROM #tmp_Transaction

create table #filegroup_name (id int identity(1,1), FileGroup varchar(22))

insert #filegroup_name (FileGroup)

values ('Y2009Q1FG')

insert #filegroup_name (FileGroup)

values ('Y2009Q2FG')

insert #filegroup_name (FileGroup)

values ('Y2009Q3FG')

insert #filegroup_name (FileGroup)

values ('Y2009Q4FG')

insert #filegroup_name (FileGroup)

values ('Y2010Q1FG')

insert #filegroup_name (FileGroup)

values ('Y2010Q2FG')

insert #filegroup_name (FileGroup)

values ('Y2010Q3FG')

insert #filegroup_name (FileGroup)

values ('Y2010Q4FG')

insert #filegroup_name (FileGroup)

values ('Y2011Q1FG')

insert #filegroup_name (FileGroup)

values ('Y2011Q2FG')

insert #filegroup_name (FileGroup)

values ('Y2011Q3FG')

insert #filegroup_name (FileGroup)

values ('Y2011Q4FG')

DECLARE @SIZE VARCHAR(20)

DECLARE @FILEGROWTH VARCHAR(20)

DECLARE @PATH NVARCHAR(255)

DECLARE @DBName NVARCHAR(255)

SET @PATH = 'C:\Partitioning_Files\'

SET @DBName = DB_NAME()

      DECLARE  @SQL varchar(4000)

                  , @counter int

                  , @ErrorCode integer

                  , @FGName varchar(100)

                  , @NumRows int

SET @ErrorCode = 0

                  SELECT @counter = 1

            WHILE @counter <= 12 AND @ErrorCode = 0 BEGIN

                  SELECT @FGName = (Select FileGroup from #filegroup_name where @counter = ID)

                  SET @ErrorCode = @@Error

                  IF @ErrorCode = 0 BEGIN

                        SELECT @SQL = N'ALTER DATABASE '+ @DBName + ' ADD FILEGROUP ['+@FGName+']'

                        SET @ErrorCode = @@Error

                  END

                  IF @ErrorCode = 0 BEGIN

--select @sql

                        EXECUTE (@SQL);

                        SET @ErrorCode = @@Error

                  END

                  IF @ErrorCode = 0 BEGIN

            SELECT @Size = (Select ISNULL(Partitionsize_Transaction,150) from #tmp_Transaction2 A JOIN #Filegroup_Name B ON A.infodatetime_Transaction = B.FileGroup and B.ID = @counter)

            SELECT @FileGrowth = (Select ISNULL(Partitionsize_Transaction_growth,150) from #tmp_Transaction2 A JOIN #Filegroup_Name B ON A.infodatetime_Transaction = B.FileGroup and B.ID = @counter)

            SELECT @NumRows = (Select ISNULL(NumRows,0) from #tmp_Transaction2 A JOIN #Filegroup_Name B ON A.infodatetime_Transaction = B.FileGroup and B.ID = @counter)

                        SELECT @SQL = N'ALTER DATABASE '+@DBName+'

                        ADD FILE     

                                (NAME = N'''+@FGName+''',

                                    FILENAME = N'''+@PATH+@FGName+'.ndf'',

                                    SIZE = '+@SIZE+',

                          FILEGROWTH = '+@FILEGROWTH+')

                        TO FILEGROUP ['+@FGName+']'

                        SET @ErrorCode = @@Error

                  END

                  IF @ErrorCode = 0 BEGIN

                        EXECUTE (@SQL);

--select (@SQL)

      SELECT @FGName as FileGroupName, @Size as Size_of_File_MB, @FileGrowth as FileGrowth_Size_MB, @NumRows as NumberOfRows

                        SET @ErrorCode = @@Error

                  END

                  IF @ErrorCode = 0 BEGIN

                        SELECT @counter = @counter + 1

                        SET @ErrorCode = @@Error

                  END

      END 

 

O.K. Believe it or not, that’s the easy part. Once the files were sized correctly and laid down, it’s a matter of creating a table on your partition scheme, in the files you just laid down. Once that’s accomplished I typically write a loop that will then populate the newly partitioned table from the transactional table so as to limit the amount of locking required to read from the production / transactional table. 

In the case of the partitioning excercise that I mentioned above, we chose a sliding window scheme.  That’s important to note because once we populated most of the data, we didn’t insert the current day’s data.  The script (not provided here) then created all of the non-clustered indexes we would need in an aligned fashion.  Then we populated the final day of data, re-named the two tables involved (old huge table to Tablename_OLD and newly partitioned table to Tablename).  By wrapping that in a transaction, we minimized the potential for a very long running insertion process and kept the system online.

Viola…!  Sounds easy.  It is with plenty of thought and testing. 

Final thought… and please switch gears with me just a bit… there’s a feature that I desperately want added to partitioning that you can vote on!  Please see:

https://connect.microsoft.com/SQLServer/feedback/details/468517/update-statistics-at-the-partition-level

Happy partitioning folks and Ron, thank you for one of the best resources I’ve ever read regarding partitioning… I truly wish this chapter had been written a long time a go before I took my first leap into partitioning.

27
Aug
10

Chapter 30 – Reusing space in a table

It’s rare that I read the first paragraph in a chapter of a technical book and smile with fond memories. 

This is one of those rare times.  If you have the book SQL Server MVP Deep dives, written by Joe Webb (Blog | Twitter) then you know what I’m talking about.  If you don’t have the book and you are reading this… well, uh, go.  get the book.  Here. Today. 

This is a cool chapter… while I have only used the functionality provided by this on a couple of occasions, I can attest to its value.  I can especially attest to its value for some C.O.T.S. packages out there that update their schema with a version upgrade and the space is not readily re-used.  I’m not going to name names because that’s not the point… the point is that if you are a developer for some of the C.O.T.S. implementations out there; please keep this in mind.  It’s an easy, easy thing to be on the lookout for and it will keep grumpy DBA’s such as myself from calling you.

More to the point, if you use SQL Server to store data in some of the larger data types (xml, text, image, etc…) and you have data that falls outside of the IN_ROW_DATA allocation space, this chapter and understanding how SQL Server cleans up and doesn’t clean up after itself is very, very important.

The only other item that I think bears a mention is that this will lock your table while this process is running.  That’s a pretty important thing to keep in mind if you are thinking about running this command against your Transactions table in production.  :)

Lastly, for those of you who haven’t frequented Joe’s blog, you should.  I spent about 45 minutes on it while trying to write this little post… he has some great content!  http://webbtechsolutions.com/blog/

20
Aug
10

Chapter 29 – My favorite DMV’s and why

This chapter is awesome.  Aaron Bertrand (Blog | Twitter) takes the time to explain and provide code examples for 15 of the DMV’s introduced in SQL Server 2005 and SQL Server 2008. 

I’ve been using DMV’s for a while now and the amount of information you can glean from them about your environment is incredible.  Coupled with the prior chapter on DDL and DML auditing, Microsoft is doing an excellent job with providing more information, in a much more intuitive fashion than I can remember since SQL Server 6.5.  This is a very, very welcome change as it allows DBA’s, developers and the like to have visibility into the HOW, the WHAT, the WHERE, the WHY and the WHO is doing what inside of your databases.

One of my personal favorite DMV’s is sys.dm_exec_sql_text.  The reason that I love this one so much is that it’s typically used via a cross join to other DMV’s such as sys.dm_exec_query_stats.  The DMV sys.dm_exec_query_stats provides statistics around query plans.  Specifically, how many times that plan has been executed, the cost of the plan and then you can cross join that plan to actually SEE the t-sql that’s being run.  This is incredibly valuable when looking to see if you have a lot of re-compiling going on or a lot of dynamic t-sql or linq-to-sql that’s not reusing query plans.  Addressing some of the big hitters here (from either a cost perspective or a quantity of re-compiles perspective) can significantly alter your overall database / server performance.

Aaron goes on this chapter to provide a few full-blown examples of his favorite DMV’s in use and why he likes them.  He provides all of the code necessary for an improved sp_who2, how to get statistics for stored procedures, finding unused stored procedures, finding inefficient and unused indexes, finding inefficient queries and finding missing indexes.  

An outstanding chapter Aaron, thank you.




Chapters

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 8 other followers


Follow

Get every new post delivered to your Inbox.