
02 Sep 10

Chapter 31 — Practical issues in table partitioning

Over the past year I have had the opportunity to get deeply involved in database partitioning while developing a reporting system against my company's OLTP database. As my data warehouse grew, I quickly realized the benefits table partitioning could offer me. I appreciate SQL Perspectives giving me the opportunity to post some of my thoughts on the benefits of partitioning. I enjoyed reading this chapter and I appreciate how Ron Talmage approached the subject. Partitioning can be a bear of a topic, and it is best approached in small pieces.

Ron highlights very well the benefit of metadata operations found in partitioning. It is summed up best on page 419:   

The Key: Avoid Data Movement    

“The key to loading and removing data quickly from partitioned tables is to use the metadata operations of SWITCH, SPLIT, and MERGE without incurring any data movement”.   

Any seasoned DBA will tell you what a nightmare it is to transfer large volumes of data between tables. “Metadata operation” suddenly becomes two very beautiful words to a person confronted with this task. Taking advantage of metadata operations in table partitioning is huge. Below is a summary of the definitions from the chapter; a short T-SQL sketch of all three operations follows the list.

SWITCH   

  • 100% metadata operation; one side of the switch will always be empty.

SPLIT    

  • Safe when used to create new, empty partitions. SPLIT can be dangerous when splitting a partition already loaded with data, because some of that data may move into the new partition depending on the boundary range.

MERGE   

  • Safe if you merge an empty partition with a partition full of data. However, merging two partitions that both contain data can be a very I/O-intensive operation.
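
To make those definitions concrete, here is a minimal T-SQL sketch. It borrows the Sales table, MonthScheme, and MonthlyDateRange names from the code examples later in this post; the Sales_Staging table and the FG_ThirdMonth filegroup are assumptions for illustration only.

-- SWITCH: move partition 1 of dbo.Sales into an empty staging table (metadata only).
-- Sales_Staging must have an identical structure and live on the same filegroup as partition 1.
ALTER TABLE dbo.Sales SWITCH PARTITION 1 TO dbo.Sales_Staging

-- SPLIT: add a new boundary value; safe here because the partition it creates is empty.
ALTER PARTITION SCHEME MonthScheme NEXT USED FG_ThirdMonth
ALTER PARTITION FUNCTION MonthlyDateRange() SPLIT RANGE (3)

-- MERGE: remove a boundary value; cheap when one of the two partitions being combined is empty.
ALTER PARTITION FUNCTION MonthlyDateRange() MERGE RANGE (1)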

    

The above operations give us lots of power, but they can quickly turn dangerous. Remember, with great power comes great responsibility. The truth of the matter is that table partitioning is an enterprise-level tool for designing a scalable database. If you are going to use it, you must know what you are doing. Proper testing is the only way to ensure a real performance benefit.

It is important to understand that adding a partition takes a schema lock on the table. Even a query using the NOLOCK hint is going to be blocked, and in certain circumstances the NOLOCK query will actually cause the partition creation to fail. When that happens, all of the benefits of partitioning go out the window: to rectify the problem you have to use the SPLIT operation, and as we discussed earlier, since the partition already has data in it, this can be a very expensive operation. Something I have found that helps with this problem is the SET DEADLOCK_PRIORITY HIGH T-SQL statement. This tells SQL Server that this is a very important operation and it should be the last candidate considered as a deadlock victim. The last thing I want to see on my system is partitions failing to create, and I want to do everything I can to prevent that; the recovery cost (especially if it is not caught right away) can be disastrous. After we implemented this in our production system we rarely had deadlock problems on the creation of a partition.
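
Here is a hedged sketch of how that fits together, again borrowing the names from the example later in this post (the FG_ThirdMonth filegroup and the boundary value 3 are illustrative only):

-- Raise this session's priority so the partition-creation statements are the last
-- candidates chosen as a deadlock victim if a NOLOCK reader collides with them.
SET DEADLOCK_PRIORITY HIGH

ALTER PARTITION SCHEME MonthScheme NEXT USED FG_ThirdMonth
ALTER PARTITION FUNCTION MonthlyDateRange() SPLIT RANGE (3)

SET DEADLOCK_PRIORITY NORMAL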

When designing a data warehouse from scratch, there is a systematic process that, if followed, gives the database integrity, appropriate normalization/denormalization, and proper indexes for the foreseeable queries that will run against it. A couple of years ago my company assigned me the task of re-architecting our BI reporting system. Using the above-mentioned process I created what I thought was a very manageable and, yes, very scalable reporting system. My reporting database hummed along for about eight months without issue, other than the size of the database continuing to grow and grow, which is a good thing, right? It was shortly after this that a couple of the tables reached nearly 500 GB, and a few months later, as we brought on more customers, one of them grew to nearly a terabyte. It was at this point that I learned a painful lesson: my awesome reporting system had grown out of control, and I realized how non-scalable the database actually was. This was also about the time (as is always the case) that management had learned to love and depend on the “new fabulous reports the data team created”, which were based on my reporting system. Simply put, right or wrong, the reporting system was there to stay and I had to make it work.

Options? I quickly went through all of my procedures and tables, tuning and optimizing where I could. I did achieve some performance gains, but not enough to make me comfortable. I needed more hardware resources on the box, or I needed to scale out the data. Obviously, management doesn't like the idea of always throwing additional hardware at production problems. Sometimes it is necessary, but you should always look at your process first and see whether it is something the DBA/developer can optimize.

Our data team has a saying: “Less data means less problems.” The system needed to scale, and that meant breaking the data into smaller, more manageable chunks, which in turn leaves the queries with less data to sift through and improves their performance. I decided that I could either implement a version of my own partitioning by breaking the data into smaller tables, or take advantage of SQL Server's built-in partitioning capabilities. After taking Kimberly Tripp's database partitioning session at PASS I was obviously motivated to go the SQL Server route.

I spent about a week of serious research and dev time reading white papers and any good material on partitioning best practices. I then took a crack at redesigning the database schema to incorporate partitioned tables. I saw huge performance increases, with queries finishing in a fraction of the time. However, these gains did not come immediately, which leads me to the main point of this post: there are about a million different ways to set up SQL Server table partitioning incorrectly and only a few ways to do it right. Every step has to be done in the proper order to get the maximum benefit. The journey is tedious, and as Ron mentions on page 413, “table partitioning is a very involved topic and can quickly become complex”. However, when implemented correctly it is the difference between an awesome, scalable system and one that simply, in the end, disappoints.

My experience taught me two facts:   

  1. Always align your indexes with the partition. It is truly insane when someone overlooks this step. An index that does not know about its partition is like a $100 bill stuck underneath the seat of your car: it has the potential to do a lot for you, but it goes unused. (See the aligned-index example at the end of the code section below.)
  2. Creating multiple filegroups multiplies the benefits of partitioning. Even though it is easier, don't just put all of your data on the PRIMARY filegroup; you give up too much when you do.

    

Whenever I try to learn a new concept, I look for the most basic yet complete example of the topic at hand. Below are the steps, in order, to create a working partitioned table.

Below are the definitions and then the practical example.   

Create Filegroup(s)   

  • In this step we create the logical filegroup. Using multiple filegroups is the best way to implement table partitioning; filegroups, for example, allow you to spread the table across different physical disks. Even in a VM environment you benefit from multithreading, assuming multiple CPUs are available to the OS.

Map the filegroup to a physical location   

  • In this step the logical filegroup is physically mapped to a specific database and file location.

    

Create Partition Function   

  • Defines the partition boundaries or ranges. Has a one-to-many relationship with partition schemes (one function can be used by many schemes).

Create Partition Scheme    

  • Defines on which filegroups the table’s multiple partitions will be stored.

Create Table on Partition Scheme   

  • For a table to be partitioned, it has to be created on a particular partition scheme. In this step it is important to align the table's primary key with the partition key.

Code Examples   

Create Filegroup(s)   

 
ALTER DATABASE DatabaseName ADD FILEGROUP FG_FirstMonth
ALTER DATABASE DatabaseName ADD FILEGROUP FG_SecondMonth

Map the filegroup to a physical location   

 
ALTER DATABASE DatabaseName ADD FILE (NAME = N'FG_FirstMonth',
FILENAME = N'C:\PartitionFiles\MSSQL\DATA\FirstMonth.ndf')
TO FILEGROUP [FG_FirstMonth]

ALTER DATABASE DatabaseName ADD FILE (NAME = N'FG_SecondMonth',
FILENAME = N'C:\PartitionFiles\MSSQL\DATA\SecondMonth.ndf')
TO FILEGROUP [FG_SecondMonth]

Create Partition Function   

 
CREATE PARTITION FUNCTION [MonthlyDateRange] (int)
AS RANGE LEFT FOR VALUES (1, 2)

Create Partition Scheme    

 
CREATE PARTITION SCHEME [MonthScheme]
AS PARTITION [MonthlyDateRange]
TO (FG_FirstMonth, FG_SecondMonth, [PRIMARY])

Create Table on Partition Scheme   

 
CREATE TABLE [dbo].[Sales] (
    [RowID] [int] IDENTITY(1,1) NOT NULL,
    [ProductName] [int] NOT NULL,
    [CustomerName] [varchar](100) NOT NULL, -- data type assumed; the original omitted it
    [SalesMonth] [int] NOT NULL,
    CONSTRAINT [PK_Sales] PRIMARY KEY CLUSTERED ([RowID] ASC, [SalesMonth] ASC)
)
ON [MonthScheme]([SalesMonth])
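
And to tie the example back to the first lesson above, here is a sketch of an aligned non-clustered index on this table (the index name and key columns are illustrative only):

-- Creating the index on the same partition scheme keeps it aligned with the table,
-- so SWITCH, SPLIT, and MERGE remain metadata-only operations.
CREATE NONCLUSTERED INDEX IX_Sales_CustomerName
ON [dbo].[Sales] ([CustomerName], [SalesMonth])
ON [MonthScheme]([SalesMonth])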

Thanks for the great chapter Ron.   

My name is Dustin Brown and I have been working with SQL Server for the last six years. When I am not developing databases I enjoy spending time with my family and living the good life with my kids. I live and work in Sonora, CA with my wife and four children. Please drop me a line at dustinkbrown@gmail.com

01 Sep 10

SQL Server can’t do that, we have to buy Oracle.

So we all like to think. OK, maybe it's just me. I hope it's not just me, but I can tell you this topic comes up a lot. Maybe I should back up and start again…

Over the last year or so I have been on a number of contracts. These often start with a list of tasks such as “our performance is not what we expected” or “we need to add this new feature.” But many people eventually come back to a question that, I am not sure why, really surprised me: can SQL Server handle all my data? The funny thing is that these are not clients with huge systems; they may have a few hundred gigs, but they struggle because of poor design, lack of indexes, or ignoring index maintenance altogether. A month or two ago I was in a session where Kevin Cox talked about some of the 50+ TB systems he has worked on. So I come back to the question I have been asked…

Can SQL Server handle all this data?

Well, it is my opinion that it can, without a doubt. There are a number of ways this can be done, but a popular way today, built into SQL Server, is table partitioning. I know that if you are not familiar with the term, or if you have never done this before, it can appear a bit scary. Not bad-slasher-flick scary; the kind of scary that keeps you up all night fighting for your job because something did not work the way you had planned. So how do we take the scare out of table partitions? Another easy question…

Chapter 31 of the SQL Server MVP Deep Dives book. The chapter by Ron Talmage is going to point you in the right direction. The chapter, in my opinion, is clear and concise. Ron takes you from the basics of learning the terms you need to know all the way up to loading your table, and then some. Ron does admit that there just was not enough space to cover the whole topic, but he gives you a list of topics to research, and I bet many of them appear in the next chapter. I do question whether we have this in the right section; I would think we would want to talk about partitions in the database design section, but hey, all that matters is that it is in the book.

The point of this post is that when you are asked “Can SQL Server do this?”, we really need to take the time to figure out exactly what they want done. If someone has a bad design in SQL Server and they take that bad design over to Oracle, it is still a bad design. Learning the tools that you have at your fingertips, and making sure you use all of them, is the key to being able to say…

“Heck ya it can, watch this”

If you have not learned this yet, one of my fellow bloggers is a huge fan of partitioning and spends a number of hours working on it and helping companies implement it. A while back he filed a feature request with Microsoft, and I think it would be a good addition. If you have time, you could make Jeremy happy by voting on it so I don't have to hear about it any longer. Ron, I thank you for another great chapter in what I think has become the best SQL Server book out in years.

https://connect.microsoft.com/SQLServer/feedback/details/468517/update-statistics-at-the-partition-level

01 Sep 10

Chapter 31 – Some practical issues in table partitioning

This chapter is incredibly relevant in my world at the moment. I’ve helped several clients implement table partitioning, and just recently I was on-site at a client at 2:00 AM to lay down the new partition scheme and migrate all of the data into the partitioned table. A relatively simple task with a lot of planning and testing… the most interesting part of this implementation was making it all happen with zero time-outs on their system. In fact, not only were time-outs not allowed, but transactions never took more than three seconds during the operation.

In this chapter, Ron Talmage provides an excellent baseline introduction to partitioning.  He begins the chapter by stating “Table partitioning is an involved topic and can quickly become complex…”.  At the end of the chapter he states “Table partitioning is quite an involved topic…”.  I appreciate that Ron pointed out that this isn’t some old-hat, do it in your sleep type of feature or implementation in SQL Server.  

In my experience, partitioning is an excellent tool for obtaining database scalability and manageability. It can also be implemented incorrectly, which can actually degrade your performance and cause a management nightmare. I’ve been involved in both scenarios, and this chapter begins to lay down the knowledge you need in order to correctly implement table partitioning.

One of the coolest things about partitioning, in my opinion, is the ability to have very specific control over the size of your data files. This can be very, very helpful when you anticipate significant growth or when you first implement table partitioning. At the beginning of this post I mentioned that I was just involved in a partitioning exercise accomplished against a table while the system was fully online, available, and being used. There are a number of steps associated with successfully doing this, and the information below shows how it was accomplished (at a high level).

First, choosing the correct partition scheme is absolutely critical.  There are additional sources that Ron pointed out in his chapter that I’d recommend you read up on before implementing partitioning.  

Once we had identified the correct partition key (in this case a datetime field), the partition function was created as a sliding window (more info on that next week). Now we have to lay down the files (by partition function) and populate them. This is where it gets fun. The T-SQL below is where I begin with this effort. In short, this script figures out what size the partition files should be based on your partition key (in this case the datetime_column from the Your_Table table), creates the filegroups, and then lays down the files you need, sized accordingly.

USE <YourDBName>
GO

SET NOCOUNT ON

-- Bucket the source table's rows by quarter of the partition key (datetime_column)
select COUNT(*) as numrows,
       'Y' + CAST(DATEPART(yyyy, datetime_column) as varchar(4)) + 'Q' + CAST(DATEPART(q, datetime_column) as varchar(2)) + 'FG' as [datetime],
       CAST(null as decimal(9,2)) as numrowdate,
       CAST(null as decimal(9,2)) as percentage,
       CAST(null as decimal(9,2)) as currentsize,
       CAST(null as decimal(9,2)) as partitionsize
into #tmp_Transaction
FROM [Your_Table]
group by 'Y' + CAST(DATEPART(yyyy, datetime_column) as varchar(4)) + 'Q' + CAST(DATEPART(q, datetime_column) as varchar(2)) + 'FG'
GO

-- Total row count, then each quarter's share of the total
update #tmp_Transaction set numrowdate = (select SUM(numrows) from #tmp_Transaction)
GO

update #tmp_Transaction set percentage = b.total
from (select numrows / numrowdate as total, [datetime] from #tmp_Transaction) b
where #tmp_Transaction.[datetime] = b.[datetime]
GO

-- Current size of the source table; sp_spaceused reports reserved space in KB
create table #space_Transaction
      (name varchar(255), rows int, reserved varchar(55), data varchar(55), index_size varchar(55), unused varchar(55))

insert #space_Transaction
exec sp_spaceused [Your_Table]
GO

update #space_Transaction set reserved = REPLACE(reserved, ' KB', '')
GO

update #tmp_Transaction set currentsize = (select CAST(reserved as decimal(9,2)) from #space_Transaction)
GO

update #tmp_Transaction set currentsize = isnull(currentsize, 0), partitionsize = isnull(partitionsize, 0)
GO

-- ** NOTE ** The sizes are padded by 10 percent (the * 1.1); change it to 1.0 if you want them exact.
select [datetime] as infodatetime_Transaction,
       cast((isnull(currentsize, 150) * 1.1 / 1024) * percentage as int) as partitionsize_Transaction,
       cast((isnull(currentsize, 150) * 1.1 / 1024) * percentage * .1 as int) as partitionsize_Transaction_growth,
       numrows
INTO #tmp_Transaction2
FROM #tmp_Transaction

-- One filegroup per quarter of the sliding window
create table #filegroup_name (id int identity(1,1), FileGroup varchar(22))

insert #filegroup_name (FileGroup) values ('Y2009Q1FG')
insert #filegroup_name (FileGroup) values ('Y2009Q2FG')
insert #filegroup_name (FileGroup) values ('Y2009Q3FG')
insert #filegroup_name (FileGroup) values ('Y2009Q4FG')
insert #filegroup_name (FileGroup) values ('Y2010Q1FG')
insert #filegroup_name (FileGroup) values ('Y2010Q2FG')
insert #filegroup_name (FileGroup) values ('Y2010Q3FG')
insert #filegroup_name (FileGroup) values ('Y2010Q4FG')
insert #filegroup_name (FileGroup) values ('Y2011Q1FG')
insert #filegroup_name (FileGroup) values ('Y2011Q2FG')
insert #filegroup_name (FileGroup) values ('Y2011Q3FG')
insert #filegroup_name (FileGroup) values ('Y2011Q4FG')

DECLARE @SIZE VARCHAR(20)
DECLARE @FILEGROWTH VARCHAR(20)
DECLARE @PATH NVARCHAR(255)
DECLARE @DBName NVARCHAR(255)

SET @PATH = 'C:\Partitioning_Files\'
SET @DBName = DB_NAME()

DECLARE @SQL varchar(4000)
      , @counter int
      , @ErrorCode integer
      , @FGName varchar(100)
      , @NumRows int

SET @ErrorCode = 0
SELECT @counter = 1

-- For each filegroup: create it, then add a file sized according to that quarter's share of the data
WHILE @counter <= 12 AND @ErrorCode = 0 BEGIN

      SELECT @FGName = (Select FileGroup from #filegroup_name where @counter = ID)
      SET @ErrorCode = @@Error

      IF @ErrorCode = 0 BEGIN
            SELECT @SQL = N'ALTER DATABASE ' + @DBName + ' ADD FILEGROUP [' + @FGName + ']'
            SET @ErrorCode = @@Error
      END

      IF @ErrorCode = 0 BEGIN
            --select @sql
            EXECUTE (@SQL);
            SET @ErrorCode = @@Error
      END

      IF @ErrorCode = 0 BEGIN
            -- Default to 150 MB when a quarter has no rows yet
            SELECT @Size = ISNULL((Select Partitionsize_Transaction from #tmp_Transaction2 A JOIN #Filegroup_Name B ON A.infodatetime_Transaction = B.FileGroup and B.ID = @counter), 150)
            SELECT @FileGrowth = ISNULL((Select Partitionsize_Transaction_growth from #tmp_Transaction2 A JOIN #Filegroup_Name B ON A.infodatetime_Transaction = B.FileGroup and B.ID = @counter), 150)
            SELECT @NumRows = ISNULL((Select NumRows from #tmp_Transaction2 A JOIN #Filegroup_Name B ON A.infodatetime_Transaction = B.FileGroup and B.ID = @counter), 0)

            SELECT @SQL = N'ALTER DATABASE ' + @DBName + '
            ADD FILE
                  (NAME = N''' + @FGName + ''',
                   FILENAME = N''' + @PATH + @FGName + '.ndf'',
                   SIZE = ' + @SIZE + ',
                   FILEGROWTH = ' + @FILEGROWTH + ')
            TO FILEGROUP [' + @FGName + ']'
            SET @ErrorCode = @@Error
      END

      IF @ErrorCode = 0 BEGIN
            EXECUTE (@SQL);
            --select (@SQL)
            SELECT @FGName as FileGroupName, @Size as Size_of_File_MB, @FileGrowth as FileGrowth_Size_MB, @NumRows as NumberOfRows
            SET @ErrorCode = @@Error
      END

      IF @ErrorCode = 0 BEGIN
            SELECT @counter = @counter + 1
            SET @ErrorCode = @@Error
      END

END

 

O.K. Believe it or not, that’s the easy part. Once the files are sized correctly and laid down, it’s a matter of creating a table on your partition scheme, in the files you just laid down. Once that’s accomplished, I typically write a loop that populates the newly partitioned table from the transactional table, so as to limit the amount of locking required to read from the production/transactional table.
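
That population loop isn’t included in this post, but a hedged sketch of the idea looks something like this (Your_Table_Partitioned, the ID column, and the batch size are assumptions for illustration; the column list is abbreviated):

-- Copy the historical rows in small batches to keep locks on the production table short;
-- the current day is intentionally left out for the final cut-over.
DECLARE @batch int, @rows int
SET @batch = 100000
SET @rows = 1

WHILE @rows > 0
BEGIN
      INSERT INTO dbo.Your_Table_Partitioned (ID, datetime_column)
      SELECT TOP (@batch) s.ID, s.datetime_column
      FROM dbo.Your_Table AS s
      WHERE s.datetime_column < DATEADD(dd, DATEDIFF(dd, 0, GETDATE()), 0)   -- everything before today
        AND NOT EXISTS (SELECT 1 FROM dbo.Your_Table_Partitioned AS t WHERE t.ID = s.ID)
      ORDER BY s.ID

      SET @rows = @@ROWCOUNT
END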

In the case of the partitioning exercise I mentioned above, we chose a sliding window scheme. That’s important to note, because once we had populated most of the data, we didn’t insert the current day’s data. The script (not provided here) then created all of the non-clustered indexes we would need, in an aligned fashion. Then we populated the final day of data and renamed the two tables involved (the old huge table to Tablename_OLD and the newly partitioned table to Tablename). By wrapping that in a transaction, we minimized the potential for a very long-running insertion process and kept the system online.
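
A hedged sketch of that final cut-over, using the same illustrative names as above:

-- Copy the last day of data, then swap the table names inside one short transaction.
BEGIN TRAN

      INSERT INTO dbo.Your_Table_Partitioned (ID, datetime_column)
      SELECT s.ID, s.datetime_column
      FROM dbo.Your_Table AS s
      WHERE s.datetime_column >= DATEADD(dd, DATEDIFF(dd, 0, GETDATE()), 0)   -- today only

      EXEC sp_rename 'dbo.Your_Table', 'Your_Table_OLD'
      EXEC sp_rename 'dbo.Your_Table_Partitioned', 'Your_Table'

COMMIT TRAN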

Voila…! Sounds easy. It is, with plenty of thought and testing.

Final thought… and please switch gears with me just a bit… there’s a feature that I desperately want added to partitioning that you can vote on!  Please see:

https://connect.microsoft.com/SQLServer/feedback/details/468517/update-statistics-at-the-partition-level

Happy partitioning, folks, and Ron, thank you for one of the best resources I’ve ever read regarding partitioning… I truly wish this chapter had been written a long time ago, before I took my first leap into partitioning.

27 Aug 10

Something You can Use

I don’t want this to sound like it’s all about me. The end result is that it isn’t; I hope that when you read this you see that I am really trying to make it about you: something you can use, or something you may need to use in the future. I think about it this way: when I go to a conference there are so many excellent sessions I would like to attend; some of them are really deep and some are overviews. By default I want to look at sessions where I think I am going to learn a lot; that is the whole point, right? But then I have to come back to earth and realize that it not only has to be something I want to learn, but something I think I can use soon. So what does this mean for you?

Each DBA is at some point going to have to deal with a space issue, and before you can do that you really need to sit down and free up as much space as you can, so you know what you are working with. There are issues with just freeing space that you may want to consider, and Joe talks about these in his chapter. The end result is that this is a chapter you should read and understand, so that when you are working with your databases you know how to reclaim and reuse the space that you can.



