Over the past year I have had the opportunity to get deeply involved in database partitioning while developing a reporting system against my company's OLTP database. As my data warehouse grew, I quickly realized the benefits table partitioning could offer me. I appreciate SQL Perspectives giving me the opportunity to post some of my thoughts on the benefits of partitioning. I enjoyed reading this chapter and I appreciate how Ron Talmage approached the subject. Partitioning can be a bear of a topic, and it is best tackled one piece at a time.
Ron highlights very well the benefit of the metadata operations found in partitioning. It is summed up best on page 419:
The Key: Avoid Data Movement
“The key to loading and removing data quickly from partitioned tables is to use the metadata operations of SWITCH, SPLIT, and MERGE without incurring any data movement”.
Any seasoned DBA will tell you what a nightmare it is to transfer large volumes of data between tables. "Metadata operation" suddenly becomes two very beautiful words to a person confronted with this task. Taking advantage of metadata operations in table partitioning is huge. Below is a summary of the definitions from the chapter.
SWITCH
- A 100% metadata operation; one side of the switch must always be empty.
SPLIT
- Safe if used to create new, empty partitions. SPLIT can be dangerous when splitting a partition loaded with data, because some of the data may move into the new partition depending on the boundary value.
MERGE
- Safe if you merge an empty partition with a partition full of data. However, merging two partitions that both contain data can be a very I/O-intensive operation.
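To make the "no data movement" idea concrete, here is a minimal sketch of the SWITCH operation. The table and partition names are hypothetical, not from the chapter; the point is that a fully loaded staging table swaps in instantly because only metadata pointers change.

```sql
-- Assumes dbo.Sales is partitioned on SalesMonth, partition 2 is empty,
-- and dbo.Sales_Staging has an identical schema, sits on the same
-- filegroup as partition 2, and has a CHECK constraint restricting
-- SalesMonth to that partition's range.
ALTER TABLE dbo.Sales_Staging
    SWITCH TO dbo.Sales PARTITION 2;

-- Removing old data is just as fast: switch the partition out to an
-- empty archive table, then drop or truncate the archive table.
ALTER TABLE dbo.Sales
    SWITCH PARTITION 1 TO dbo.Sales_Archive;
```

Both statements complete in milliseconds regardless of row count, because no rows are physically moved.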
The above operations give us a lot of power, but they can quickly turn dangerous. Remember, with great power comes great responsibility. The truth of the matter is that table partitioning is an enterprise-level tool for designing a scalable database. If you are going to use it, you must know what you are doing. Proper testing is the only way to ensure a real performance benefit.
It is important to understand that adding a partition takes a schema modification lock on the table. Even a query with a NOLOCK hint will be blocked, because NOLOCK still acquires a schema stability lock. In certain circumstances the NOLOCK query will actually cause the partition creation to fail, and when that happens all of the benefits of partitioning go out the window: the new data lands in an existing partition, and to rectify the problem you have to SPLIT a partition that already contains data, which, as we discussed earlier, can be a very expensive operation. Something I have found that helps with this problem is the SET DEADLOCK_PRIORITY HIGH T-SQL statement. This tells SQL Server that the session is running an important operation and should be among the last chosen as a deadlock victim. The last thing I want to see on my system is a partition failing to create, and I want to do everything I can to prevent it; the recovery costs (especially if it is not caught right away) can be disastrous. After we implemented this in our production system we rarely had deadlock problems during partition creation.
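A sketch of how that guard might look in practice. The function, scheme, and filegroup names are hypothetical placeholders, not the chapter's; the pattern is simply to raise the session's deadlock priority before the SPLIT that creates the new (empty) partition.

```sql
-- Make this session one of the last to be chosen as a deadlock victim.
SET DEADLOCK_PRIORITY HIGH;

-- Tell the scheme which filegroup the next partition should use.
ALTER PARTITION SCHEME [MonthScheme]
    NEXT USED [FG_ThirdMonth];

-- Create the new partition. Because the new range is empty,
-- this SPLIT is a pure metadata operation.
ALTER PARTITION FUNCTION [MonthlyDateRange]()
    SPLIT RANGE (3);
```

Running the SPLIT in a short, dedicated batch (rather than inside a long transaction) also reduces the window during which the schema modification lock can collide with readers.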
When designing a data warehouse from scratch there is a systematic process that, if followed, gives the database integrity, appropriate normalization/denormalization, and proper index creation for the foreseeable queries that will run against it. A couple of years ago my company assigned me the task of re-architecting our BI reporting system. Using the above-mentioned process I created what I thought was a very manageable and, yes, very scalable reporting system. My reporting database hummed along for about 8 months without issue, other than the size of the database continuing to grow and grow, which is a good thing, right? It was shortly after this that a couple of the tables reached nearly 500 gigs, and a few months later, as we brought on more customers, one of them grew to nearly a terabyte. It was at this point I learned a painful lesson: my awesome reporting system had grown out of control, and I realized how non-scalable the database actually was. This was also about the time (as is always the case) that management had learned to love and depend on the "new fabulous reports the data team created," which were based on my reporting system. Simply put, right or wrong, the reporting system was there to stay and I had to make it work.
Options? I quickly went through all of my procedures and tables, tuning and optimizing where I could. I did achieve some performance gains, but not enough to make me comfortable. I needed either more hardware resources on the box or to scale out the data. Obviously management doesn't like the idea of always throwing additional hardware at production problems. Sometimes it is necessary, but you should always look at your process first and see whether it is something the DBA/developer can optimize.
Our data team has a saying: "Less data means less problems." The system needed to scale by breaking the data into smaller, more manageable chunks, which in turn leaves queries with less data to sift through and improves their performance. I decided that I could either roll my own partitioning by breaking the data into smaller tables, or take advantage of SQL Server's built-in partitioning capabilities. After taking Kimberly Tripp's database partitioning session at PASS, I was obviously motivated to go the SQL Server route.
I spent about a week of serious research and dev time reading white papers and any good material on partitioning best practices. I then took a crack at redesigning the database schema to incorporate partitioned tables. I saw huge performance increases, with queries finishing in a fraction of the time. However, these gains did not come immediately, which leads me to the main point of this post. There are about a million different ways to set up SQL Server table partitioning incorrectly and only a few ways to do it right. Every step has to be done in the proper order to get the maximum benefit. The journey is tedious and, as Ron mentioned on page 413, "table partitioning is a very involved topic and can quickly become complex". However, when implemented correctly it is the difference between an awesome, scalable system and one that simply, in the end, disappoints.
My experience taught me two facts:
- Always align your indexes with the partition. It is truly an insane world when someone overlooks this step. An index that does not know about its partition is like a $100 bill stuck underneath the seat of your car: it has the potential to do a lot for you, but it goes unused.
- Creating multiple filegroups multiplies the benefits of partitioning. Even though it is easier, don't just put all of your data on the PRIMARY filegroup. You give up so much when you do.
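As a sketch of the first point (the index and column names are hypothetical), an aligned index is simply one partitioned on the same scheme and key as its table:

```sql
-- Aligned: the index is partitioned on the same scheme and key as the
-- table, so SWITCH, SPLIT, and MERGE stay metadata-only operations.
CREATE NONCLUSTERED INDEX IX_Sales_ProductName
    ON dbo.Sales (ProductName)
    ON [MonthScheme]([SalesMonth]);
```

Note that if you omit the final ON clause, SQL Server creates the index on the table's partition scheme anyway, so indexes are aligned by default; the dangerous case is explicitly creating an index on a single filegroup (e.g. ON [PRIMARY]), which leaves it unaligned and blocks SWITCH on the table.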
Whenever I try to learn a new concept, I look for the most basic yet complete example of the topic at hand. Below are the steps, in order, to create a working partitioned table: first the definitions, then the practical example.
Create Filegroup(s)
- In this step we create the logical filegroup. Using multiple filegroups is the best approach when implementing table partitioning; filegroups, for example, allow you to spread the table across different physical disks. Even in a VM environment you benefit from parallelism, assuming multiple CPUs are available to the OS.
Map the filegroup to a physical location
- In this step the logical filegroup is mapped to a physical file in a specific database and file location.
Create Partition Function
- Defines the partition boundaries, or ranges. A partition function has a one-to-many relationship with partition schemes.
Create Partition Scheme
- Defines on which filegroups the table’s multiple partitions will be stored.
Create Table on Partition Scheme
- In order for a table to be partitioned, it has to be created on a particular partition scheme. In this step it is important to align the table's primary key with the partitioning key by including the partitioning column in the key.
Code Examples
Create Filegroup(s)
ALTER DATABASE DatabaseName ADD FILEGROUP FG_FirstMonth
ALTER DATABASE DatabaseName ADD FILEGROUP FG_SecondMonth
Map the filegroup to a physical location
ALTER DATABASE DatabaseName ADD FILE (NAME = N'FG_FirstMonth', FILENAME = N'C:\PartitionFiles\MSSQL\DATA\FirstMonth.ndf') TO FILEGROUP [FG_FirstMonth]
ALTER DATABASE DatabaseName ADD FILE (NAME = N'FG_SecondMonth', FILENAME = N'C:\PartitionFiles\MSSQL\DATA\SecondMonth.ndf') TO FILEGROUP [FG_SecondMonth]
Create Partition Function
CREATE PARTITION FUNCTION [MonthlyDateRange] (int) AS RANGE LEFT FOR VALUES (1, 2)
Create Partition Scheme
CREATE PARTITION SCHEME [MonthScheme] AS PARTITION [MonthlyDateRange] TO (FG_FirstMonth, FG_SecondMonth, [PRIMARY])
Create Table on Partition Scheme
CREATE TABLE [dbo].[Sales] (
    [RowID] [int] IDENTITY(1,1) NOT NULL,
    [ProductName] [int] NOT NULL,
    [CustomerName] [int] NOT NULL,
    [SalesMonth] [int] NOT NULL,
    CONSTRAINT [PK_Sales] PRIMARY KEY CLUSTERED ([RowID] ASC, [SalesMonth] ASC)
) ON [MonthScheme]([SalesMonth])
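To sanity-check the setup, you can insert a few rows and ask SQL Server which partition each one landed in using the built-in $PARTITION function. This sketch uses the objects created above; the sample values are my own.

```sql
INSERT INTO dbo.Sales (ProductName, CustomerName, SalesMonth) VALUES (101, 201, 1);
INSERT INTO dbo.Sales (ProductName, CustomerName, SalesMonth) VALUES (102, 202, 2);

-- $PARTITION.<function_name>(value) returns the partition number
-- the function maps that value to.
SELECT RowID, SalesMonth,
       $PARTITION.MonthlyDateRange(SalesMonth) AS PartitionNumber
FROM dbo.Sales;
```

With the RANGE LEFT boundaries of 1 and 2, rows with SalesMonth = 1 map to partition 1, SalesMonth = 2 to partition 2, and anything greater spills into partition 3 on PRIMARY, which is exactly why the scheme above names a third filegroup.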
Thanks for the great chapter Ron.
My name is Dustin Brown and I have been working with SQL Server for the last 6 years. When I am not developing databases I enjoy spending time with my family and living the good life with my kids. I live and work in Sonora, CA with my wife and 4 children. Please drop me a line at dustinkbrown@gmail.com