Friday, September 7, 2007

Partitioned Tables in SQL Server 2005

Partitioned Tables in SQL Server 2005

Partitioned tables are a new feature available in SQL Server version 2005, aimed mainly at improving the performance of large database systems. The feature is only available for enterprise and developer edition. For other editions you can get a similar functionality with a partitioned view.

This article focuses on how to create a partitioned table and manipulate the partitions, rather than exploring the performance aspects.

Creating the Partitioned Table

First of all, we will create a partitioned table and prove that it is acting in a partitioned manner, in that the queries will only access the partitions that are required.

To start with, we need to create a partition function. This will define how many partitions exist, and the values contained in partition columns within those partitions

CREATE PARTITION FUNCTION MyPartitionRange (INT) 
AS RANGE LEFT FOR VALUES (1,2)

The partition function has been named MyPartitionRange.

The partition column is an int.

'Range left' means that the value is the upper bound for the partition.

The above code will define a partitioned table that contains 3 partitions.


  • Partition 1 – Partition value <= 1
  • Partition 2 – Partition value > 1 and <= 2
  • Partition 3 – Partition value > 2

As the partition column is an integer the partitions will actually be


  • Partition 1 – Partition value <= 1
  • Partition 2 – Partition value =2
  • Partition 3 – Partition value > 2

Note: that the range left / range right determines the partition for data that matches the partition literal value. For integer data this would change the partition number for all the data .

Now we create a partition scheme.

CREATE PARTITION SCHEME MyPartitionScheme AS 
PARTITION MyPartitionRange
ALL TO ([PRIMARY])

The partition scheme is named MyPartitionScheme and references the partition function MyPartitionRange. All of the partitions are to be held on the primary filegroup.

Now we can create the table.

All that is needed is to reference the partition scheme naming the partition column in an “on” clause

CREATE TABLE MyPartitionedTable 
(
i INT ,
s CHAR(8000) ,
PartCol INT
)
ON
MyPartitionScheme (PartCol)

We can check the table structure via…

SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('MyPartitionedTable')

…which gives…

partition_id         object_id index_id  partition_number   hobt_id            rows
------------------ ----------- --------- ---------------- ------------------ ----
72057594038714368 229575856 0 1 72057594038714368 0
72057594038779904 229575856 0 2 72057594038779904 0
72057594038845440 229575856 0 3 72057594038845440 0

Now we add some data

INSERT MyPartitionedTable (i, s, PartCol) SELECT 1, 'a', 1
INSERT MyPartitionedTable (i, s, PartCol) SELECT 2, 'a', 2
INSERT MyPartitionedTable (i, s, PartCol) SELECT 3, 'a', 2
INSERT MyPartitionedTable (i, s, PartCol) SELECT 4, 'a', 3
INSERT MyPartitionedTable (i, s, PartCol) SELECT 5, 'a', 3
INSERT MyPartitionedTable (i, s, PartCol) SELECT 6, 'a', 3
INSERT MyPartitionedTable (i, s, PartCol) SELECT 7, 'a', 4

... and check that the rows have been added to the correct partitions

SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('MyPartitionedTable')

… giving as expected…

partition_id        object_id index_id    partition_number hobt_id              rows
------------------- --------- ----------- ---------------- -------------------- ----
72057594038714368 229575856 0 1 72057594038714368 1
72057594038779904 229575856 0 2 72057594038779904 2
72057594038845440 229575856 0 3 72057594038845440 4

A function, $partition, is available to give the partition number for the data.

SELECT PartitionNo = $partition.MyPartitionRange(PartCol), NumRows = COUNT(*)
FROM MyPartitionedTable
GROUP BY $partition.MyPartitionRange(PartCol)
ORDER BY $partition.MyPartitionRange(PartCol)

The following…

SELECT PartitionNo = $partition.MyPartitionRange(6)

…is a valid statement and shows that a row with partition value 6 would be added to partition 3.

The function gives the row counts for each partition and so can be useful to find which partition holds the data.

Now to test that queries only access the required partition. We make partition 2 very large (this is why we made s a char(8000)).

DECLARE @i INT 
SELECT @i = 13
WHILE @i > 0
BEGIN
SELECT @i = @i - 1

INSERT MyPartitionedTable (i, s, PartCol)
SELECT i, s, PartCol
FROM MyPartitionedTable
WHERE PartCol = 2
END

...and we check the rowcounts ...

SELECT * 
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('MyPartitionedTable')

…giving as expected

/*
partition_id object_id index_id partition_number hobt_id rows
------------------- ----------- --------- ---------------- -------------------- -----
72057594041532416 2117582582 0 1 72057594041532416 1
72057594041597952 2117582582 0 2 72057594041597952 16384
72057594041663488 2117582582 0 3 72057594041663488 4
*/

On my laptop that is enough to give an appreciable difference in query times – if the first query below is too quick to detect on your machine then increase the number of loops to add rows to the partition.

Now try the queries

SELECT COUNT(DISTINCT s) FROM MyPartitionedTable WHERE PartCol = 2
SELECT COUNT(DISTINCT s) FROM MyPartitionedTable WHERE PartCol = 1

You should find that the first takes a lot longer than the second because in both cases only one partition is accessed, a single read for partition 1, many reads for partition 2.

Adding and Removing Partitions


In the previous example, we added data to the partition by inserting rows into the partitioned table. It is also possible to populate a table, and then add that table to the partitioned table as a partition. There are many restrictions on the nature of the table and data that can be added.

The command to add the table as a partition is “alter table …. Switch…”. It actually swaps the table with a partition already existing in the partitioned table.

We will now add a new partition to MyPartitionedTable for partition value 3 and then swap it for a new table MyNewPartition.

To add a new partition to MyPartitionedTable, use the split range command on the partition function.

ALTER PARTITION FUNCTION MyPartitionRange () split RANGE (3)

This has added a new partition for partition value 3.

The reverse of this is a merge range statement

ALTER PARTITION FUNCTION MyPartitionRange () merge RANGE (3)
SELECT *
FROM sys.partitions
WHERE
OBJECT_ID = OBJECT_ID('MyPartitionedTable')

/*
partition_id object_id index_id partition_number hobt_id rows
-------------------- ----------- --------- ---------------- -------------------- ----
72057594042056704 322100188 0 1 72057594042056704 1
72057594042122240 322100188 0 2 72057594042122240 2
72057594042187776 322100188 0 4 72057594042187776 1
72057594042253312 322100188 0 3 72057594042253312 3
*/

Note that the existing rows for partition value 3 have been moved to partition 3. The row with partition value 4 has been moved to partition 4.

We create a new table to swap with a partition. This table must have the same structure as the partition. It must also include a check constraint to ensure that the partition column values in the table are included in the correct partition. The check constraint must be at least as restrictive as the partition range function for that partition.

CREATE TABLE MyNewPartition
(
i
INT ,
s CHAR(8000) ,
PartCol INT CHECK (PartCol = 3 AND PartCol IS NOT NULL)
)

I normally create and populate the table and add the check constraint later.

Now we add some data to the new table

INSERT MyNewPartition SELECT 1, 'a', 3

To perform this operation the partition must be empty so...

DELETE MyPartitionedTable WHERE PartCol = 3

And we can swap the table with the partition

ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

Viewing the partitions

SELECT * 
FROM
sys.partitions
WHERE OBJECT_ID = OBJECT_ID('MyPartitionedTable')
/*
partition_id object_id index_id partition_number hobt_id rows
-------------------- ----------- --------- ---------------- -------------------- ----
72057594042056704 322100188 0 1 72057594042056704 1
72057594042122240 322100188 0 2 72057594042122240 2
72057594042187776 322100188 0 4 72057594042187776 1
72057594042318848 322100188 0 3 72057594042318848 1
*/

We see that the partition has been swapped with the new table.

The advantage of this is that the swap does not move the data – it just updates the metadata so that the table becomes the partition. This means that it is very fast. The table can be created, populated, and then added as a partition, thereby causing minimal impact on the partitioned table.

Switching a populated partition


Of course the partition that is being swapped out would often contain a lot of data – usually this would be used for adding another partition to the right to split up the data. Deleting the data from the partition would not be feasible – nor would the split on a populated partition.

To accomplish this, you would need to first populate the two tables to replace the 'catch all' partition (in our scenario partition 3).

Now create a table MyOldPartition3 – remember this must be the same structure as above. As this is a destination for a switch the check constraint must be less restrictive than that on the partitioned table.

Now switch the partition out

ALTER TABLE MyPartitionedTable switch PARTITION
3 TO MyOldPartition3

Now the partition split can be carried out on an empty partition and the two partitions switched in without moving any data.

Identities in a partitioned table


Remember that an identity is not guaranteed to be unique or sequential. It just allocates the next value from the current seed. With that in mind nothing that follows should come as a surprise.

For this we will create a new partition function, scheme and table

CREATE PARTITION FUNCTION MyIdentityPartitionRange (INT)
AS RANGE LEFT FOR VALUES (1,2,3)
CREATE PARTITION SCHEME MyIdentityPartitionScheme AS
PARTITION MyIdentityPartitionRange
ALL TO ([PRIMARY])
CREATE TABLE MyIdentityPartitionedTable
(
i INT IDENTITY (1,1) ,
s CHAR(10)
,
PartCol INT
)
ON MyPartitionScheme (PartCol)

--And we add some data
INSERT MyIdentityPartitionedTable (s, PartCol) SELECT 'a', 1
INSERT MyIdentityPartitionedTable (s, PartCol) SELECT 'a', 2
INSERT MyIdentityPartitionedTable (s, PartCol) SELECT 'b', 1
INSERT MyIdentityPartitionedTable (
s, PartCol) SELECT 'b', 2

SELECT * FROM MyIdentityPartitionedTable

i s PartCol
----------- ---------- -----------
1 a 1
3 b 1
2 a 2
4 b 2

Showing that the identity is a property of the partitioned table rather than the partition.

More interesting is what happens when partitions are swapped:

CREATE TABLE MyIdentityPartitionedTableNew
(
i INT IDENTITY (1,1) ,
s CHAR(10
) ,
PartCol INT CHECK (PartCol = 3 AND PartCol IS NOT NULL)
)

INSERT MyIdentityPartitionedTableNew (s, PartCol) SELECT 'c', 3
INSERT MyIdentityPartitionedTableNew (s, PartCol) SELECT 'd', 3
ALTER TABLE MyIdentityPartitionedTableNew
switch TO MyIdentityPartitionedTable PARTITION 3


SELECT * FROM MyIdentityPartitionedTable

i s PartCol
----------- ---------- -----------
1 a 1
3 b 1
2 a 2
4 b 2
1 c 3
2 d 3

And we see duplicate identity values from the new partition.

Adding a new row

INSERT MyIdentityPartitionedTable (s, PartCol) SELECT 'e', 1
SELECT * FROM MyIdentityPartitionedTable

i s PartCol
----------- ---------- -----------
1 a 1
3 b 1
5 e 1
2 a 2
4 b 2
1 c 3
2 d 3

We see that the partition swap has not affected the identity seed for the table.

Partitioned tables and Indexes


If the index contains the partitioning column then the index is referred to as being 'aligned' with the table.

If the index uses the same partitioning scheme as the table and is in the same filegroup then the index must be aligned with the table.

For a non-clustered non-unique index the partitioning column can be included to align the index rather than being indexed.

I think it is best to always explicitly include the partitioning column in your indexes.

The following assumes a single filegroup and scheme.

Clustered index

As stated this index must be aligned with the table. If it is not then the partitioning column will be implicitly added as the last column of the index.

CREATE PARTITION FUNCTION MyPartitionRange (INT) 
AS RANGE LEFT FOR VALUES (1,2,3)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])
CREATE TABLE MyPartitionedTable
(
i INT ,
j INT ,
s VARCHAR(MAX) ,
PartCol INT

)
ON MyPartitionScheme (PartCol)
CREATE TABLE MyNewPartition
(
i INT ,
j INT ,
s VARCHAR(MAX) ,
PartCol INT CHECK (PartCol = 3 AND PartCol IS NOT NULL)
)

CREATE CLUSTERED INDEX cl_ix ON MyPartitionedTable (j)
CREATE CLUSTERED INDEX cl_ix ON MyNewPartition (j)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

…gives the error…

ALTER TABLE SWITCH statement failed. There is no identical index in source table 'tempdb.dbo.MyNewPartition' for the index 'cl_ix' in target table 'tempdb.dbo.MyPartitionedTable' .

whereas these all succeed.

DROP INDEX MyNewPartition.cl_ix
DROP INDEX MyPartitionedTable.cl_ix
CREATE CLUSTERED INDEX cl_ix ON MyPartitionedTable (j, PartCol)
CREATE CLUSTERED INDEX cl_ix ON MyNewPartition (j, PartCol)
ALTER
TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3
DROP INDEX MyNewPartition.cl_ix
DROP INDEX MyPartitionedTable.cl_ix
CREATE CLUSTERED INDEX cl_ix ON MyPartitionedTable (PartCol, j)
CREATE CLUSTERED indeex cl_ix ON MyNewPartition (PartCol, j)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

This will also work

DROP INDEX MyNewPartition.cl_ix
DROP INDEX MyPartitionedTable.cl_ix
CREATE CLUSTERED INDEX cl_ix
ON MyPartitionedTable (j)
CREATE CLUSTERED INDEX cl_ix ON MyNewPartition (j, PartCol)


...showing that the partitioning column has been implicitly added to the clustered index in the partitioned table.

Note that this extra column will not be shown by sp_helpindex on the partitioned table nor by scripting the index but it is shown by sys.index_columns

I don’t know why Microsoft decided to add the column to the index automatically. I think it would be less confusing to give an error, thereby forcing the user to add the column explicitly.

Unique index
Unique indexes must contain the partitioning column as an indexed column.
CREATE PARTITION FUNCTION MyPartitionRange (INT) 
AS RANGE LEFT FOR VALUES (1,2,3)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])

CREATE TABLE MyPartitionedTable
(
i
INT ,
j INT ,
s VARCHAR(MAX) ,
PartCol INT
)
ON MyPartitionScheme (PartCol)

CREATE TABLE MyNewPartition
(
i INT ,
j INT ,
s VARCHAR(MAX) ,
PartCol INT CHECK (PartCol = 3
AND PartCol IS NOT NULL)
)
CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (j)

...gives the error

   Msg 1908, Level 16, State 1, Line 1 olumn 'PartCol' is partitioning 
column of the index 'cl_ix'. Partition columns for a unique index must
be a subset of the index key.

… one of the more explanatory error messages.

The Partitioning column must be part of the index so …

CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (j) include (PartCol) 

…will also fail.

Unlike clustered indexes the partitioning column must be explicitly part of the index.
These will both work

CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (j, PartCol)
CREATE UNIQUE INDEX cl_ix
ON MyPartitionedTable (j, PartCol)

ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3
CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (PartCol , j)
CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (PartCol , j)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3


Non-unique index
Non-unique indexes do not need to have the partitioning column as part of the index, it can be an INCLUDE column. If it is not explicitly included, then the column will be automatically added – again this will not appear in sp_helpindex
CREATE PARTITION FUNCTION MyPartitionRange (INT) 
AS
RANGE LEFT FOR VALUES (1,2,3)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])
CREATE TABLE MyPartitionedTable
(
i INT ,
j INT ,
s VARCHAR(MAX) ,
PartCol INT
)
ON MyPartitionScheme (PartCol)

CREATE TABLE MyNewPartition
(
i INT ,
j INT ,
s VARCHAR(MAX) ,
PartCol INT CHECK (PartCol = 3 AND PartCol IS NOT NULL)
)
CREATE INDEX cl_ix ON MyPartitionedTable (j)
CREATE INDEX cl_ix ON MyNewPartition (j) include (PartCol)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

Is successful as are…

CREATE INDEX cl_ix ON MyPartitionedTable (j, PartCol)
CREATE INDEX cl_ix ON MyNewPartition (j, PartCol)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3
CREATE INDEX cl_ix ON MyPartitionedTable (PartCol, j)
CREATE INDEX cl_ix
ON MyNewPartition (PartCol, j)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

Scenarios


Partitioning on multiple columns
Although the partitioning column must be a single column, it does not need to be numeric and it can be calculated so that the range can include multiple columns. For instance it is common to partition on datetime data by month. This will work well, because that data is usually in a single column, but what do you do if you have data for multiple companies and you also want to partition by company? For this you could use a computed column for the partitioning column. This will create a computed column using the 'company id' and 'order month' which is then used for the partitions. It will partition three companies for the first three months of 2007.
CREATE PARTITION FUNCTION MyPartitionRange (INT) 
AS RANGE LEFT FOR VALUES
(1200701,1200702,1200703,2200701,2200702,2200703,3200701,3200702,3200703)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange

ALL TO ([PRIMARY])
CREATE TABLE CompanyOrders
(
Company_id INT ,
OrderDate datetime ,
Item_id INT ,
Quantity INT ,
OrderValue decimal(19,5) ,
PartCol AS Company_id * 10000 + CONVERT(VARCHAR(
4),OrderDate,112) persisted
)
ON MyPartitionScheme (PartCol)

The computed column must be 'persisted' to form the partitioning column.

We will investigate the maintaining of partitioned data in this table later.

Monthly Data – the sliding range

A common requirement is to partition by month. This means that new month partitions need to be added and possibly old data partitions removed. I will describe the process for the addition of a new partition for later data, to remove an old partition the process is the same except that you swap out two partitions, merge the range and swap in a single table.

We create a partitioned table for data by OrderDate month

CREATE PARTITION FUNCTION MyPartitionRange (datetime) 
AS RANGE RIGHT FOR VALUES ('20070101', '20070201', '20070301', '20070401')
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])
CREATE TABLE Orders
(

OrderDate datetime ,
Item_id INT ,
Quantity INT ,
OrderValue decimal(19,5)
)
ON MyPartitionScheme (OrderDate)

This will give four partitions…

OrderDate < '20070101' 
OrderDate >= '20070101' AND < '20070201'
OrderDate >= '20070201' AND < '20070301'
OrderDate >= '20070301' AND < '20070401'
OrderDate >= '20070401'

Therefore the data will be split into partitions by month.

And we insert some test data

--insert Orders select '19000101', 1, 1, 1
INSERT Orders SELECT '20070101', 1, 1, 1
INSERT Orders SELECT '20070201', 1, 1, 1
INSERT Orders SELECT '20070301', 1, 1, 1
INSERT Orders SELECT '20070401', 1, 1, 1
INSERT Orders SELECT '20070402', 1, 1, 1
INSERT Orders SELECT '20070501', 1, 1, 1


To add the next month's partition it is possible to just split the range and let the system take care of the data. This though would mean that the data would be off-line for the duration of the operation. It is better to use the experience we have gained in switching partitions to create the new data in separate tables then switch them in. This means that the table would be off-line for a very short time – just while the switch operations are taking place.

If data is being continually added to the partitioned table then a snapshot can be taken, the new tables prepared on this and the switch-in operation will need to take the table off-line, merge the new data with the prepared snapshot data then perform the switch-in. An identity column on the table would help to identify new data added since the snapshot. This would mean longer downtime than for static data but still a lot less than splitting a populated range.

To add a new months partition of static data


  1. Create the table containing the new months data.
  2. Create the table containing the data after the new month
  3. Swap out the last month from the partitioned table
  4. Split the (empty) range in the partitioned table
  5. Swap in the new months data
  6. Swap in the table containing the data after the new month
  7. Check the result


1. Create the table containing the new months data.

In the initial discussion we created a table for this but you might find it easier to create a partitioned table for the operation.

Note that if you script the existing table and indexes make sure that you remember that not all indexed columns may appear in the index script.

Also if you take this route you will have to use a new partition function and scheme as you will need to split the range. I prefer to use non-partitioned tables for flexibility.

To population of the table will depend on where the data resides but wil only affect the production table if you need to read the data from that table.

CREATE TABLE Orders_200704
(
OrderDate datetime CHECK (OrderDate >= '20070401'
AND OrderDate < '20070501' AND OrderDate IS NOT NULL) ,
Item_id INT
,
Quantity INT ,
OrderValue decimal(19,5)
)

INSERT Orders_200704 SELECT * FROM Orders WHERE OrderDate >= '20070401
AND OrderDate < '20070501'

2. Create the table containing the data after the new month

In the same way as the previous table populate with the data that is later than the new month.

CREATE TABLE 
Orders_200705
(
OrderDate datetime CHECK (OrderDate >= '20070501' AND OrderDate
IS NOT NULL) ,
Item_id INT ,
Quantity INT ,
OrderValue decimal(19,5)
)

INSERT Orders_200705 SELECT * FROM Orders WHERE OrderDate >= '20070501'

At this point take the production table off-line for the swap.

3. Swap out the last month from the partitioned table

For this you will need an empty table to swap the data into. It is tempting again to use a partitioned table for this – if so you will need to create a new partition function and scheme as you would not want to lose the data until after splitting the range on the production table.

 CREATE TABLE 
Orders_200704_Old
(
OrderDate datetime CHECK (OrderDate >= '20070401'
AND OrderDate IS NOT NULL) ,
Item_id INT ,
Quantity INT ,
OrderValue decimal(19,5)
)

ALTER TABLE Orders switch PARTITION 5 TO Orders_200704_Old

4 Split the (empty) range in the partitioned table

This is done by adding a new value to the partition function

ALTER PARTITION FUNCTION MyPartitionRange () split RANGE ('20070501')

As the partition is empty this should be quick as it just means creating a new empty partition.

5 Swap in the new months data

ALTER TABLE Orders_200704 switch TO Orders PARTITION 5

6 Swap in the table containing the data after the new month

ALTER TABLE Orders_200705 switch TO Orders PARTITION 6

7 Check the result

SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('Orders')

Gives the expected result


partition_id object_id index_id partition_number hobt_id rows
-------------------- ----------- --------- ---------------- -------------------- ----
72057594038845440 213575799 0 1 72057594038845440 1
72057594038910976 213575799 0 2 72057594038910976 1
72057594038976512 213575799 0 3 72057594038976512 1
72057594039042048 213575799 0 4 72057594039042048 1
72057594039173120 213575799 0 5 72057594039173120 2
72057594039238656 213575799 0 6 72057594039238656 1


Now the table Orders_200704_Old can be dropped at your leisure.


Adding a new partition with a computed partition function

We return to the table partitioned by company and month.

CREATE PARTITION FUNCTION MyPartitionRange (INT) 
AS RANGE LEFT FOR VALUES
(1200701,1200702,1200703,2200701,2200702,2200703,3200701,3200702,3200703)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])

CREATE TABLE CompanyOrders
(
Company_id INT ,

OrderDate datetime ,
Item_id INT ,
Quantity INT ,
OrderValue decimal(19,5) ,
PartCol AS Company_id * 10000 + CONVERT(VARCHAR(4),OrderDate,112) persisted
)
ON MyPartitionScheme (PartCol)


In order too add a new month to this table, you will need to split each company's range for that month. To add a new company partition means splitting all months for that company.

There is no need to add all the partitions in one process, these operations can be performed one partition at a time. This should not affect the results of queries on the table.

The process of adding the new company or month is the same as for the sliding month data, only with many splits and swaps.

Other uses of partitioned tables

Usually, partitioned tables are used to horizontally, or vertically, partition the data. However, the partitions are sometimes used for different purposes – not partitioning the data at all.

I recently came across a reporting system that was querying a single flat table for aggregated results. The table was about 15Gb in size and could be filtered and grouped on any combination of columns. Indexing by date meant that a report for a year would take about 20 minutes – far too long (performance checked just before the release date of course), in fact anything more than 3 months was unacceptable. Normalising and using a view helped with the retrieval of the filter column data, but the actual report was limited by the amount of data it needed to aggregate.

Due to time constraints and policy, we were not allowed to create a cube and the report could not call a stored procedure. Oddly, there was a lot of flexibility in the query used to extract data but it had to access the single table.

The solution was to create a partitioned table. The first partition (partition value = 1) contained the old data table and would still be slow to access.

The second partition contained aggregated data, aggregated by filter columns that kept the table size less than 200K rows. All partitions have the same structure so those filter columns that were excluded were set to null. Accessing this partition was quick enough for any report.

The report application was then changed to check if the filter/grouping columns were all included in the second partition – if so it appended “and PartCol = 2” to the query otherwise it appended “and PartCol = 1”.

The reporting application could warn the users if they were about to do something that would take a long time.

We then checked how the reports were being used, and selected combinations of columns that could be used for other partitions. The partitions were added to the table (not affecting the system) and then, at a later time, the reporting application changed to use the new partitions.

This would have been easier with a stored procedure as the partitions could have been left as separate tables and the stored procedure could choose which to query, but a stored procedure call was not allowed by the reporting application.