Saturday, November 17, 2007

Create and Modify SQL Server Indexes Properly

 http://www.devx.com/getHelpOn/Article/10016

Create and Modify SQL Server Indexes Properly

Learn how to determine the state of your indexes and understand whether that state should be changed. See how evaluating the construction and location of your system's indexes and deciding whether to adjust their fill factors and padding can improve the system's performance. 

Indexes are critical to smooth data retrieval and manipulation in SQL Server. They are the way foreign keys maintain relationships among themselves and the way DB pros enforce order on data. The system decides how and where an index is built by following the database practitioners directions to address the following questions:

  • Is the index the result of a constraint?
  • Is the index clustered or not?
  • If it is a non clustered index, should it be on a different file group?
  • If it is clustered, what fill factor and padding should be used?
  • Is the index unique or not?
  • Is the index in ascending or descending order?
  • Is the index built on a single column, or more than one column?

A look at the column and table constraint language from the create table statement reveals placeholders that implement these decisions:

< column_constraint > ::= [ CONSTRAINT constraint_name ] 
{ [ NULL | NOT NULL ]
| [ { PRIMARY KEY | UNIQUE }
[ CLUSTERED | NONCLUSTERED ]
[ WITH FILLFACTOR = fillfactor ]
[ON {filegroup | DEFAULT} ] ]

< table_constraint > ::= [ CONSTRAINT constraint_name ]
{ [ { PRIMARY KEY | UNIQUE }
[ CLUSTERED | NONCLUSTERED ]
{ ( column [ ASC | DESC ] [ ,...n ] ) }
[ WITH FILLFACTOR = fillfactor ]
[ ON { filegroup | DEFAULT } ]

The difference between the first and second forms of the command pertain to the number of columns in the index. The first form is for a single column, while the table level constraint allows you to include multiple columns.

In this 10-Minute Solution, you will learn how to determine the state of your indexes and understand if it should be changed. You also will learn how to vary the construction and location of your system's indexes and how these choices influence maintenance and performance. Properly applied, your choices can lead to quicker response times and more efficient space management for your data and its indexes.

How do I determine the state of my indexes? How do I know if they need my intervention? What methods can I apply to modify them, and how do I measure the effect of my changes?

Learn how to determine the state of your indexes and understand whether that state should be changed. Start by identifying them using sp_help @tablename and sp_helpindex, and then evaluate the construction and location of your system's indexes and decide if its necessary to adjust their fill factors and padding. These modifications will impact your system.

How Indexes Are Made
Among the resources that shape indexes, the most elemental ones at your disposal are the statements that create the index and the two system stored procedures, sp_helpindex and sp_help @tablename. As an example, I asked the Object Browser in Query Analyzer to create a script for a table as a create. I then invoked sp_helpindex to get the details of the indexes on this table, Company. Figure 1 shows the outputs of the create table statement and sp_helpindex.


Figure 1: The Outputs of the Create Table Statement and of sp_helpindex

From the output, you can see that the system answered the lions share of the index-creation questions from the Introduction:


  • The index named PK_Company was created as the result of a constraint creating a primary key.
  • PK_Company is a clustered index, meaning that the table is in the order specified by the index key, in this case, the identity column CompanyID.
  • The index created has only one column, and since it was a primary key constraint, it also is unique and not null.
  • Because I didnt specify the direction of the index, it was created in the default of ascending order.
  • Since the clustered index was created on the Primary file group, the table and its index were created there. Clustered indexes are always created on the same file group as the data, because the leaf level or lowest level of a clustered index is the data itself. (In fact, you can move a table to a different file group by creating a clustered index for the table on a different file group. Both structures will move. Additionally, clustered indexes are effective for range searches because of their proximity to the data and because they sort the table in index order.)

The Unintended Consequences of Clustered Indexes
Be sure to analyze the retrieval requests for your system and design your indexes to accommodate most of those requests. You'll have to accept tradeoffs between the most frequently executed queries and the less frequently executed but more demanding queries.

The optimizer can chose to use an existing index based on its statistics. However, if the statistics indicate that reading the whole table would be cheaper, the optimizer will perform a table scan. The database can maintain index statistics with the Auto_Create_Statistics option set, or you can update them manually. Depending on how busy your server is, you may elect to perform updates at less busy times. As a side benefit, you also can keep statistics on non-indexed columns, which could help query execution.

Clustered and non-clustered indexes are interrelated by virtue of the clustering index key. You can promote efficiency in index maintenance by specifying clustered indexes first, because the non-clustered indexes use the clustered indexs pointers. If you specify a non-clustered index before a clustered index and then drop the clustered one, you are forcing the server to do double the work. When you specify a clustered index, you create the clustering key that succeeding secondary or non-clustered indexes use. If the clustered index comes after the non-clustered index, the server will drop and recreate the non-clustered indexes so it can inform them about where to find pages with the clustering key. To avoid this extra work, use the Drop Existing clause of the Create Index statement.

Another consequence of clustered indexes is you can influence the physical location of the data file by using the ON FILEGROUP clause. If you designate a different file group for the index, you can move the table to the named file group in the clustered index order. Because the clustered index in effect is the data, one less intermediate leaf level lies between the data and the index. Youll have to invoke the Alter Database command to create the new file group, then relocate your table to the new file group with a Create Index statements On Filegroup option.

If you elect this Drop Existing option after the fact (i.e., your table already exists and you have already created the new file group), you can use the create index command with the DROP EXISTING clause to prevent the double build and rebuild of the non-clustered indexes.

Facts About the Fill Factor
I selected a small table to demonstrate the effects of changing the fill factor on an index and to show how these changes affect how full the pages used by the index are. Fill factor sets the amount of room left empty in an extent so that new entries can be moved into the extent while still preserving the order of the index. Bear in mind that if you are using an identity on a clustered index, then the default fill factor should be fine because intervening values wont be inserted into the index. Each new entry will go to the end of the page because it will be the next highest number in the sequence. If you create clustered keys on identity columns in a table that has heavy transactional activity, you could run into some limitations. By creating a clustered index on an identity, youll create a "hot spot" on the last page of the index. New entries follow one another because of the serial nature of the key, and they end up competing for the last open space on the page. In a table thats heavily transaction-oriented, this could create a bottleneck.

The whole point of the fill factor is to reserve space on an index page to avoid page splitting, a very inefficient operation. A page split is the internal mechanism the server uses to make room for a new element in a file. If an index page splits, the system consumes more disk I/O to read the index allocation map as it tries to locate the next segment on a disk. New pages are not created contiguously, so fill factors leave a little room to grow and hopefully minimize—or at least postpone—page splitting.

A table that wont experience any insertions in its data rows could have a fill factor of 100, meaning that the pages will be filled to capacity (100 percent) in the index order. Without reserving a little headroom on the index page, any change in the constitution of the file will result in a page split. On the other hand, a file that is subject to frequent additions and deletions could be left with a fill factor of 10. Ninety percent of the space on a page would be left vacant, allowing new entries to be added with less rearrangement of the index.

Evaluating an Index
The DBCC DBREINDEX command enables you to change the fill factor of an existing index, and the DBCC SHOWCONTIG command provides the information you need to evaluate the operation by telling you how efficiently the index is stored. If your system is too busy to tolerate the downtime of DBREINDEX, DBCC DBDEFRAG is an online operation that can proceed while users are in the database.

Start the process of evaluating an index by looking at the output DBCC SHOWCONTIG produces. This process can report all tables and indexes in the database or just the ones you specify. You can then take cues from that data. Because each tables profile of use is different, the fill factor used for each one should be evaluated in terms of the frequency of its update and deletion activity. Some tables, like look-up tables, will be static and wont need extra room to avoid page splitting. Casting these tables with a high fill factor makes sense.

On the other hand, tables that are subject to lots of processing (i.e., insertions and deletions) would benefit from a lower fill factor, which reserves room on their pages. This way, DBREINDEX or DBDEFRAG can shuttle old values about to keep the index internals in order. Figure 2 shows the results of the first pass of SHOWCONTIG with the fill factor set to 90. It is a small table, so the results are not dramatic, but they are informative.


Figure 2: The Results of SHOWCONTIG with the Fill Factor Set to 90

The data demonstrate that the index consumes two pages on two extents (small tables are first stored on mixed extents, subsequent additions are then stored on their own extent) and that the table occupies a little more than three quarters of each page on which it exists. I can influence the layout of the index by using the DBCC DBREINDEX command with a different fill factor (see Figure 3), hypothesizing that this will be an active table that needs room to accommodate new entries.


Figure 3: The DBCC DBREINDEX Command

A subsequent running of DBCC SHOWCONTIG shows the effect of the new fill factor, 50 versus 90. The new fill factor leaves more room on each page but its side effect is that the index now takes up more pages (see Figure 4), a logical consequence of leaving more room per page.


Figure 4: The Results of SHOWCONTIG with the Fill Factor Set to 50

Make Wise Index Decisions
With SHOWPLAN, SHOWCONTIG, and the other DBCC commands, SQL Server provides great tools for evaluating and managing indexes. Next to database design, no other factor is as potent for getting the best performance out of your system. Databases use indexes to speed processing, both data modification and retrieval. Although they are not part of the relational model, indexes are a necessary physical manifestation of primary and foreign keys as well as commonly queried columns.

The databases system tables hold statistics that the optimizer uses to decide how to most quickly resolve your queries. These statistics convey how skewed the distribution of values in the index are by providing the number of values associated with each key and its distribution, both of which indicate how selective or useful the index is. Based on the statistics, the system will use the index to satisfy a query or, if the index is not selective enough, will disregard the statistics and scan the whole table—something you want to avoid.

I hope you use this information to make informed decisions about what to include in your indexes, how to maintain them, and how to get the best performance from your system.