Thursday, January 10, 2008

Measure TSQL Statement Performance

http://www.sql-server-performance.com/articles/per/tsql_statement_performance_p1.aspx

By: Greg Larsen

How do you determine how well your application is performing?  What tools can you use?  Every developer needs to make sure that each application TSQL statement is optimized.  By tuning each query you can make sure that your application is running as efficiently as possible.  It is much easier to make modifications to application code while you are working in a development environment.  Once your code is in production, making modifications to optimize it might take a long time, or even be impossible.  This is why you need to review your application code's performance periodically during the application development process.  This article will give you a few different ideas on how to identify slow running queries, and provide you with some tips on monitoring your query performance while you make iterative changes to each query to try to improve performance.

How to Identify Slow Running Queries

With SQL Server 2005 you have a couple of different options.  The first option, which is also available in other versions of SQL Server, is SQL Server Profiler.  Additionally, with SQL Server 2005 you have the “sys.dm_exec_query_stats” DMV.  Let me review how both of these options can be used to identify your slow running queries.

SQL Server Profiler is a GUI tool that is easy to use.  Profiler can be used to display the CPU, I/O, and duration of TSQL statements being processed on a SQL Server instance.  For Profiler to be used to find your slow running queries, you need to create and start a Profiler trace.  This can easily be done using the “New Trace” item.  Profiler has a “TSQL Duration” template that can be used to find your long running queries.  But if you want to display CPU and I/O along with those long running queries, you will need to create your own Profiler trace, or modify the columns selected with the “TSQL Duration” template.  You can modify the “TSQL Duration” template by clicking on the “Event Selection” tab, then selecting CPU and I/O by clicking the check boxes under the “CPU”, “Reads”, and “Writes” columns.  For more information about how to use SQL Server Profiler, please refer to Books Online.

If you have SQL Server 2005, you can also use the “sys.dm_exec_query_stats” DMV to identify the long running queries.  This DMV can be used to return performance statistics for cached query plans.  Below is an example of how to return a number of different performance measurements for each cached plan.

SELECT creation_time
      ,last_execution_time
      ,total_physical_reads
      ,total_logical_reads
      ,total_logical_writes
      ,execution_count
      ,total_worker_time
      ,total_elapsed_time
      ,total_elapsed_time / execution_count AS avg_elapsed_time
      ,SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
           ((CASE qs.statement_end_offset
                WHEN -1 THEN DATALENGTH(st.text)
                ELSE qs.statement_end_offset
             END - qs.statement_start_offset)/2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY total_elapsed_time / execution_count DESC;

With this query I have calculated the average elapsed time, then ordered the output so the query with the worst average elapsed time is displayed first.  Elapsed time is a good measurement to use for identifying poor performing queries because it is the actual clock time it takes to process a query.

By using Profiler or the DMV query you can determine your worst performing queries based on duration/elapsed time.  Once you have identified the worst performing queries, you can then focus your work on optimizing them.  By optimizing the worst performing queries you will get the biggest bang for your buck, improving the overall performance of your application with the least amount of effort.

Tools to Help Optimize a Query

When optimizing a query you need to go through a repetitive process of reviewing performance information after each change.  Doing this review allows you to determine whether your change has improved your query's performance, or made it worse.  There are three measurements that are useful in determining the performance of your query: elapsed time, I/O, and CPU.  Let's review different methods to measure each of these for a given query.

Elapsed Time

The length of time a query takes to run is known as elapsed time, also known as clock time.  Many things can affect the elapsed time of a query.  Elapsed time is an excellent benchmark measurement to gauge how well a query performs, because it measures how long a user would have to wait while SQL Server processes a specific query.  So what are the different ways you can measure the elapsed time of a TSQL statement?

The first method you can use is the “elapsed time” that is displayed in the query window within SQL Server Management Studio when a batch is executed.  This would be the time that I’ve circled in the screen shot below:

Here I have executed a loop that adds 1 to a variable 100 million times.  Note the elapsed time in the lower left-hand corner, which I have circled in red.  If you are running your slow queries in SSMS, you can use this elapsed time display to identify how long your query ran.  Each time you change your query and re-run it, you can look at this number to see if your query ran faster.
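The screen shot itself is not reproduced here, but the batch it executed was a simple counting loop along these lines (a minimal sketch; the variable name is illustrative):

DECLARE @I BIGINT;
SET @I = 0;
-- Add 1 to the variable 100 million times; the batch's elapsed time
-- appears in the SSMS status bar when it completes.
WHILE @I < 100000000
    SET @I = @I + 1;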

Another method to measure elapsed time is to use a SET STATISTICS command to display the elapsed time for a query.  Below is a screen shot that shows the elapsed time statistics in the query results pane for each statement executed in my query window:
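The batch behind that screen shot looked something like this sketch (the SELECT statement and PRINT text are illustrative, not the exact ones from the original article; Sales.SalesOrderDetail is an AdventureWorks table):

SET STATISTICS TIME ON;

-- Parse, compile, and execution times are reported for each statement.
SELECT COUNT(*) AS order_detail_rows
FROM Sales.SalesOrderDetail;

PRINT 'Batch complete';

SET STATISTICS TIME OFF;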

Here I executed the “SET STATISTICS TIME ON” statement prior to running my simple SELECT statement.  Note how the “elapsed time” statistics are displayed in the results pane, along with the “CPU time” and the output from my PRINT statement.  The “SET STATISTICS TIME ON” statement displays the amount of time it takes to parse, compile, and execute each statement.  This statement is useful if you are only executing a few statements.  If you are executing a large number of statements, like my first looping example, then the number of lines of output would be very large, making it hard to read.

Sometimes you need to measure how long it takes to execute different chunks of code in a batch, not just a specific TSQL statement.  This would be useful if you had a large script or stored procedure that takes a long time, and you want to figure out which part takes the longest.  To capture the elapsed time for sections of code, we can use the following method:
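The original showed this as a screen shot; here is a minimal sketch of the pattern it describes, capturing GETDATE() into @START before each chunk and printing the difference afterwards (the loop bodies and iteration counts are illustrative):

DECLARE @START DATETIME;
DECLARE @I INT;

-- Time the first chunk of code.
SET @START = GETDATE();
SET @I = 0;
WHILE @I < 10000
    SET @I = @I + 1;
PRINT 'First loop ms: ' + CAST(DATEDIFF(ms, @START, GETDATE()) AS VARCHAR(20));

-- Time the second chunk of code.
SET @START = GETDATE();
SET @I = 0;
WHILE @I < 90000
    SET @I = @I + 1;
PRINT 'Second loop ms: ' + CAST(DATEDIFF(ms, @START, GETDATE()) AS VARCHAR(20));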

Here you can see that I ran two different simple loops, and for each loop I calculated the time it took to process.  I do this by capturing the begin time of each chunk of code in a variable named @START.  Then, when the loop finishes, I print out the time difference between the @START DATETIME variable and the current value returned from the GETDATE() function.  In my case it took 16 ms to process through my first loop 10,000 times, whereas it took 110 ms to process through the second loop 90,000 times.

CPU Time

Another important performance measurement is the amount of CPU your query needs to return a result set.  In the prior section, CPU time was displayed for each statement in a batch when I used the “SET STATISTICS TIME ON” statement.  So this is the first method you can use to measure the CPU consumption of a TSQL statement.  But once again, how do you measure CPU for multiple statements, or for different chunks of code?

SQL Server provides the @@CPU_BUSY system variable that can be used to measure CPU consumption.  Here is the definition that Books Online has for this variable:

Returns the time that SQL Server has spent working since it was last started. Result is in CPU time increments, or "ticks," and is cumulative for all CPUs, so it may exceed the actual elapsed time. Multiply by @@TIMETICKS to convert to microseconds.

Below is a method where you can use @@CPU_BUSY to measure the amount of CPU used by SQL Server for each execution of my loops from the prior example.  Keep in mind that since @@CPU_BUSY contains the amount of CPU used since SQL Server started, it includes all the CPU used by all user and system queries run against the server.  Therefore this method is only accurate in measuring CPU if you are the only one running SQL Server commands on the system, and there are no background system queries running:
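A minimal sketch of that approach, bracketing each chunk with @@CPU_BUSY samples (the loop sizes are illustrative; on a busy server the numbers will include other sessions' work):

DECLARE @CPU_START INT;
DECLARE @I INT;

-- Sample @@CPU_BUSY before the first loop.
SELECT @CPU_START = @@CPU_BUSY;
SET @I = 0;
WHILE @I < 10000
    SET @I = @I + 1;
-- @@TIMETICKS is the number of microseconds per tick; divide by 1000.0 for ms.
SELECT (@@CPU_BUSY - @CPU_START) * @@TIMETICKS / 1000.0 AS first_loop_cpu_ms;

-- Repeat for the second loop.
SELECT @CPU_START = @@CPU_BUSY;
SET @I = 0;
WHILE @I < 90000
    SET @I = @I + 1;
SELECT (@@CPU_BUSY - @CPU_START) * @@TIMETICKS / 1000.0 AS second_loop_cpu_ms;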

As you can see, I measured the CPU milliseconds used for both the first and second loops.  Just to let you know, I ran this query a number of different times and got a number of different timings for each loop execution.  So keep in mind that this method is affected by other things running in SQL Server.  I would say that this method is not all that reliable for measuring actual CPU usage, but it is still a way to gauge CPU consumption if you have a standalone system with very little activity.

A more accurate way to measure CPU usage is by using SQL Server Profiler.  This is done by including the CPU column on the different completed events.  Here is an example of how I used SQL Server Profiler to capture the amount of CPU used for the above looping batch:

Here I used the SQL:BatchCompleted event to capture the CPU.  By default, SQL Profiler displays the duration in milliseconds.  By using the “Tools - Options” dialog box you can change this display to microseconds.  Note that this method only measures the complete batch, not the duration of each loop within my batch.

I/O Usage

Like the other performance measures, there are a number of ways to display the amount of I/O used by a TSQL statement.  There are two different kinds of I/O tracked within SQL Server: logical and physical I/Os.  Logical I/Os account for data that is processed from the buffer pool, which resides in memory, hence the phrase logical I/O.  Physical I/Os are associated with accessing data directly from the physical disks that SQL Server uses to store databases.  Physical I/Os are more expensive, meaning they take longer to process.  I/O is generally the single most expensive operation impacting the overall performance of a TSQL statement.  So when you are tuning your queries, you want to minimize the number of logical and physical I/O operations performed to produce a result set.

One method to show the amount of I/O associated with a TSQL statement, or a batch of statements, is to turn on I/O statistics gathering by using the “SET STATISTICS IO ON” statement.  Once you have issued this statement on your connection, SQL Server will output the number of I/Os used to resolve each TSQL statement.  Below is an example where I used the “SET STATISTICS IO ON” statement to display the amount of I/O needed to resolve a simple query against the AdventureWorks database:
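The screen shot is not reproduced here; the batch looked something like this sketch (the query itself is illustrative rather than the exact one from the original article; Person.Address is an AdventureWorks table):

SET STATISTICS IO ON;

-- A single-row lookup; the Messages tab reports its logical and physical reads.
SELECT AddressLine1, City
FROM Person.Address
WHERE AddressID = 1;

SET STATISTICS IO OFF;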

Here you can see my SELECT statement performed 2 “logical reads” and 2 “physical reads”.  Now when I run this batch of statements a second time, I get this output:

Note that the second time I issue my TSQL SELECT statement, it only requires 2 “logical reads” and 0 “physical reads”; this is because the AdventureWorks pages holding the data for this query are already in the buffer pool.  If you are doing repetitive testing to improve the performance of your query, you need to make sure you eliminate this I/O counting discrepancy that can arise when pages required by your query are already in the buffer cache.  To eliminate this counting issue, you can issue the “DBCC DROPCLEANBUFFERS” command prior to running each test.  This will allow your queries to run with a clean buffer pool without stopping and restarting SQL Server.
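A test cycle using this technique might look like the following sketch.  Books Online notes that running CHECKPOINT first writes dirty pages to disk so the buffer pool can actually be emptied; only do this on a test server, since it slows every subsequent query while the cache re-warms:

-- On a test system only: clear the buffer pool so the next query starts cold.
CHECKPOINT;              -- write dirty pages to disk, leaving only clean buffers
DBCC DROPCLEANBUFFERS;   -- remove the clean buffers from the buffer pool

-- Re-run the query being tuned; its physical reads now reflect a cold cache.
SET STATISTICS IO ON;
SELECT AddressLine1, City
FROM Person.Address
WHERE AddressID = 1;     -- illustrative query from the earlier sketch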

Another method to track I/Os for queries is to use SQL Server Profiler.  To do this, just make sure you include the I/O related columns when identifying the events and columns you want your trace to monitor.

Conclusion

Being able to track the resource consumption of your different queries is a critical part of monitoring the performance of your application.  Knowing which TSQL statements are using the most resources helps you determine where to focus your attention when trying to optimize your application.  The “sys.dm_exec_query_stats” DMV helps DBAs quickly identify those TSQL statements using the most resources.  Using various “SET” statements, system variables, and/or SQL Server Profiler events/columns can help measure CPU, I/O, and the elapsed time for your problem TSQL statements.  Identifying, measuring, and improving the performance of your application queries will help you optimize the code behind your application.


