Wednesday, January 9, 2008

Creating Cross Tab Queries in SQL Server

O'Reilly Network: Creating Cross Tab Queries in SQL Server
http://www.oreillynet.com/pub/a/network/2004/12/17/crosstab.html

Creating Cross Tab Queries in SQL Server

by Michael Schmalz, 12/17/2004

Before I started using Microsoft Access heavily, I did analysis work with PivotTables in Excel. Moving over to Microsoft Access, I found that using cross tab queries allowed me to do similar work but not to the same degree of detail. When I made the next leap to SQL Server, I was surprised to find that there was not an easy solution inside SQL Server to perform a cross tab query. When you are working with a linked SQL Server table in Access, you can simply perform the cross tab from Access. But there are times when you may want to perform the cross tab query on the server, or you need to perform it and you are not using Access. I could not find any adequate solutions to automatically produce a cross tab, so I decided to see if I could create it myself.

Limited Information Available

I found you could use the Case … When statement inside a query to simulate a cross tab query. An example of how to do this with Case … When can be found in the SQL Server Books Online that pivots quarterly sales data. If you wanted to do something similar with your data, it's possible, but you would have to write it from scratch and include all the required columns. This is easy if you are doing quarterly reporting, but it becomes trickier when you are dealing with products, managers, regions, and so on. If you wrote a report manually, you would then have to know or run some type of check to ensure you had captured every column; not doing this would create consistency issues. This manual solution is sufficient for some tasks, but it does not address the need for a full solution. For example, if you are using regions as the columns in your query, they would be relatively static, and you could probably maintain that type of query rather easily. However, if you were using products, you would be constantly maintaining the stored procedure.

The first stab I took at producing a stored procedure to solve the problem was successful for a single table. What I did was create a cursor that ran a query selecting the field I wanted with a Group By query to get the column names I wanted. I then was able to cycle through that cursor and use the results to write the SQL I needed. That seemed fine, and I was happy with it at first. But later I wanted to make changes, and I soon discovered I could not easily change the field I wanted to use for the results; I had to create multiple procedures for each table I wanted to run a cross tab on. While this is not terrible, I wanted to try to create a solution that would work in many situations. The only way to do that would be to create a cursor with a dynamically created query. This did not turn out to be as intuitive as I had expected. I will now walk you through the thought process and solution. Refer to Listing 1 to see the full text of the solution.

Creating a Cursor from a Dynamic SQL String

The fact that I needed to create the cursor with a SQL string that is created dynamically to implement the generic solution caused a problem because I could not find a way to do this from within a stored procedure. The solution turned out to be easy in the end, but it took me quite a while to figure it out. There may be additional ways to do this, but my solution was to create a temporary table and base the cursor on that temporary table. I was able to write the make table query dynamically and run it to create the table. The cursor declaration then simply reads:

declare xcursor Cursor  for  Select * from ##temptbl_Cursor

Solving the problem this way enabled me to cycle through the cursor using the same SQL statement regardless of the table it originated from and regardless of the name of the field. This solved the first problem of having a stored procedure that could be used in multiple databases on multiple tables. It is important to note that using this solution increases the complexity of calling the stored procedure while simultaneously increasing the number of opportunities to use the stored procedure.

Cycling Through the Cursor

While this is probably something most readers will know, I wanted to explain how to cycle through the cursor and also describe the SQL that is being written.

When you are using a cursor, you use a command called FETCH to get the data out of the cursor. The FETCH command will go to the cursor and get (fetch) the Next, Prior, First, or Last row. When you use Next on a cursor that has just been opened, it will fetch the first row. This lets you use Next throughout instead of using First for the initial FETCH and Next for the others, although you could do this as well.

Another concept you should be familiar with is using @@FETCH_STATUS to determine whether the fetch was successful. This is tricky in some situations because the fetch status is a global status for all the cursors running on the connection, and it represents the last fetch statement run. While the stored procedure that is being described here is only using one cursor, if you were to use multiple cursors in another stored procedure this is important to know. For example, if you have a cursor called xcursor in StoredProcedure1 and another called ycursor in StoredProcedure2 and StoredProcedure1 calls StoredProcedure2, then you will need to find the status of xcursor before calling StoredProcedure2. If you fail to do this, then when you try to find out the value of @@FETCH_STATUS, you will get the status of the last fetch even if it was in the other procedure. Again, this is not relevant in this example, but it is certainly something to keep in mind when you are writing other stored procedures and using cursors. This procedure then uses While @@FETCH_STATUS = 0 … End to cycle through the procedure. A @@FETCH_STATUS of 0 means that the fetch was successful. Some people use While @@FETCH_STATUS <> -1, which means that the FETCH statement failed. Either method will work.

The most important piece of the SQL statement that is being produced by cycling through the cursor is the Case … When for each column. Here is how that line reads:


set @tempsql = isnull(@sqlstr,'') + isnull(@XFunction +
'( Case When ' + @XField + " = '" +@Col + "' then [" + @XFunctionField + "]
Else 0 End) As [" + @XFunction + @Col + "]" ,'')

The variables @XFunction, @XField, and @FunctionField are arguments that are populated when you call the stored procedure. The @XFunction is the function you want to use to summarize the data; an example would be using Sum. The @XFunctionField is the name of the field you want to perform the function on. The @XField represents the field you want to use as the column names in your query. For example, if you want to add the sales of various regions, the @XFunction would be Sum, the @XFunctionField would be Sales, and the @XField would be Region. You need to use the ISNULL function because when you add a null string to another null string the result is null. By using ISNULL you replace a null string with an empty string. This enables you to avoid having to perform one string function on the first record (the @sqlstr string is null for the first record) and a different one on the rest. You certainly could do that, but this is a simpler method.

The Case … When statement brings in the value of @XFunctionField when the value of @XField matches the current value of the cursor's FETCH statement (being stored in @Col). When the value of @XField does not match the value in @Col, it will put in a zero. This works for Sum and Max functions. If you want to have a cross tab with Averages, I would suggest calculating your average with a normal view, having one row for each row in the cross tab, and then using Sum for the cross tab. If you use Min, unless there are negatives in your original view, you will get zero for all of them. So, if you are going to use other functions, make sure you verify that the results are what you want. If you want a count, you need to use Sum as the function but change the Case … When statement by replacing @XFunctionField with the number 1. This will give you a 1 when it matches and a 0 when it doesn't. If you are going to implement this, I would suggest calling it SimpleXTabCount, taking away the @XFunction from the arguments, replacing it with a variable called @XFunction, and just setting it to Sum.

Building the Final SQL Statement

The next section builds the final SQL statement. It reads:

set @tempsql = 'Select '  + @XRow + ', ' + @Sqlstr + ' From ' + @XTable +
@XWhereString + ' Group by ' + @XRow

This line uses three variables we did not cover yet; they are @XRow, @XTable, and @XWhereString. The @XRow variable is passed to the stored procedure as the field(s) you want to use for the rows in your result query. For example, if you wanted to see product sales by region, you would use Product as @XRow. You can also pass multiple field names separated by commas. The @XTable is the name of the table or view you are using to pull the data from. And @XWhereString is the Where clause you want to use. This procedure was designed to take the entire Where clause, including WHERE. This was done to prevent you from needing an additional line in case no criteria were sent. I could have done some checking earlier in the procedure and prepared the Where clause, but since the call to the stored procedure would likely be written by code or by an experienced user, I didn't feel this was necessary.

The next lines will close and deallocate the cursor and will then drop the temporary table that was used to create the cursor. The final line opens up the SQL statement that was created and returns the record set. These lines are:

Close xcursor
Deallocate xcursor

set @tempsql = N'Drop Table ##temptbl_Cursor'
exec sp_executesql @tempsql

exec sp_executesql @Sqlstr

There are several things to note that may limit your ability to use this in a production environment. The first is that this procedure limits the SQL text to 4,000 characters. This is due to the procedure sp_executesql requiring the string to be of type nvarchar, as nvarchar has a limit of 4,000 characters, and it uses 2 bytes per character, whereas varchar uses 1 byte per character and allows 8,000 characters.

The N means that the data type is Unicode. (A lot can be said about Unicode, but this is beyond the scope of the article.) The limitation of 4,000 characters can be overcome by using a make table query for the SQL text and concatenating multiple SQL strings together. To do this, you would need to continually check the length of the SQL string as you were adding to it and determine when to move to the next string. When the strings are finished, you would change the line that finishes the statement by adding into ##temptbl_XTabTalbe before the FROM clause in the final string. You would then use the Execute command to make the table. Your final line will simply call the statement Select * from ##temptbl_XTabTable with the sp_executesql command.

There was only one time when this was necessary for me, and it was due to a long table name and an incredibly large number of records in the cursor. I came up with the solution described above and it worked, but we ended up not implementing it. If you do run into a problem with the size of the SQL string being too large, you can try that solution. You'll notice when reviewing the code that I have included comments in the listing for the stored procedure. By removing these comments (*/ /*) and adding the comments to the sp_executesql line, you can check the size of the SQL string. This will help you either determine or rule out the size of the string if you have to debug the stored procedure.

Testing the Stored Procedure

I have used this procedure in many settings, but in order for you to test it, I have created a view to be used in the Northwind Database that comes with SQL Server. You can see this in Listing 2. If you add Listing 1 and Listing 2 to your Northwind Database in SQL Server, you are ready to test this procedure. I usually check my procedures by using a Pass-Through query from Access (make sure you set Returns Records to True), but you can also use Query Analyzer in Enterprise Manager. The following line is what I used to test this.


Execute SimpleXTab 'CategoryName', 'vw_SampleQuery', '', 'Sum',
'OrderAmt', 'ShipCountry'

The same line can be used in either Query Analyzer or Access. If you look closely you will see that I did not send a Where clause. This is because I wanted to see all records. If you run this with Access you should get the result shown in Figure 1. If you use Query Analyzer, you will get results like Figure 2.

Thumbnail; click for full-size image.
Figure 1. Access example; click for full-size image

Thumbnail; click for full-size image.
Figure 2. Query Analyzer example; click for full-size image

Wrap-Up

You should be able to use this stored procedure in databases where you need the functionality of a cross tab query. After you feel comfortable with the syntax, you can find many ways to enhance it for your own purposes. I have also written similar stored procedures that will produce two or more columns per item in a cursor; this allows you to see the number of items produced and the total cost for each item. Or, in a financial example, you can pull loan balances and the interest to calculate a yield. These are things that a simple cross tab query can't even tackle.

The two main items I hope readers take from this article is the workaround to create a cursor with a dynamically created SQL statement and how to use Case … When. These concepts will be useful to you in many other settings.


1.)

SET QUOTED_IDENTIFIER OFF 
GO
SET ANSI_NULLS OFF
GO

CREATE PROCEDURE [dbo].[SimpleXTab] @XField varChar(20), @XTable varChar(20),
@XWhereString varChar(250), @XFunction varChar(10), @XFunctionField varChar(20), @XRow varchar(40)
AS
Declare @SqlStr nvarchar(4000)
Declare @tempsql nvarchar(4000)
Declare @SqlStrCur nvarchar(4000)
Declare @col nvarchar(100)

set @SqlStrCur = N'Select [' + @XField + '] into ##temptbl_Cursor from [' + @XTable + '] ' + @XWhereString + ' Group By [' + @XField + ']'

/* select @sqlstrcur */
exec sp_executesql @sqlstrcur



declare xcursor Cursor for Select * from ##temptbl_Cursor

open xcursor


Fetch next from xcursor
into @Col


While @@Fetch_Status = 0
Begin
set @Sqlstr = @Sqlstr + ", "
set @tempsql = isnull(@sqlstr,'') + isnull(@XFunction + '( Case When ' + @XField + " = '" +@Col +
"' then [" + @XFunctionField + "] Else 0 End) As [" + @XFunction + @Col + "]" ,'')
set @Sqlstr = @tempsql
Fetch next from xcursor into @Col

End


/* Select @Sqlstr as [mk], len(@sqlstr) as [leng] */

set @tempsql = 'Select ' + @XRow + ', ' + @Sqlstr + ' From ' + @XTable +
@XWhereString + ' Group by ' + @XRow
set @Sqlstr = @tempsql

Close xcursor
Deallocate xcursor

set @tempsql = N'Drop Table ##temptbl_Cursor'
exec sp_executesql @tempsql

/* Select @Sqlstr as [mk], len(@sqlstr) as [leng] */

exec sp_executesql @Sqlstr
GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO
2.) 
CREATE VIEW dbo.vw_SampleQuery
AS
SELECT dbo.Orders.ShipName, dbo.Categories.CategoryName, dbo.Orders.ShipCountry, (dbo.[Order Details].UnitPrice * dbo.[Order Details].Quantity)
* (1 - dbo.[Order Details].Discount) AS OrderAmt
FROM dbo.Orders INNER JOIN
dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID INNER JOIN
dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID INNER JOIN
dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID

GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO