Sunday, April 8, 2007

More with SQL Server 2005 : Top x Per Group, Paging, and Common Table Expressions

More with SQL Server 2005 : Top x Per Group, Paging, and Common Table Expressions

Returning Top X Rows Per Group

The following will return the top 10 players who hit the most home runs per year since 1990. The key is to calculate the "Home Run Rank" of each player for each year.

select
HRRanks.*
from
(
Select
b.yearID, b.PlayerID, sum(b.Hr) as TotalHR,

rank() over (partition by b.yearID order by sum(b.hr) desc) as HR_Rank
from
Batting b

where
b.yearID > 1990

group by
b.yearID, b.playerID

)
HRRanks

where
HRRanks.HR_Rank <= 10


Paging Results with SQL 2005 using ROW_NUMBER()

Paging data is so much easier in SQL 2005. All of my old techniques are no longer needed, which is great because they were hard to implement.

A new function call ROW_NUMBER() works much in the same way as RANK(), though you cannot use partitions. Of course, we must have a clear, unique ORDER BY established, otherwise the results will not be deterministic and you will not always get the same rows returned for each page.

In this example, we'll page the "Master Table" of players, sorted by firstname, lastname and using lahmanID (the primary key of the table) as the "tie-breaker". We'll set a couple of variables that could be turned into parameters in a stored proc to indicate the starting and ending rows to return:

declare @startrow int
declare @endrow int

set @startRow = 40
set @EndRow = 70

select
MasterRowNums.*
from
(
select
m.nameLast, m.nameFirst, m.lahmanID,
ROW_NUMBER() over (order by m.nameLast, m.nameFirst, m.lahmanID) as RowNum

from
[master] m

)
MasterRowNums
where
RowNum between @startRow and @endRow

order by
nameLast, NameFirst, lahmanID


Common Table Expressions - Easier Derived Tables

with MasterRowNums as
(
select m.nameLast, m.nameFirst, m.lahmanID, ROW_NUMBER() over (order by m.nameLast, m.nameFirst, m.lahmanID) as RowNum
from [master] m
)
select
MasterRowNums.*
from
MasterRowNums
where
RowNum between @startRow and @endRow
order by
nameLast, NameFirst, lahmanID

Here's the Top Rows Per Group solution given above, using a CTE:

with HRRanks as
(
Select b.yearID, b.PlayerID, sum(b.Hr) as TotalHR,
rank() over (partition by b.yearID order by sum(b.hr) desc) as HR_Rank
from Batting b
where b.yearID > 1990
group by b.yearID, b.playerID

)
select
HRRanks.*
from
HRRanks
where
HRRanks.HR_Rank <= 10