Jerry Dixon

Subscribe to Jerry Dixon: eMailAlertsEmail Alerts
Get Jerry Dixon: homepageHomepage mobileMobile rssRSS facebookFacebook twitterTwitter linkedinLinkedIn


Article

A Point of Order: Organize Your Data with SQL Ranking Functions

Sooner or later, you'll find a process that requires you to generate a series of sequential numbers

Sequential numbers. Sooner or later, you'll find a process that requires you to generate a series of sequential numbers. As a matter of fact, most database engines provide at least one method of producing such numbers. Microsoft Access, for example, has AutoNumber columns, while SQL Server has IDENTITY columns. However, these are features of tables. There are times when you need some sequential numbers, but creating a table is inconvenient or inefficient. So how does one proceed? How does one obtain a list of sequential numbers without creating tables?

SQL Server 2000 has always provided one answer. SQL Server 2005 provides more. This month's article will explain these capabilities in detail. We will first discuss the SQL Server 2000 solution and then move on to SQL Server 2005. A total of four new functions will be introduced. These functions, known collectively as ranking functions, provide a high level of control over exactly how the sequential numbers are produced. As we will see, this eases the work on the part of the SQL developer. I will wrap up the discussion by showing how sequential numbers form the basis of a result set paging solution, and will demonstrate the solution in both SQL Server 2000 and 2005.

IDENTITY Columns
As mentioned earlier, SQL Server 2000 provides IDENTITY columns. Tables that use these columns will contain a list of automatically generated sequential numbers. Because the IDENTITY seed and increment can be specified, the developer has a measure of control over the resulting list of numbers. However, in order to use IDENTITY columns, you have to use tables, which is not always desirable or efficient. So, how do you get around this requirement?

The answer is: you don't. You use a table-valued variable instead. Variables are almost always faster and more efficient than actual tables. Refer to the code in Listing 1. This code creates a table-valued variable called @OrderedRows that has three columns of actual data. The fourth column is an IDENTITY column, which is used to assign an order to that data. An example of the output from such a table-valued variable can be seen in Figure 1. Each row receives a number, starting with the seed number and increasing each time by the specified increment.

This technique has been available for years, and is still available in SQL Server 2005. Now however, we have more options available.

Ranking Functions
The four ranking functions, RANK, DENSE_RANK, ROW_NUMBER, and NTILE, are very similar to each other. In fact, their syntax is almost identical. As you can see in Listing 2, each function is followed by an OVER clause. This clause serves as a container for two subordinate clauses that do the actual work: the PARTITION clause and the ORDER BY clause. The PARTITION clause, which is optional, specifies the field(s) or expression(s) that is used to divide the result set into groups. The ORDER BY clause, which is required, specifies the sort order for each group.

The PARTITION clause provides us with a very interesting capability. Without it, the ranking functions will create sequential numbers across an entire result set. With it, however, we can specify groups or categories that should be numbered. The numbering will then restart with each group. In addition, when the PARTITION clause is used to create these groups, the ORDER BY clause applies to each. In order words, each resulting group will be individually sorted. Using these two clauses together, we can devise numbering schemes that are far more complex than the IDENTITY clause could ever produce. So, what does each individual ranking function do?

The RANK function assigns a rank number to each row of the result set or group. The DENSE_RANK function does this too. The difference is how the two functions handle ties. Remember that the rows of the result set or group are sorted according to the specified ORDER BY clause. It is quite possible for multiple rows to have identical values in the sort column(s). When this occurs, the tied rows will receive equal rank numbers - but how does the numbering continue afterward?

In Listing 3 I've created a query that lists some products and their associated categories. The rows inside each category are sorted by ascending list price. The rows are also assigned a rank via the RANK function. This can be seen in Figure 2. Notice that there are three products with the lowest price, therefore there are three rows with a rank of one. The next row has a rank of four, not two. Sometimes this is not what we want. We might prefer that the numbering resume with the next rank, which in this case would be two. The code in Listing 4 does just this. It uses the DENSE_RANK function, which produces the results shown in Figure 3.

The ROW_NUMBER function can be used when we want consecutive numbers and we don't care about ties. In these cases, no numbers should be repeated, and no numbers should be skipped. Listing 5 shows an example query, with the results shown in Figure 4. You can see that, within each category, the numbering is consecutive with no repeating values. The number does start over when a new category is reached, which is one of the main reasons that this function is different from an IDENTITY column. Remember that the PARTITION clause is optional, though, so we can use the ROW_NUMBER function to mimic an IDENTITY column. We would simply leave out the PARTITION clause. The advantage of this function is that you don't have to have a table in order to obtain your sequential numbers. The disadvantage is that the numbering will always be consecutive - the numbers will always start with one and will always increment by one. IDENTITY columns can have an increment greater than one and can start with a number other than one. (Pay special attention to this ranking function though, because we will use it in a real-world example later on.)

The final ranking function, NTILE, is a special case. The other three ranking functions produce sequential numbers across the entire result set or group. This function creates a specific number of subgroups, giving each row in the subgroup the same rank number. This can best be seen with an example. The query in Listing 6 produces the same product list that we've been using all along. This time, however, a parameter of 3 has been passed to the ranking function. The parameter specifies the number of subgroups that should be created. Examine the results in Figure 5. Notice that each product category has been divided into three subgroups and that each row of a subgroup has been given the same rank. Also notice that not all subgroups have the same number of rows. The NTILE function will divide the result set or group as evenly as possible, but some variance will occur when the number of rows in a group is not an exact multiple of the NTILE parameter.

Paging a Result Set
Let's see how a ranking function can be used to solve a real-world problem. Many applications need to display large lists of data. These lists are often far too large to display all at once, so a small page of data is shown instead. The program typically provides some method of retrieving the next page, a prior page, the first page, the last page, etc. This makes the application seem more responsive and user-friendly.

The key to this paging solution is the assignment of sequential numbers to the rows of the result set. These numbers can then be used to calculate which rows need to be included on a specific page. With SQL Server 2000, these numbers are typically created via an IDENTITY column. With SQL Server 2005, I prefer to use the ROW_NUMBER ranking function. This allows me to create sequential numbers without having to resort to the use of a table-valued variable.

Refer to Listing 7. I've created a stored procedure that accepts two parameters: a page number and a page size. This information will be used later to filter the rows that get sent back to the caller. The first SELECT statement returns the starting row number for the specified page, the ending row number for the specified page, and the total number of rows in the Product table. The application can use this information to determine if it is displaying the first page, the last page, or a page somewhere in the middle. Next, notice that I've used a nonrecursive Common Table Expression (CTE) to simplify the code. (Take special note of the semicolon at the end of the previous SELECT statement. When a CTE is used in a statement that is part of a batch, the statement before it must be followed by a semicolon. For more on CTEs, refer to the December edition of this column in DNDJ Vol. 3, iss. 10.) This CTE returns a couple of columns from the Product table, along with a row number created by the ROW_NUMBER ranking function. I have omitted the PARTITION clause, because I want a single set of numbers to be created across the entire Product table. The next SELECT statement returns the Product information from this CTE, but filters out those rows that do not correspond to the requested page. This ensures that only the desired number of rows is returned to the caller.

This solution returns two result sets: one that contains paging information and one that contains one page of actual data. Programmers using ADO.NET can access the second result set via a call to the NEXTRESULT method. If you don't wish to return the paging information, you can simply eliminate the first SELECT statement. In that case, the only result set that will be returned will be the one containing the Product data rows.

Now, we should test the stored procedure and verify the results. Listing 8 shows how to obtain the first page with a page size of 10. Figure 6 shows the output. Similarly, Listing 9 shows how to pull the second page, while Figure 7 shows that output. Finally, just to prove that the paging works as expected, the code in Listing 10 returns a 20-row page that contains the data from the previous two examples. Examine the results in Figure 8, and you can see that the paging is working properly.

Summary
We've seen how the four ranking functions in SQL Server 2005 can be used to create lists of sequential numbers. The numbering scheme can apply to an entire result set, or it can restart each time a new grouping is encountered. In the case of ties, we can allow some numbers to be "skipped," or we can require that they be strictly consecutive. We can also cause rows to be segregated into a specific number of groups, assigning identical numbers to the members of each group. These new capabilities provide much-needed flexibility to the T-SQL language, which makes things easier for the SQL Server developer.

More Stories By Jerry Dixon

Jerry Dixon is a senior developer and architect for ACH Food Companies in Memphis, Tennessee. Over the past 16 years he has led development projects for a number of enterprise, mid-level, and small business organizations. While he has fulfilled multiple roles as an infrastructure designer, database administrator, and software developer, he specializes in XML, SQL and ASP.NET. He is a co-leader and frequent presenter at the Memphis .NET User Group. Jerry holds the following Microsoft certifications: MCSD (VB 6.0 and .NET), MCDBA (SQL 2000), MCSA (Windows 2000 and 2003), MCSE (Windows 2000), MCAD (.NET), MCT. He resides in Olive Branch, MS with his wife and son.

Comments (2) View Comments

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Most Recent Comments
Will 11/06/07 10:46:47 AM EST

Great article, thanks for tipping us to these new functions.

SYS-CON Belgium News Desk 02/19/06 04:46:22 PM EST

Sequential numbers. Sooner or later, you'll find a process that requires you to generate a series of sequential numbers. As a matter of fact, most database engines provide at least one method of producing such numbers. Microsoft Access, for example, has AutoNumber columns, while SQL Server has IDENTITY columns. However, these are features of tables. There are times when you need some sequential numbers, but creating a table is inconvenient or inefficient. So how does one proceed? How does one obtain a list of sequential numbers without creating tables?