A More Efficient Method for Paging Through Large Result Sets

Source: http://www.4guysfromrolla.com/webtech/042606-1.shtml

 

Introduction
It is amazing how many cycles, both hardware and brainware, go into paging results efficiently. Recently Scott Mitchell authored an article titled Efficiently Paging Through Large Result Sets in SQL Server 2000 that looked at a stored procedure that returned a particular "page" of data from a table. After examining Scott's approach, I saw some potential improvements in his method. (If you haven't yet perused Scott's technique, take a moment to do so before continuing here, as this article builds upon Scott's example.)

Scott's approach made use of a table variable to generate a synthetic ID to act as a row counter. Every time a page is requested, all of the data in the table being paged must be read and inserted into the table variable in order to generate the synthetic ID, at which point a SELECT statement returns just those records whose IDs fall within the desired range. While Scott's method is faster than blindly returning all of the records, his approach can be improved considerably by using SET ROWCOUNT to greatly reduce the number of records that must be read and inserted into the table variable.

In this article we'll look at two ways to improve Scott's method. The first approach uses a table variable (just like Scott's), but utilizes the SET ROWCOUNT command to reduce the number of records read and inserted into the table variable. The second technique more cleverly uses SET ROWCOUNT to provide an even more efficient approach than the first. Read on to learn more!

Using ROWCOUNT to Optimize Paging
The first step we can take to optimize paging is to use SET ROWCOUNT prior to filling our table variable. SET options alter how the current session handles specific behaviors; SET ROWCOUNT tells SQL Server to stop processing query results after it has processed the specified number of rows. For more background on SET ROWCOUNT, refer to Retrieving the First N Records from a SQL Query.
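As a quick illustration of the behavior (a minimal sketch, assuming the same Employees table used throughout this article), the following batch returns only the first five employees even though no TOP clause is present, and then restores the session's default behavior:

-- Stop processing after 5 rows
SET ROWCOUNT 5

-- Returns only the first 5 employees, even without a TOP clause
SELECT EmployeeID
FROM Employees
ORDER BY EmployeeID

-- Turn the limit back off for the rest of the session
SET ROWCOUNT 0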

This particular stored procedure example was created by 4Guys reader Dave Griffiths.

CREATE PROCEDURE usp_PagedResults_New_Rowcount
(
   @startRowIndex int,
   @maximumRows int
)
AS
 
DECLARE @TempItems TABLE
(
   ID int IDENTITY,
   EmployeeID int
)
 
DECLARE @maxRow int

-- A check can be added to make sure @startRowIndex isn't > count(1)
-- from employees before doing any actual work unless it is guaranteed
-- the caller won't do that

SET @maxRow = (@startRowIndex + @maximumRows) - 1

SET ROWCOUNT @maxRow

INSERT INTO @TempItems (EmployeeID)
SELECT EmployeeID
FROM Employees
ORDER BY EmployeeID

SET ROWCOUNT @maximumRows

SELECT e.*, d.[Name] as DepartmentName
FROM @TempItems t
   INNER JOIN Employees e ON
       e.EmployeeID = t.EmployeeID
   INNER JOIN Departments d ON
       d.DepartmentID = e.DepartmentID
WHERE ID >= @startRowIndex

SET ROWCOUNT 0

GO

As you can see, this stored procedure uses SET ROWCOUNT prior to filling the table variable, capping the insert at what is already known to be the last row needed to satisfy the current page. SQL Server stops filling the table once it processes this number of rows, minimizing the amount of data pulled into the table variable. This method rocks for the first several pages and only begins to tax SQL Server resources as we page incrementally deeper into the results. Last, but very important, is the final SET ROWCOUNT 0. This turns off the row limit and returns the current session to its default behavior, in case the caller does anything else interesting with the same connection that requires more rows to be returned.
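For example, to retrieve rows 21 through 30 (the third page when paging ten rows at a time), the procedure could be called like this; the parameter values are purely illustrative:

EXEC usp_PagedResults_New_Rowcount
   @startRowIndex = 21,   -- first row of the requested page
   @maximumRows = 10      -- page size

Internally, @maxRow works out to 30, so only the first 30 EmployeeIDs are inserted into the table variable, and the final SELECT returns the ten rows whose synthetic IDs are 21 through 30.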

Taking Advantage of the SQL Server Optimizer
An optimizer trick that can also be used in this scenario is that when a single variable is assigned from a query that returns multiple rows, the variable ends up holding the value from the last row processed. For example, the following SQL script simply creates and fills a table variable with 100 records (1 through 100), then selects the value of the val column into a local variable from the entire table using two different sorts:

DECLARE @tmp TABLE(
   val int
)

DECLARE @i int, @cnt int, @res int
SELECT @i = 0, @cnt = 100
WHILE @i < @cnt
BEGIN
    SELECT @i = @i + 1
    INSERT INTO @tmp VALUES(@i)
END

SELECT @res = val FROM @tmp ORDER BY val ASC
SELECT @res [Value], 'ASC Sort' [Sort]

SELECT @res = val FROM @tmp ORDER BY val DESC
SELECT @res [Value], 'DESC Sort' [Sort]

The results from these selects follow:

Value    Sort
-----    ---------
100      ASC Sort

Value    Sort
-----    ---------
1        DESC Sort

While one might think that SQL Server will need to read every record from the @tmp table and assign the val column to @res for each such record, the query plan for the statement makes it obvious that the optimizer knows it will only ever need a single row to complete the query and can read just that particular record. Examining this operation in SQL Profiler, you'll find that the optimizer gets to the end result in only six reads (as opposed to the 100+ reads that would be necessary if it were reading every single record in the table). In short, when presented with such a query, SQL Server doesn't actually fetch all of the records from the table, one at a time, and assign each to the local variable. Rather, it seeks directly to the last record in the query's sort order and assigns that value to the variable.
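If you'd like to observe the read counts yourself without firing up SQL Profiler, one option (a sketch; run it in the same batch as the script above so that @tmp and @res are still in scope, and expect the exact numbers to differ on your system) is SET STATISTICS IO, which reports logical reads per statement in the Messages output:

-- Report the reads performed by each subsequent statement
SET STATISTICS IO ON

SELECT @res = val FROM @tmp ORDER BY val DESC

SET STATISTICS IO OFF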

So, how does this little trick help in the problem of paging large result sets? If this knowledge is combined with SET ROWCOUNT, large result sets can be efficiently paged without the need for temporary tables or table variables! Here is another, more efficient version of Scott and David's stored procedures:

CREATE PROCEDURE [dbo].[usp_PageResults_NAI]
(
    @startRowIndex int,
    @maximumRows int
)
AS

DECLARE @first_id int, @startRow int
    
-- A check can be added to make sure @startRowIndex isn't > count(1)
-- from employees before doing any actual work unless it is guaranteed
-- the caller won't do that

-- Get the EmployeeID of the first row of the requested page; because of the
-- optimizer behavior shown above, @first_id ends up holding the last value
-- processed, i.e. the EmployeeID at position @startRowIndex
SET ROWCOUNT @startRowIndex
SELECT @first_id = EmployeeID FROM Employees ORDER BY EmployeeID

-- Now, set the row count to @maximumRows and get
-- all records >= @first_id
SET ROWCOUNT @maximumRows

SELECT e.*, d.[Name] as DepartmentName
FROM Employees e
   INNER JOIN Departments d ON
       e.DepartmentID = d.DepartmentID
WHERE e.EmployeeID >= @first_id
ORDER BY e.EmployeeID

SET ROWCOUNT 0

GO

Using this optimizer knowledge together with SET ROWCOUNT, the EmployeeID of the first row in the requested page is stored in a local variable as a starting point. Next, SET ROWCOUNT is set to the maximum number of records requested in @maximumRows, and all records with an EmployeeID greater than or equal to the starting point are returned. This pages the result set in a much more efficient manner. This method also takes advantage of pre-existing indexes on the table, since it goes directly to the base table rather than to a locally created table.
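When calling the procedure from a page-based UI, the usual translation from a 1-based page number and page size into @startRowIndex looks something like the following (a hypothetical calling batch; the variable names are illustrative):

DECLARE @pageNumber int, @pageSize int, @startRow int
SELECT @pageNumber = 3, @pageSize = 10

-- Page 3 at 10 rows per page starts at row 21
SET @startRow = ((@pageNumber - 1) * @pageSize) + 1

EXEC [dbo].[usp_PageResults_NAI]
   @startRowIndex = @startRow,
   @maximumRows = @pageSize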

Using an even more highly unscientific comparison method than Scott's, let's see the results:

Rows in Table    Page #      CPU      Reads    Duration

Scott's Approach (Table Variable w/o SET ROWCOUNT)
       50,000         1      407     51,469         438
       50,000        10      375     51,469         422
       50,000       100      406     51,469         485
       50,000     1,000      359     51,469         422
       50,000    10,000      422     51,469         468
      100,000    50,000      734    102,829         797
      150,000   100,000    1,109    154,170       1,219

David's Approach (Table Variable with SET ROWCOUNT)
       50,000         1        0         80           0
       50,000        10        0         90           0
       50,000       100        0        182          15
       50,000     1,000        0      1,133          16
       50,000    10,000       47     10,422         110
      100,000    50,000      313     51,501         391
      150,000   100,000      688    102,867         750

My Approach (SET ROWCOUNT only, no table variable)
       50,000         1        0         24           0
       50,000        10        0         24           0
       50,000       100        0         26           0
       50,000     1,000        0         24           0
       50,000    10,000       16        125          15
      100,000    50,000       31        525          63
      150,000   100,000       31      1,023          78

Impressed? If your application is like the 99.99% of applications using paging where you know no one will ever make it to the 1,000th page, you might not think this bit of optimizer knowledge is very important. But talk to just about anyone who runs a heavily loaded database and they will tell you that space and blocking in tempdb is a problem. (Any time you write to a temporary table or table variable, you are working with tempdb.) Anything that can be done to minimize the use of tempdb should be done; it will speed up your application. The reduction in IO is also significant: even for the first page, this approach cuts reads from 80 (using SET ROWCOUNT with a table variable) down to 24.

Conclusion
As you can see, paging can be greatly improved with the use of SET ROWCOUNT and perhaps a little knowledge of the optimizer. SET ROWCOUNT can be used with a table variable to easily allow different sorting parameters. Additionally, we could still allow ordering of the pages with different sorting options through the use of dynamic SQL (let us know if you'd like to see how to do these things in another article) and still use the optimizer to our advantage, but this can get very complex in the case of ties.
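To give a taste of the dynamic SQL idea, here is a rough, hypothetical sketch (the procedure name, the @sortColumn parameter, and the whitelisted column names are all illustrative, not part of the original article). It applies David's table variable technique inside a dynamic batch so the caller can choose the sort column, sidestepping the tie problem because the final filter is on the synthetic ID rather than on the sort column itself:

CREATE PROCEDURE usp_PagedResults_DynamicSort
(
   @startRowIndex int,
   @maximumRows int,
   @sortColumn sysname
)
AS

-- Only allow known Employees columns as the sort key (guards against SQL
-- injection); the column list here is illustrative
IF @sortColumn NOT IN ('EmployeeID', 'LastName', 'HireDate')
   RETURN

DECLARE @sql nvarchar(4000)

SET @sql = N'
DECLARE @TempItems TABLE (ID int IDENTITY, EmployeeID int)
DECLARE @maxRow int
SET @maxRow = (@startRowIndex + @maximumRows) - 1

SET ROWCOUNT @maxRow

INSERT INTO @TempItems (EmployeeID)
SELECT EmployeeID
FROM Employees
ORDER BY ' + QUOTENAME(@sortColumn) + N'

SET ROWCOUNT @maximumRows

SELECT e.*, d.[Name] AS DepartmentName
FROM @TempItems t
   INNER JOIN Employees e ON e.EmployeeID = t.EmployeeID
   INNER JOIN Departments d ON d.DepartmentID = e.DepartmentID
WHERE t.ID >= @startRowIndex
ORDER BY t.ID

SET ROWCOUNT 0'

EXEC sp_executesql @sql,
   N'@startRowIndex int, @maximumRows int',
   @startRowIndex, @maximumRows

GO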

Happy Programming!