Creating useful indexes is one of the most important ways to achieve better query performance. Useful indexes help you find data with fewer disk I/O operations and less system resource usage.
To create useful indexes, you much understand how the data is used, the types of queries and the frequencies they are run, and how the query processor can use indexes to find your data quickly.
When you choose what indexes to create, examine your critical queries, the performance of which will affect user experience most. Create indexes to specifically aid these queries. After adding an index, rerun the query to see if performance is improved. If it is not, remove the index.
As with most performance optimization techniques, there are tradeoffs. For example, with more indexes, SELECT queries will potentially run faster. However, DML (INSERT, UPDATE, and DELETE) operations will slow down significantly because more indexes must be maintained with each operation. Therefore, if your queries are mostly SELECT statements, more indexes can be helpful. If your application performs many DML operations, you should be conservative with the number of indexes you create.
SQL Server Compact 3.5 includes support for showplans, which help assess and optimize queries. SQL Server Compact 3.5 uses the same showplan schema as SQL Server 2008 except SQL Server Compact 3.5 uses a subset of the operators. For more information, see the Microsoft Showplan Schema at http://schemas.microsoft.com/sqlserver/2004/07/showplan/.
The next few sections provide additional information about creating useful indexes.
Indexing on columns used in the WHERE clause of your critical queries frequently improves performance. However, this depends on how selective the index is. Selectivity is the ratio of qualifying rows to total rows. If the ratio is low, the index is highly selective. It can get rid of most of the rows and greatly reduce the size of the result set. It is therefore a useful index to create. By contrast, an index that is not selective is not as useful.
A unique index has the greatest selectivity. Only one row can match, which makes it most helpful for queries that intend to return exactly one row. For example, an index on a unique ID column will help you find a particular row quickly.
You can evaluate the selectivity of an index by running the sp_show_statistics stored procedures on SQL Server Compact 3.5 tables. For example, if you are evaluating the selectivity of two columns, "Customer ID" and "Ship Via", you can run the following stored procedures:
sp_show_statistics_steps 'orders', 'customer id';
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS
------------------------------------------------------------
ALFKI 0 7 0
ANATR 0 4 0
ANTON 0 13 0
AROUT 0 14 0
BERGS 0 23 0
BLAUS 0 8 0
BLONP 0 14 0
BOLID 0 7 0
BONAP 0 19 0
BOTTM 0 20 0
BSBEV 0 12 0
CACTU 0 6 0
CENTC 0 3 0
CHOPS 0 12 0
COMMI 0 5 0
CONSH 0 4 0
DRACD 0 9 0
DUMON 0 8 0
EASTC 0 13 0
ERNSH 0 33 0
(90 rows affected)
And
sp_show_statistics_steps 'orders', 'reference3';
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS
------------------------------------------------------------
1 0 320 0
2 0 425 0
3 0 333 0
(3 rows affected)
The results show that the "Customer ID" column has a much lower degree of duplication. This means an index on it will be more selective than an index on the "Ship Via" column.
For more information about using these stored procedures, see sp_show_statistics (SQL Server Compact 3.5), sp_show_statistics_steps (SQL Server Compact 3.5), and sp_show_statistics_columns (SQL Server Compact).
Multiple-column indexes are natural extensions of single-column indexes. Multiple-column indexes are useful for evaluating filter expressions that match a prefix set of key columns. For example, the composite index CREATE INDEX Idx_Emp_Name ON Employees ("Last Name" ASC, "First Name" ASC)
helps evaluate the following queries:
However, it is not useful for this query:
When you create a multiple-column index, you should put the most selective columns leftmost in the key. This makes the index more selective when matching several expressions.
A small table is one whose contents fit in one or just a few data pages. Avoid indexing very small tables because it is typically more efficient to do a table scan. This saves the cost of loading and processing index pages. By not creating an index on very small tables, you remove the chance of the optimizer selecting one.
SQL Server Compact 3.5 stores data in 4 Kb pages. The page count can be approximated by using the following formula, although the actual count might be slightly larger because of the storage engine overhead.
<sum of sizes of columns in bytes> * <# of rows>
<# of pages> = -----------------------------------------------------------------
4096
For example, suppose a table has the following schema:
Column Name | Type (size) |
---|---|
Order ID |
INTEGER (4 bytes) |
Product ID |
INTEGER (4 bytes) |
Unit Price |
MONEY (8 bytes) |
Quantity |
SMALLINT (2 bytes) |
Discount |
REAL (4 bytes) |
The table has 2820 rows. According to the formula, it takes about 16 pages to store its data:
<# of pages> = ((4 + 4 + 8 + 2 + 4) * 2820) / 4096 = 15.15 pages
We recommend that you always create indexes on primary keys. It is frequently useful to also create indexes on foreign keys. This is because primary keys and foreign keys are frequently used to join tables. Indexes on these keys lets the optimizer consider more efficient index join algorithms. If your query joins tables by using other columns, it is frequently helpful to create indexes on those columns for the same reason.
When primary key and foreign key constraints are created, SQL Server Compact 3.5 automatically creates indexes for them and takes advantage of them when optimizing queries. Remember to keep primary keys and foreign keys small. Joins run faster this way.
Indexes can be used to speed up the evaluation of certain types of filter clauses. Although all filter clauses reduce the final result set of a query, some can also help reduce the amount of data that must be scanned.
A search argument (SARG) limits a search because it specifies an exact match, a range of values, or a conjunction of two or more items joined by AND. It has one of the following forms:
SARG operators include =, >, <, >=, <=, IN, BETWEEN, and sometimes LIKE (in cases of prefix matching, such as LIKE 'John%'). A SARG can include multiple conditions joined with an AND. SARGs can be queries that match a specific value, such as:
SARGs can also be queries that match a range of values, such as:
An expression that does not use SARG operators does not improve performance, because the SQL Server Compact 3.5 query processor has to evaluate every row to determine whether it meets the filter clause. Therefore, an index is not useful on expressions that do not use SARG operators. Non-SARG operators include NOT, <>, NOT EXISTS, NOT IN, NOT LIKE, and intrinsic functions.
When determining the access methods for base tables, the SQL Server Compact 3.5 optimizer determines whether an index exists for a SARG clause. If an index exists, the optimizer evaluates the index by calculating how many rows are returned. It then estimates the cost of finding the qualifying rows by using the index. It will choose indexed access if it has lower cost than table scan. An index is potentially useful if its first column or prefix set of columns are used in the SARG, and the SARG establishes a lower bound, upper bound, or both, to limit the search.
Response time is the time it takes for a query to return the first record. Total time is the time it takes for the query to return all records. For an interactive application, response time is important because it is the perceived time for the user to receive visual affirmation that a query is being processed. For a batch application, total time reflects the overall throughput. You have to determine what the performance criteria are for your application and queries, and then design accordingly.
For example, suppose the query returns 100 records and is used to populate a list with the first five records. In this case, you are not concerned with how long it takes to return all 100 records. Instead, you want the query to return the first few records quickly, so that you can populate the list.
Many query operations can be performed without having to store intermediate results. These operations are said to be pipelined. Examples of pipelined operations are projections, selections, and joins. Queries implemented with these operations can return results immediately. Other operations, such as SORT and GROUP-BY, require using all their input before returning results to their parent operations. These operations are said to require materialization. Queries implemented with these operations typically have an initial delay because of materialization. After this initial delay, they typically return records very quickly.
Queries with response time requirements should avoid materialization. For example, using an index to implement ORDER-BY yields better response time than does using sorting. The following section describes this in more detail.
The ORDER-BY, GROUP-BY, and DISTINCT operations are all types of sorting. The SQL Server Compact 3.5 query processor implements sorting in two ways. If records are already sorted by an index, the processor needs to use only the index. Otherwise, the processor has to use a temporary work table to sort the records first. Such preliminary sorting can cause significant initial delays on devices with lower power CPUs and limited memory, and should be avoided if response time is important.
In the context of multiple-column indexes, for ORDER-BY or GROUP-BY to consider a particular index, the ORDER-BY or GROUP-BY columns must match the prefix set of index columns with the exact order. For example, the index CREATE INDEX Emp_Name ON Employees ("Last Name" ASC, "First Name" ASC)
can help optimize the following queries:
It will not help optimize:
For a DISTINCT operation to consider a multiple-column index, the projection list must match all index columns, although they do not have to be in the exact order. The previous index can help optimize the following queries:
It will not help optimize:
Note: |
---|
If your query always returns unique rows on its own, avoid specifying the DISTINCT keyword, because it only adds overhead
|
Sometimes you can rewrite a subquery to use JOIN and achieve better performance. The advantage of creating a JOIN is that you can evaluate tables in a different order from that defined by the query. The advantage of using a subquery is that it is frequently not necessary to scan all rows from the subquery to evaluate the subquery expression. For example, an EXISTS subquery can return TRUE upon seeing the first qualifying row.
Note: |
---|
The SQL Server Compact 3.5 query processor always rewrites the IN subquery to use JOIN. You do not have to try this approach with queries that contain the IN subquery clause. |
For example, to determine all the orders that have at least one item with a 25 percent discount or more, you can use the following EXISTS subquery:
SELECT "Order ID" FROM Orders O
WHERE EXISTS (SELECT "Order ID"
FROM "Order Details" OD
WHERE O."Order ID" = OD."Order ID"
AND Discount >= 0.25)
You can also rewrite this by using JOIN:
SELECT DISTINCT O."Order ID" FROM Orders O INNER JOIN "Order Details"
OD ON O."Order ID" = OD."Order ID" WHERE Discount >= 0.25
OUTER JOINs are treated differently from INNER JOINs in that the optimizer does not try to rearrange the join order of OUTER JOIN tables as it does to INNER JOIN tables. The outer table (the left table in LEFT OUTER JOIN and the right table in RIGHT OUTER JOIN) is accessed first, followed by the inner table. This fixed join order could lead to execution plans that are less than optimal.
For more information about a query that contains INNER JOIN, see Microsoft Knowledge Base.
If your application runs a series of queries that are only different in some constants, you can improve performance by using a parameterized query. For example, to return orders by different customers, you can run the following query:
SELECT "Customer ID" FROM Orders WHERE "Order ID" = ?
Parameterized queries yield better performance by compiling the query only once and executing the compiled plan multiple times. Programmatically, you must hold on to the command object that contains the cached query plan. Destroying the previous command object and creating a new one destroys the cached plan. This requires the query to be re-compiled. If you must run several parameterized queries in interleaved manner, you can create several command objects, each caching the execution plan for a parameterized query. This way, you effectively avoid re-compilations for all of them.
The SQL Server Compact 3.5 query processor is a powerful tool for querying data stored in your relational database. However, there is an intrinsic cost associated with any query processor. It must compile, optimize, and generate an execution plan before it starts doing the real work of performing the plan. This is particularly true with simple queries that finish quickly. Therefore, implementing the query yourself can sometimes provide vast performance improvement. If every millisecond counts in your critical component, we recommend that you consider the alternative of implementing the simple queries yourself. For large and complex queries, the job is still best left to the query processor.
For example, suppose you want to look up the customer ID for a series of orders arranged by their order IDs. There are two ways to accomplish this. First, you could follow these steps for each lookup:
Or you could issue the following query for each lookup:
SELECT "Customer ID" FROM Orders WHERE "Order ID" = <the specific order id>
The query-based solution is simpler but slower than the manual solution, because the SQL Server Compact 3.5 query processor translates the declarative SQL statement into the same three operations that you could implement manually. Those three steps are then performed in sequence. Your choice of which method to use will depend on whether simplicity or performance is more important in your application.