Indexes and Query Speed (from the HSQLDB documentation)

HSQLDB does not use indexes to improve sorting of query results. But indexes have a crucial role in improving query speed. If no index is used in a query on a single table, such as a DELETE query, then all the rows of the table must be examined. With an index on one of the columns that is in the WHERE clause, it is often possible to start directly from the first candidate row and reduce the number of rows that are examined.

Indexes are even more important in joins between multiple tables. SELECT ... FROM t1 JOIN t2 ON t1.c1 = t2.c2 is performed by taking rows of t1 one by one and finding a matching row in t2. If there is no index on t2.c2, then for each row of t1 all the rows of t2 must be checked, whereas with an index a matching row can be found in a fraction of the time. If the query also has a condition on t1, e.g., SELECT ... FROM t1 JOIN t2 ON t1.c1 = t2.c2 WHERE t1.c3 = 4, then an index on t1.c3 would eliminate the need for checking all the rows of t1 one by one, and will reduce query time to less than a millisecond per returned row. So if t1 and t2 each contain 10,000 rows, the query without indexes involves checking 100,000,000 row combinations. With an index on t2.c2, this is reduced to 10,000 row checks and index lookups. With the additional index on t1.c3, only about 4 rows are checked to get the first result row.

Indexes are automatically created for primary key and unique columns. Otherwise you should define an index using the CREATE INDEX command.
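
As a minimal sketch of the statements involved, assume two hypothetical tables shaped like t1 and t2 in the join example above. The id columns, the column types and the index names are illustrative assumptions, not taken from the HSQLDB documentation:

```sql
-- Hypothetical schema matching the t1/t2 join example in the text.
-- The PRIMARY KEY columns get an index automatically.
CREATE TABLE t1 (id INTEGER PRIMARY KEY, c1 INTEGER, c3 INTEGER);
CREATE TABLE t2 (id INTEGER PRIMARY KEY, c2 INTEGER);

-- Index on the join column of the second table, so that a matching
-- row in t2 can be located without checking every row of t2.
CREATE INDEX idx_t2_c2 ON t2 (c2);

-- Index on the filtered column of the first table, so that the
-- rows with c3 = 4 can be reached without checking every row of t1.
CREATE INDEX idx_t1_c3 ON t1 (c3);

-- The query from the text that is expected to benefit from both indexes.
SELECT t1.id, t2.id
  FROM t1 JOIN t2 ON t1.c1 = t2.c2
 WHERE t1.c3 = 4;
```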

Note that in HSQLDB a unique index on multiple columns can be used internally as a non-unique index on the first column in the list. For example, CONSTRAINT name1 UNIQUE (c1, c2, c3) on a table named atable means there is the equivalent of CREATE INDEX name2 ON atable(c1). So you do not need to specify an extra index if you require one on the first column of the list.
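
The point can be sketched as follows. The table name atable and the constraint name name1 come from the text; the column types are an assumption made only to keep the example complete:

```sql
-- Hypothetical table carrying the three-column UNIQUE constraint from the text.
CREATE TABLE atable (
    c1 INTEGER,
    c2 INTEGER,
    c3 INTEGER,
    CONSTRAINT name1 UNIQUE (c1, c2, c3)
);

-- According to the text, this search on the first constraint column can
-- use the index behind name1, so no separate index on c1 is declared.
SELECT * FROM atable WHERE c1 = 4;
```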

In 1.8.0, a multi-column index will speed up queries that contain joins or values on ALL the columns. You need NOT declare additional individual indexes on those columns unless you use queries that search only on a subset of the columns. For example, rows of a table that has a PRIMARY KEY or UNIQUE constraint on three columns or simply an ordinary index on those columns can be found efficiently when values for all three columns are specified in the WHERE clause. For example, SELECT ... FROM t1 WHERE t1.c1 = 4 AND t1.c2 = 6 AND t1.c3 = 8 will use an index on t1(c1,c2,c3) if it exists.

As a result of the improvements to multiple key indexes, the order of declared columns of the index or constraint has less effect on the speed of searches than before. If the column that contains more diverse values appears first, the searches will be slightly faster.

A multi-column index will not speed up queries on the second or third column only. The first column must be specified in the JOIN .. ON or WHERE conditions.
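
To make the rules for multi-column indexes concrete, here is a small sketch. The table name mc and the index name idx_mc_c1_c2_c3 are hypothetical, and the comments restate what the text says rather than measured behaviour:

```sql
-- Hypothetical table with an ordinary three-column index.
CREATE TABLE mc (c1 INTEGER, c2 INTEGER, c3 INTEGER);
CREATE INDEX idx_mc_c1_c2_c3 ON mc (c1, c2, c3);

-- Values for all three columns are given: per the text, the
-- multi-column index is used to find the rows efficiently.
SELECT * FROM mc WHERE c1 = 4 AND c2 = 6 AND c3 = 8;

-- Only the first indexed column is given: the index can still be used.
SELECT * FROM mc WHERE c1 = 4;

-- The first indexed column is missing: per the text, the index does
-- not speed this up. If such searches are common, a separate index
-- on c2 would be needed.
SELECT * FROM mc WHERE c2 = 6;
CREATE INDEX idx_mc_c2 ON mc (c2);
```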

Query speed depends a lot on the order of the tables in the JOIN .. ON or FROM clauses. For example the second query below should be faster with large tables (provided there is an index on TB.COL3). The reason is that TB.COL3 can be evaluated very quickly if it applies to the first table (and there is an index on TB.COL3):

    (TB is a very large table with only a few rows where TB.COL3 = 4)

    SELECT * FROM TA JOIN TB ON TA.COL1 = TB.COL2 AND TB.COL3 = 4;

    SELECT * FROM TB JOIN TA ON TA.COL1 = TB.COL2 AND TB.COL3 = 4;

The general rule is to put first the table that has a narrowing condition on one of its columns.

Version 1.7.3 features automatic, on-the-fly indexes for views and subselects that are used in a query. An index is added to a view when it is joined to a table or another view.

Where Condition or Join

Using WHERE conditions to join tables is likely to reduce execution speed. For example the following query will generally be slow, even with indexes:

    SELECT ... FROM TA, TB, TC WHERE TC.COL3 = TA.COL1 AND TC.COL3=TB.COL2 AND TC.COL4 = 1

The query implies TA.COL1 = TB.COL2 but does not explicitly set this condition. If TA and TB each contain 100 rows, 10000 combinations will be joined with TC to apply the column conditions, even though there may be indexes on the joined columns. With the JOIN keyword, the TA.COL1 = TB.COL2 condition has to be explicit and will narrow down the combination of TA and TB rows before they are joined with TC, resulting in much faster execution with larger tables:

    SELECT ... FROM TA JOIN TB ON TA.COL1 = TB.COL2 JOIN TC ON TB.COL2 = TC.COL3 WHERE TC.COL4 = 1

The query can be sped up a lot more if the order of tables in the joins is changed, so that TC.COL4 = 1 is applied first and a smaller set of rows is joined together:

    SELECT ... FROM TC JOIN TB ON TC.COL3 = TB.COL2 JOIN TA ON TC.COL3 = TA.COL1 WHERE TC.COL4 = 1

In the above example the engine automatically applies TC.COL4 = 1 to TC and joins only the set of rows that satisfy this condition with other tables. Indexes on TC.COL4, TB.COL2 and TA.COL1 will be used if present and will speed up the query.
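
A possible set of declarations for the three indexes named above is sketched here. The table definitions, column types and index names are assumptions added for completeness; only the table and column names come from the text:

```sql
-- Hypothetical definitions of the three tables used in this section.
CREATE TABLE TA (COL1 INTEGER);
CREATE TABLE TB (COL2 INTEGER);
CREATE TABLE TC (COL3 INTEGER, COL4 INTEGER);

-- Index for the narrowing condition TC.COL4 = 1, applied to the first table.
CREATE INDEX IDX_TC_COL4 ON TC (COL4);

-- Indexes for the join columns of the tables that are joined afterwards.
CREATE INDEX IDX_TB_COL2 ON TB (COL2);
CREATE INDEX IDX_TA_COL1 ON TA (COL1);

-- The reordered query from the text, with an explicit column list
-- standing in for the elided one.
SELECT TA.COL1, TB.COL2, TC.COL3
  FROM TC JOIN TB ON TC.COL3 = TB.COL2
          JOIN TA ON TC.COL3 = TA.COL1
 WHERE TC.COL4 = 1;
```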

Subqueries and Joins

Using joins and setting up the order of tables for maximum performance applies to all areas. For example, the second query below should generally be much faster if there are indexes on TA.COL1 and TB.COL3:

Example 2.2. Query comparison

    SELECT ... FROM TA WHERE TA.COL1 = (SELECT MAX(TB.COL2) FROM TB WHERE TB.COL3 = 4)

    SELECT ... FROM (SELECT MAX(TB.COL2) C1 FROM TB WHERE TB.COL3 = 4) T2 JOIN TA ON TA.COL1 = T2.C1

The second query turns MAX(TB.COL2) into a single row table then joins it with TA. With an index on TA.COL1, this will be very fast. The first query will test each row in TA and evaluate MAX(TB.COL2) again and again.
