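Step 1: Efficient indexing on large data sets
1. Index the filter conditions (speeds up WHERE)
2. Use the same index to return the rows already sorted (speeds up ORDER BY)
Step 2: Background material, starting from scratch
Composite index: a single-column index is an index built on exactly one column, that is, the CREATE INDEX statement names only one column.
An index can also be built on several columns; this is called a composite (or combined) index. A composite index is created in exactly the same way as a single-column index, but it generally costs less to maintain during database operations than several separate single-column indexes, and it can replace them. When the table has far more rows than there are distinct index keys, a composite index can noticeably speed up queries.
Two related terms are narrow index and wide index. A narrow index has 1-2 columns and, unless stated otherwise, usually means a single-column index. A wide index has more than 2 columns. One important design principle is to prefer a narrow index over a wide one whenever possible, because narrow indexes are often more effective than composite indexes. Having more narrow indexes gives the optimizer more options to choose from, which usually helps performance.
Source: http://www.cnblogs.com/wenly/articles/1240321.html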
KEY a_b_c (a, b, c) /* composite index on a, b, c */
The index may be used for ORDER BY:
– ORDER BY a
– ORDER BY a,b
– ORDER BY a, b, c
– ORDER BY a DESC, b DESC, c DESC
The index is used for both WHERE and ORDER BY:
– WHERE a = const ORDER BY b, c
– WHERE a = const AND b = const ORDER BY c
– WHERE a = const ORDER BY b
– WHERE a = const AND b > const ORDER BY b, c
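The index cannot be used for ORDER BY:
– ORDER BY a ASC, b DESC, c DESC /* mixed sort directions */
– WHERE g = const ORDER BY b, c /* leading column a is missing */
– WHERE a = const ORDER BY c /* b is missing */
– WHERE a = const ORDER BY a, d /* d is not in the index */
Summary: a composite index helps only if its columns are used in index order and all in the same sort direction; only trailing columns may be left out.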
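The rules above can be sketched as a small checker. This is only a simplified model of the cases listed in this post, not MySQL's actual optimizer logic; the function name index_can_sort and its arguments are made up for illustration, and range conditions are ignored.

```python
def index_can_sort(index_cols, eq_cols, order_by):
    """Simplified model: can a composite index satisfy this ORDER BY?

    index_cols: index columns in order, e.g. ("a", "b", "c")
    eq_cols:    columns compared to a constant with '=' in WHERE
    order_by:   list of (column, direction) pairs
    """
    # Rule 1: every ORDER BY column must use the same direction.
    directions = {direction.upper() for _, direction in order_by}
    if len(directions) > 1:
        return False

    # Columns fixed to a constant do not influence the ordering.
    remaining = [col for col, _ in order_by if col not in eq_cols]

    # Rule 2: walk the index left to right; a column may be skipped only
    # if WHERE pins it to a constant, otherwise it must be the next
    # ORDER BY column. Trailing index columns may stay unused.
    pos = 0
    for col in index_cols:
        if pos == len(remaining):
            break
        if col == remaining[pos]:
            pos += 1
        elif col in eq_cols:
            continue
        else:
            return False
    return pos == len(remaining)


INDEX = ("a", "b", "c")
print(index_can_sort(INDEX, {"a"}, [("b", "ASC"), ("c", "ASC")]))  # True
print(index_can_sort(INDEX, {"a"}, [("c", "ASC")]))                # False
```

The checker is deliberately conservative: it reproduces the cases in the lists above and nothing more.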
CREATE TABLE `message` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`user_id` int(11) NOT NULL,
`content` text COLLATE utf8_unicode_ci NOT NULL,
`create_time` int(11) NOT NULL,
`thumbs_up` int(11) NOT NULL DEFAULT '0', /* Vote Count */
PRIMARY KEY (`id`),
KEY `thumbs_up_key` (`thumbs_up`,`id`)
) ENGINE=InnoDB
mysql> show table status like 'message' \G
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 50000040 /*50 Million */
Avg_row_length: 565
Data_length: 28273803264 /* 26 GB*/
Index_length: 789577728 /* 753 MB*/
Data_free: 6291456
Create_time: 2009-04-20 13:30:45
Two requirements:
SELECT count(*) FROM message
SELECT * FROM message ORDER BY id DESC LIMIT 0, 20
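Note: id is auto-incremented and create_time increases along with it, so sorting by id gives the same order as sorting by create_time. To save space, there is no need to index create_time.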
mysql> explain SELECT * FROM message
ORDER BY id DESC
LIMIT 10000, 20\G
***************** 1. row **************
id: 1
select_type: SIMPLE
table: message
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 10020
Extra:
1 row in set (0.00 sec)
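- MySQL scans the primary key (id) and stops as soon as it has found the rows it needs.
- LIMIT 10000, 20 means reading 10020 rows, throwing away the first 10000 and returning only 20, so the cost grows with the offset.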
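The arithmetic behind that observation, as a tiny sketch. The helper name offset_scan_cost is hypothetical, and the model assumes a plain ordered index scan like the one in the EXPLAIN output above.

```python
def offset_scan_cost(offset, page_size):
    """Rows read and rows discarded for LIMIT offset, page_size."""
    rows_read = offset + page_size   # the scan cannot jump straight to the offset
    rows_discarded = offset          # everything before the page is thrown away
    return rows_read, rows_discarded


rows_read, rows_discarded = offset_scan_cost(10000, 20)
print(rows_read, rows_discarded)  # 10020 10000
```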
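Instead of an offset, use the last id already shown as the starting point.
Next page:
WHERE id < 100 ORDER BY id DESC LIMIT $page_size /* No OFFSET */
Previous page:
WHERE id > 98 ORDER BY id ASC LIMIT $page_size /* No OFFSET */
mysql> explain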
SELECT * FROM message
WHERE id < '49999961'
ORDER BY id DESC LIMIT 20 \G
*************************** 1. row ***********************
id: 1
select_type: SIMPLE
table: message
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
Rows: 25000020 /* ignore this */
Extra: Using where
1 row in set (0.00 sec)
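Here is a minimal runnable sketch of the same idea. It uses Python's built-in sqlite3 with an in-memory table as a stand-in for MySQL, so it only demonstrates the query shape, not the performance on 50 million rows. The helpers make_db, next_page and prev_page are names invented for this example.

```python
import sqlite3


def make_db(n_rows=200):
    """Build a small in-memory table shaped like `message` (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO message (id, title) VALUES (?, ?)",
        [(i, f"title {i}") for i in range(1, n_rows + 1)],
    )
    return conn


def next_page(conn, last_seen_id, page_size):
    """Rows just below the smallest id shown so far, newest first. No OFFSET."""
    return conn.execute(
        "SELECT id, title FROM message WHERE id < ? ORDER BY id DESC LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()


def prev_page(conn, first_seen_id, page_size):
    """Rows just above the largest id shown so far, returned newest first."""
    rows = conn.execute(
        "SELECT id, title FROM message WHERE id > ? ORDER BY id ASC LIMIT ?",
        (first_seen_id, page_size),
    ).fetchall()
    return rows[::-1]  # the query runs ascending; flip back to display order


conn = make_db()
print([row[0] for row in next_page(conn, 100, 5)])  # [99, 98, 97, 96, 95]
print([row[0] for row in prev_page(conn, 98, 5)])   # [103, 102, 101, 100, 99]
```

The previous-page query has to sort ascending so that LIMIT keeps the rows nearest to the current page; the result is then reversed for display.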
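Paging by thumbs_up needs more care, because thumbs_up is not unique and many rows can share the same value:
WHERE thumbs_up < 98 ORDER BY thumbs_up DESC /* skips any rows with thumbs_up = 98 that have not been shown yet */
WHERE thumbs_up <= 98 AND <extra condition> ORDER BY thumbs_up DESC /* <= alone would return rows already shown, hence the extra condition */
First page: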
SELECT thumbs_up, id FROM message ORDER BY thumbs_up DESC, id DESC LIMIT $page_size
+-----------+----+
| thumbs_up | id |
+-----------+----+
| 99 | 14 |
| 99 | 2 |
| 98 | 18 |
| 98 | 15 |
| 98 | 13 |
+-----------+----+
Next page:
SELECT thumbs_up, id FROM message WHERE thumbs_up <= 98 AND (id < 13 OR thumbs_up < 98) ORDER BY thumbs_up DESC, id DESC LIMIT $page_size
+-----------+----+
| thumbs_up | id |
+-----------+----+
| 98 | 10 |
| 98 | 6 |
| 97 | 17 |
+-----------+----+
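The two-column cursor can be tried out with the sample rows shown in the tables above. This is a sketch using Python's sqlite3 as a stand-in for MySQL; first_page and page_after are hypothetical helper names, and the page size of 5 is assumed from the first result table.

```python
import sqlite3

# (id, thumbs_up) pairs taken from the two result tables above.
ROWS = [(14, 99), (2, 99), (18, 98), (15, 98), (13, 98), (10, 98), (6, 98), (17, 97)]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message (id INTEGER PRIMARY KEY, thumbs_up INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("CREATE INDEX thumbs_up_key ON message (thumbs_up, id)")
    conn.executemany("INSERT INTO message (id, thumbs_up) VALUES (?, ?)", ROWS)
    return conn


def first_page(conn, page_size):
    return conn.execute(
        "SELECT thumbs_up, id FROM message "
        "ORDER BY thumbs_up DESC, id DESC LIMIT ?",
        (page_size,),
    ).fetchall()


def page_after(conn, last_thumbs_up, last_id, page_size):
    """Rows that sort strictly after (last_thumbs_up, last_id)."""
    return conn.execute(
        "SELECT thumbs_up, id FROM message "
        "WHERE thumbs_up <= ? AND (id < ? OR thumbs_up < ?) "
        "ORDER BY thumbs_up DESC, id DESC LIMIT ?",
        (last_thumbs_up, last_id, last_thumbs_up, page_size),
    ).fetchall()


conn = make_db()
page1 = first_page(conn, 5)
print(page1)  # [(99, 14), (99, 2), (98, 18), (98, 15), (98, 13)]
last_thumbs_up, last_id = page1[-1]
print(page_after(conn, last_thumbs_up, last_id, 5))  # [(98, 10), (98, 6), (97, 17)]
```

The id column acts as a tie-breaker: among rows with the same thumbs_up only smaller ids are taken, and any row with a lower thumbs_up qualifies regardless of id.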
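The query:
SELECT * FROM message WHERE thumbs_up <= 98 AND (id < 13 OR thumbs_up < 98) ORDER BY thumbs_up DESC, id DESC LIMIT 20
can be rewritten as a self-join, so that m1 is filtered and sorted from the covering index thumbs_up_key alone and the full rows are then fetched from m2 by primary key for just the 20 matches: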
SELECT m2.* FROM message m1, message m2 WHERE m1.id = m2.id AND m1.thumbs_up <= 98 AND (m1.id < 13 OR m1.thumbs_up < 98) ORDER BY m1.thumbs_up DESC, m1.id DESC LIMIT 20;
Explain:
************************ 1. row ***************************
id: 1
select_type: SIMPLE
table: m1
type: range
possible_keys: PRIMARY,thumbs_up_key
key: thumbs_up_key /* (thumbs_up,id) */
key_len: 4
ref: NULL
Rows: 25000020 /*ignore this, we will read just 20 rows*/
Extra: Using where; Using index /* Cover */
************************ 2. row ***************************
id: 1
select_type: SIMPLE
table: m2
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: forum.m1.id
rows: 1
Extra:
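As a sanity check, the sketch below runs the direct query and the self-join form side by side and confirms that they return the same page. It uses Python's sqlite3 with made-up data as a stand-in for MySQL, so it says nothing about the speed difference on InnoDB; it only shows that the rewrite keeps the result unchanged. The names make_db and fetch_page are invented for this example.

```python
import sqlite3

DIRECT = (
    "SELECT * FROM message "
    "WHERE thumbs_up <= ? AND (id < ? OR thumbs_up < ?) "
    "ORDER BY thumbs_up DESC, id DESC LIMIT ?"
)

# m1 only touches columns stored in thumbs_up_key; m2 supplies the full row.
SELF_JOIN = (
    "SELECT m2.* FROM message m1 JOIN message m2 ON m1.id = m2.id "
    "WHERE m1.thumbs_up <= ? AND (m1.id < ? OR m1.thumbs_up < ?) "
    "ORDER BY m1.thumbs_up DESC, m1.id DESC LIMIT ?"
)


def make_db(n_rows=500):
    """In-memory table with synthetic data: thumbs_up = id % 100."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message ("
        "id INTEGER PRIMARY KEY, "
        "content TEXT NOT NULL, "
        "thumbs_up INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("CREATE INDEX thumbs_up_key ON message (thumbs_up, id)")
    conn.executemany(
        "INSERT INTO message (id, content, thumbs_up) VALUES (?, ?, ?)",
        [(i, f"body {i}", i % 100) for i in range(1, n_rows + 1)],
    )
    return conn


def fetch_page(conn, sql, last_thumbs_up, last_id, page_size=20):
    params = (last_thumbs_up, last_id, last_thumbs_up, page_size)
    return conn.execute(sql, params).fetchall()


conn = make_db()
direct = fetch_page(conn, DIRECT, 98, 298)
joined = fetch_page(conn, SELF_JOIN, 98, 298)
print(direct == joined)               # True
print([row[0] for row in direct[:3]])  # [198, 98, 497]
```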
Compiled from the 2009 Yahoo presentation "Efficient Pagination Using MySQL"; see also http://www.fuchaoqun.com/2009/04/efficient-pagination-using-mysql/#comment-156