MySQL InnoDB: Definitions of Clustered, Non-Clustered, Secondary, and Ordinary Indexes

table

Each MySQL table is associated with a particular storage engine. InnoDB tables have particular physical and logical characteristics that affect performance, scalability, backup, administration, and application development.
In terms of file storage, an InnoDB table belongs to one of the following tablespace types:

  • The shared InnoDB system tablespace, which is comprised of one or more ibdata files.

  • A file-per-table tablespace, comprised of an individual .ibd file.

  • A shared general tablespace, comprised of an individual .ibd file. General tablespaces were introduced in MySQL 5.7.6.

.ibd data files contain both table and index data.
InnoDB tables created in file-per-table tablespaces can use DYNAMIC or COMPRESSED row format. These row formats enable InnoDB features such as compression, efficient storage of off-page columns, and large index key prefixes. General tablespaces support all row formats.
The system tablespace supports tables that use REDUNDANT, COMPACT, and DYNAMIC row formats. System tablespace support for the DYNAMIC row format was added in MySQL 5.7.6.
The rows of an InnoDB table are organized into an index structure known as the clustered index, with entries sorted based on the primary key columns of the table. Data access is optimized for queries that filter and sort on the primary key columns, and each index contains a copy of the associated primary key columns for each entry. Modifying values for any of the primary key columns is an expensive operation. Thus an important aspect of InnoDB table design is choosing a primary key with columns that are used in the most important queries, and keeping the primary key short, with rarely changing values.
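The advice to keep the primary key short can be made concrete with some rough arithmetic. The sketch below is only an estimate under stated assumptions: the table size, the number of secondary indexes, and the column widths are hypothetical, and page headers, record headers, and fill factor are ignored.

```python
def secondary_index_key_bytes(rows, secondary_indexes, indexed_col_bytes, pk_bytes):
    """Rough estimate of the key bytes held in all secondary indexes.

    Each secondary index entry stores its indexed column value plus a copy
    of the primary key, so the primary key width is paid once per entry.
    Page headers, record headers, and fill factor are ignored.
    """
    per_entry = indexed_col_bytes + pk_bytes
    return rows * secondary_indexes * per_entry


# Hypothetical table: 1,000,000 rows, 3 secondary indexes on 4-byte columns.
short_pk = secondary_index_key_bytes(1_000_000, 3, 4, 4)    # 4-byte integer key
long_pk = secondary_index_key_bytes(1_000_000, 3, 4, 36)    # 36-byte string key
extra_bytes = long_pk - short_pk                            # cost of the wider key
```

Under these assumptions the wider key adds 96,000,000 bytes of key data across the three secondary indexes, before any per-page or per-record overhead.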

index

A data structure that provides a fast lookup capability for rows of a table, typically by forming a tree structure (B-tree) representing all the values of a particular column or set of columns.
InnoDB tables always have a clustered index representing the primary key. They can also have one or more secondary indexes defined on one or more columns. Depending on their structure, secondary indexes can be classified as partial, column, or composite indexes.
Indexes are a crucial aspect of query performance. Database architects design tables, queries, and indexes to allow fast lookups for data needed by applications. The ideal database design uses a covering index where practical; the query results are computed entirely from the index, without reading the actual table data. Each foreign key constraint also requires an index, to efficiently check whether values exist in both the parent and child tables.
Although a B-tree index is the most common, a different kind of data structure is used for hash indexes, as in the MEMORY storage engine and the InnoDB adaptive hash index. R-tree indexes are used for spatial indexing of multi-dimensional information.
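Whether an index covers a query comes down to a set comparison: every column the query needs must be available in the index entry. The following is a minimal sketch of that check, not how the MySQL optimizer is implemented. The function name and the `orders` columns are hypothetical. It relies on the fact that an InnoDB secondary index entry also carries the primary key columns.

```python
def is_covering(index_columns, primary_key_columns, needed_columns):
    """Return True if every needed column is available from the index entry.

    An InnoDB secondary index entry carries the indexed columns plus the
    primary key columns, so both sets count as available.
    """
    available = set(index_columns) | set(primary_key_columns)
    return set(needed_columns) <= available


# Hypothetical table orders(id PRIMARY KEY, customer_id, status, total)
# with a secondary index on (customer_id, status).
index_cols = ["customer_id", "status"]
pk_cols = ["id"]

covered = is_covering(index_cols, pk_cols, ["id", "status"])        # index alone suffices
not_covered = is_covering(index_cols, pk_cols, ["status", "total"]) # needs the table row
```

In the second call `total` is stored only in the clustered index, so the query would have to read the table rows as well.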

clustered index

The InnoDB term for a primary key index. InnoDB table storage is organized based on the values of the primary key columns, to speed up queries and sorts involving the primary key columns. For best performance, choose the primary key columns carefully based on the most performance-critical queries. Because modifying the columns of the clustered index is an expensive operation, choose primary key columns that are rarely or never updated.
In the Oracle Database product, this type of table is known as an index-organized table.

primary key

A set of columns—and by implication, the index based on this set of columns—that can uniquely identify every row in a table. As such, it must be a unique index that does not contain any NULL values.
InnoDB requires that every table has such an index (also called the clustered index or cluster index), and organizes the table storage based on the column values of the primary key.
When choosing primary key values, consider using arbitrary values (a synthetic key) rather than relying on values derived from some other source (a natural key).
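Because every InnoDB table needs a clustered index, InnoDB falls back to other keys when no primary key is declared: first a UNIQUE index whose key columns are all NOT NULL, then a hidden index named GEN_CLUST_INDEX on an internal row ID column. The sketch below models that selection order. The `UniqueIndex` class, the function name, and the sample index names are hypothetical helpers for illustration, not InnoDB APIs.

```python
from dataclasses import dataclass


@dataclass
class UniqueIndex:
    name: str
    columns: tuple
    all_not_null: bool  # True only if every key column is declared NOT NULL


def choose_clustered_key(primary_key, unique_indexes):
    """Model the order in which InnoDB picks the clustered index key."""
    if primary_key:
        return ("PRIMARY", tuple(primary_key))
    for idx in unique_indexes:  # checked in definition order
        if idx.all_not_null:
            return (idx.name, idx.columns)
    # No usable key: InnoDB generates a hidden index on an internal row ID.
    return ("GEN_CLUST_INDEX", ("DB_ROW_ID",))


with_pk = choose_clustered_key(["id"], [])
with_unique = choose_clustered_key(
    [],
    [
        UniqueIndex("uk_email", ("email",), all_not_null=False),  # nullable: skipped
        UniqueIndex("uk_code", ("code",), all_not_null=True),     # first usable one
    ],
)
hidden = choose_clustered_key([], [])
```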

secondary index

A type of InnoDB index that represents a subset of table columns. An InnoDB table can have zero, one, or many secondary indexes. (Contrast with the clustered index, which is required for each InnoDB table, and stores the data for all the table columns.)
A secondary index can be used to satisfy queries that only require values from the indexed columns. For more complex queries, it can be used to identify the relevant rows in the table, which are then retrieved through lookups using the clustered index.
Creating and dropping secondary indexes has traditionally involved significant overhead from copying all the data in the InnoDB table. The fast index creation feature makes both CREATE INDEX and DROP INDEX statements much faster for InnoDB secondary indexes.

Summary:

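  1. An InnoDB table is an index-organized table: every row is stored in an index. Rows are therefore kept in clustered index (primary key) order, and a record's position in the data pages is determined by its clustered index key, not by the order in which it was inserted.
  2. In InnoDB the clustered index is the primary key index. A table has exactly one clustered index, because the rows are stored together with their key values and can be organized in only one order.
  3. The leaf nodes of the clustered index hold the full rows, so a lookup by clustered index key reaches the row directly.
  4. In a clustered index, rows are stored in index order: adjacent keys sit next to each other in the same leaf page or in neighboring leaf pages linked in key order. This ordering is logical. The pages themselves are not guaranteed to occupy adjacent sectors on disk.
  5. In InnoDB, a non-clustered index is a secondary index: every ordinary (non-clustered) index is a secondary index.
  6. The leaf nodes of an InnoDB secondary index store the indexed column values together with the primary key values. A lookup uses the secondary index key to find the primary key value, then uses that value to find the row in the clustered index. This resembles Oracle finding a rowid through an index key and then fetching the row by rowid.
  7. The clustered index is effectively the whole table, and the table is the clustered index. Rows are clustered by the primary key by default. If no primary key is defined, InnoDB uses the first UNIQUE index whose key columns are all NOT NULL. If there is no such index, InnoDB generates a hidden clustered index (GEN_CLUST_INDEX) on an internal row ID column.
  8. Because the table is the clustered index, a scan of the clustered index returns rows in primary key order. For example, if rows are inserted with primary key values 5, 3, 4, 2, 1, a query without ORDER BY that reads the clustered index returns 1, 2, 3, 4, 5. SQL does not guarantee this order (the optimizer may read a secondary index instead), so use ORDER BY whenever the order matters.
  9. An informal analogy:
    Clustered index: like the Xinhua Dictionary, where the body text is itself arranged according to a rule, so the content doubles as the directory.
    Non-clustered index: the directory is purely a directory and the body text is purely body text, each with its own order.
    Each table can have only one clustered index, because the body text can be sorted in only one way.
  10. Oracle generally uses heap tables, while MySQL InnoDB uses index-organized tables.
    10.1. A heap table is managed in an apparently random way. Where an inserted row lands depends mainly on which blocks have free space, not on any particular order.
    10.2. Inserts into a heap table are fast because no ordering has to be maintained. However, finding the records that match a condition requires reading and filtering every record unless an index can be used.
    10.3. An index on a heap table records the rowid of each record's location. A lookup searches the index first, then uses the rowid to fetch the row from its block.
    10.4. In a heap table, the indexes and the table data are stored separately.
    10.5. In an index-organized table, the rows are stored inside the index, so finding the index entry means finding the row.
    10.6. In an index-organized table, the index and the data are stored together.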
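The contrast between heap tables and index-organized tables described in the summary can be sketched with two toy classes. This is a simplified model under stated assumptions, not how Oracle or InnoDB store data: the class names are hypothetical, a Python list stands in for the data blocks, a list position stands in for a rowid, and a sorted list stands in for the B+ tree.

```python
import bisect


class HeapTable:
    """Rows are appended wherever there is room; an index maps key -> rowid."""

    def __init__(self):
        self.rows = []    # position in this list plays the role of a rowid
        self.index = {}   # key -> rowid, kept separately from the rows

    def insert(self, key, payload):
        self.rows.append((key, payload))
        self.index[key] = len(self.rows) - 1

    def get(self, key):
        rowid = self.index[key]    # step 1: the index lookup yields a rowid
        return self.rows[rowid]    # step 2: fetch the row by rowid

    def scan(self):
        return [key for key, _ in self.rows]


class IndexOrganizedTable:
    """Rows live inside the index itself, kept sorted by key."""

    def __init__(self):
        self.keys = []
        self.payloads = []

    def insert(self, key, payload):
        pos = bisect.bisect_left(self.keys, key)   # find the sorted position
        self.keys.insert(pos, key)
        self.payloads.insert(pos, payload)

    def get(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos == len(self.keys) or self.keys[pos] != key:
            raise KeyError(key)
        return (key, self.payloads[pos])           # the index entry is the row

    def scan(self):
        return list(self.keys)


heap, iot = HeapTable(), IndexOrganizedTable()
for key in (5, 3, 4, 2, 1):
    heap.insert(key, f"row-{key}")
    iot.insert(key, f"row-{key}")
```

In this model `heap.scan()` returns the keys in insertion order (5, 3, 4, 2, 1), while `iot.scan()` returns them in key order (1, 2, 3, 4, 5), regardless of the order in which they were inserted.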

Example

mysql> create table T(id int primary key, k int not null, name varchar(16), index (k)) engine=InnoDB;
The table holds five rows, R1 through R5, whose (ID, k) values are (100,1), (200,2), (300,3), (500,5), and (600,6).

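  • The leaf nodes of the primary key index store the full rows. In InnoDB, the primary key index is also called the clustered index.

  • The leaf nodes of a non-primary-key index store the primary key values. In InnoDB, a non-primary-key index is also called a secondary index.

  • For the statement select * from T where ID=500, a primary key lookup, only the ID B+ tree has to be searched.

  • For the statement select * from T where k=5, an ordinary index lookup, the k index tree is searched first to obtain the ID value 500, and then the ID index tree is searched once more. This second step is called going back to the table (回表).

  • A B+ tree has to do some maintenance on insert to keep the index ordered. In the example above, inserting a row with ID 700 only requires adding a new record after R5 (the row with ID 600, currently the last one). Inserting a row with ID 400 is more work: the records after it have to be logically shifted to make room.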
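The lookups and inserts described for table T can be traced with a small model. This is a sketch only: sorted Python lists stand in for the two B+ trees, the function names are hypothetical, and the `name` values are made up because the example does not specify them.

```python
import bisect

# Clustered index on ID: sorted keys, each entry holds the whole row.
clustered_ids = [100, 200, 300, 500, 600]
clustered_rows = {
    100: {"id": 100, "k": 1, "name": "n1"},
    200: {"id": 200, "k": 2, "name": "n2"},
    300: {"id": 300, "k": 3, "name": "n3"},
    500: {"id": 500, "k": 5, "name": "n4"},
    600: {"id": 600, "k": 6, "name": "n5"},
}

# Secondary index on k: sorted (k, ID) pairs, no other columns.
secondary_k = [(1, 100), (2, 200), (3, 300), (5, 500), (6, 600)]


def find_by_id(row_id):
    """Primary key lookup: one search of the clustered index."""
    pos = bisect.bisect_left(clustered_ids, row_id)
    if pos < len(clustered_ids) and clustered_ids[pos] == row_id:
        return clustered_rows[row_id]
    return None


def find_by_k(k):
    """Secondary index lookup, then a lookup back into the clustered index."""
    pos = bisect.bisect_left(secondary_k, (k,))  # first entry with this k
    rows = []
    while pos < len(secondary_k) and secondary_k[pos][0] == k:
        rows.append(find_by_id(secondary_k[pos][1]))  # back to the table
        pos += 1
    return rows


def insert_position(row_id):
    """Slot where a new ID would land in the sorted clustered index."""
    return bisect.bisect_left(clustered_ids, row_id)


by_id = find_by_id(500)
by_k = find_by_k(5)
pos_700 = insert_position(700)   # lands after the last record
pos_400 = insert_position(400)   # lands between 300 and 500
```

Here `pos_700` equals the current length of the list, so the new record is simply appended, while `pos_400` is 3, which means the entries for 500 and 600 have to move to make room.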

[References]
https://dev.mysql.com/doc/refman/8.0/en/glossary.html#glos_table
https://dev.mysql.com/doc/refman/8.0/en/glossary.html#glos_index
https://dev.mysql.com/doc/refman/8.0/en/glossary.html#glos_clustered_index
https://dev.mysql.com/doc/refman/8.0/en/glossary.html#glos_primary_key
https://dev.mysql.com/doc/refman/8.0/en/glossary.html#glos_secondary_index
http://blog.itpub.net/30126024/viewspace-2221485/
