A few words off topic first. I was stuck in Beijing for a good three months and could not get home, so this source-code analysis series was put on hold. The reason I do this database source-code reading at home is simple: the setup at home is complete, while the computer at the office is old and creaky. Unexpected events tend to break our normal habits and routines, and that is itself perfectly normal. I have been back for about two weeks now, so starting this week the series resumes.
Stay the course: whatever is started deserves to be finished!
What is an index, and what is it for? Remember how, back in primary school, the teacher taught us to use a dictionary? If you do not recognize a character, or know its pronunciation but not how to write it, you can look it up directly by pinyin or by stroke count, and right after the entry is the page number where the character lives. Turn to that page and there it is: how the character is written, how it is pronounced, what it means, and often explanations of common words and example sentences as well.
The same principle applies to finding a book in a library, or picking a song at a KTV; with computers everywhere, such lookups have only become simpler and faster. Originally, an index was just a tool for locating things in books. Relational database technology borrowed the idea and defines an index as "a separate, physical storage structure that sorts the values of one or more columns of a database table: a collection of the values of one or more columns of a table, together with a list of logical pointers to the data pages that physically identify those values."
Picture looking for a book in a library: is it quicker to start at the first shelf and keep searching until you stumble on it, or to use the catalogue to jump straight to the right shelf and slot? The answer speaks for itself. In relational databases the purpose of an index is just as plain: to make lookups fast. Note, however, that using an index is not automatically faster; whether an index is created and used sensibly determines how effective it is.
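To make the speed argument concrete, here is a minimal standalone C++ sketch (my own toy example, not MySQL code): a full scan has to touch rows one by one, while a sorted copy of the key column, which is the essence of an index, can be binary-searched.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  std::vector<int64_t> rows = {42, 7, 93, 15, 58, 23, 81}; // "table" in insert order
  int64_t target = 58;

  // Full table scan: examine rows one by one until the key turns up.
  auto scan_hit = std::find(rows.begin(), rows.end(), target);

  // "Index": keep a sorted copy of the key column and binary-search it.
  std::vector<int64_t> index(rows);
  std::sort(index.begin(), index.end());
  auto idx_hit = std::lower_bound(index.begin(), index.end(), target);

  std::cout << "scan found: " << (scan_hit != rows.end())
            << ", index found: "
            << (idx_hit != index.end() && *idx_hit == target) << '\n';
  return 0;
}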
In databases, indexes are generally divided into two broad categories, clustered and non-clustered; other names such as unique index or primary-key index are just specific variants. Those will be analyzed later in this index series; this article looks at the underlying code first.
An index speeds up lookups, but the speed is paid for, and in computing there are only two currencies: space and time. The index itself takes storage, so the space cost necessarily grows; and whenever data is inserted, deleted or updated, the index has to be maintained along with the data, so a time cost is paid as well.
In MySQL, different storage engines use different index data structures: MyISAM (non-clustered) and InnoDB (clustered) use a B+ tree, while the Memory engine uses a hash. These two data structures are the focus below.
Let's start with the underlying index data structures in MySQL. First, the hash structure (under storage/heap, the in-memory engine):
//heapdef.h
struct HASH_INFO {
HASH_INFO * next_key;
uchar * ptr_to_rec;
ulong hash; /* Cached key hash value. */
};
//heap.h
struct HP_KEYDEF /* Key definition with open */
{
uint flag{0}; /* HA_NOSAME | HA_NULL_PART_KEY */
uint keysegs{0}; /* Number of key-segment */
uint length{0}; /* Length of key (automatic) */
uint8 algorithm{0}; /* HASH / BTREE */
HA_KEYSEG *seg{nullptr};
HP_BLOCK block; /* Where keys are saved */
/*
Number of buckets used in hash table. Used only to provide
#records estimates for heap key scans.
*/
ha_rows hash_buckets{0};
TREE rb_tree;
int (*write_key)(HP_INFO *info, HP_KEYDEF *keyinfo, const uchar *record,
uchar *recpos){nullptr};
int (*delete_key)(HP_INFO *info, HP_KEYDEF *keyinfo, const uchar *record,
uchar *recpos, int flag){nullptr};
uint (*get_key_length)(HP_KEYDEF *keydef, const uchar *key){nullptr};
};
//include/my_compare.h
struct HA_KEYSEG /* Key-portion */
{
const CHARSET_INFO *charset;
uint32 start; /* Start of key in record */
uint32 null_pos; /* position to NULL indicator */
uint16 bit_pos; /* Position to bit part */
uint16 flag;
uint16 length; /* Keylength */
uint16 language;
uint8 type; /* Type of key (for sort) */
uint8 null_bit; /* bitmask to test for NULL */
uint8 bit_start, bit_end; /* if bit field */
uint8 bit_length; /* Length of bit part */
};
Note that this hash must not be confused with InnoDB's Adaptive Hash Index (AHI); the two live in entirely different directories.
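As a rough illustration of how such a chained hash bucket resolves a key, here is a hedged toy sketch modeled on the HASH_INFO fields above (next_key, ptr_to_rec, hash). It is simplified and is not the actual heap-engine code; for brevity the "record" is just the key string itself.
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

struct ToyHashInfo {        /* mirrors HASH_INFO: next_key, ptr_to_rec, hash */
  ToyHashInfo *next_key;    /* next entry in the same bucket */
  const char *ptr_to_rec;   /* pointer to the record (here: just the key) */
  uint64_t hash;            /* cached key hash value */
};

const char *toy_hash_search(const std::vector<ToyHashInfo *> &buckets,
                            const std::string &key) {
  uint64_t h = std::hash<std::string>{}(key);
  for (const ToyHashInfo *info = buckets[h % buckets.size()]; info != nullptr;
       info = info->next_key) {
    /* Compare the cached hash first, then the key stored in the record. */
    if (info->hash == h && std::strcmp(info->ptr_to_rec, key.c_str()) == 0) {
      return info->ptr_to_rec;
    }
  }
  return nullptr; /* key not present */
}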
Now for the tree-related data structures:
//storage/innobase/include/mem0mem.h
/** The info structure stored at the beginning of a heap block */
struct mem_block_info_t {
uint64_t magic_n; /* magic number for debugging */
#ifdef UNIV_DEBUG
char file_name[16]; /* file name where the mem heap was created */
ulint line; /*!< line number where the mem heap was created */
#endif /* UNIV_DEBUG */
UT_LIST_BASE_NODE_T(mem_block_t)
base; /* In the first block in the
the list this is the base node of the list of blocks;
in subsequent blocks this is undefined */
UT_LIST_NODE_T(mem_block_t)
list; /* This contains pointers to next
and prev in the list. The first block allocated
to the heap is also the first block in this list,
though it also contains the base node of the list. */
ulint len; /*!< physical length of this block in bytes */
ulint total_size; /*!< physical length in bytes of all blocks
in the heap. This is defined only in the base
node and is set to ULINT_UNDEFINED in others. */
ulint type; /*!< type of heap: MEM_HEAP_DYNAMIC, or
MEM_HEAP_BUF possibly ORed to MEM_HEAP_BTR_SEARCH */
ulint free; /*!< offset in bytes of the first free position for
user data in the block */
ulint start; /*!< the value of the struct field 'free' at the
creation of the block */
void *free_block;
/* if the MEM_HEAP_BTR_SEARCH bit is set in type,
and this is the heap root, this can contain an
allocated buffer frame, which can be appended as a
free block to the heap, if we need more space;
otherwise, this is NULL */
void *buf_block;
/* if this block has been allocated from the buffer
pool, this contains the buf_block_t handle;
otherwise, this is NULL */
};
/** The search info struct in an index */
struct btr_search_t {
ulint ref_count; /*!< Number of blocks in this index tree
that have search index built
i.e. block->index points to this index.
Protected by search latch except
when during initialization in
btr_search_info_create(). */
/** @{ The following fields are not protected by any latch.
Unfortunately, this means that they must be aligned to
the machine word, i.e., they cannot be turned into bit-fields. */
buf_block_t *root_guess; /*!< the root page frame when it was last time
fetched, or NULL */
ulint hash_analysis; /*!< when this exceeds
BTR_SEARCH_HASH_ANALYSIS, the hash
analysis starts; this is reset if no
success noticed */
ibool last_hash_succ; /*!< TRUE if the last search would have
succeeded, or did succeed, using the hash
index; NOTE that the value here is not exact:
it is not calculated for every search, and the
calculation itself is not always accurate! */
ulint n_hash_potential;
/*!< number of consecutive searches
which would have succeeded, or did succeed,
using the hash index;
the range is 0 .. BTR_SEARCH_BUILD_LIMIT + 5 */
/** @} */
/**---------------------- @{ */
ulint n_fields; /*!< recommended prefix length for hash search:
number of full fields */
ulint n_bytes; /*!< recommended prefix: number of bytes in
an incomplete field
@see BTR_PAGE_MAX_REC_SIZE */
ibool left_side; /*!< TRUE or FALSE, depending on whether
the leftmost record of several records with
the same prefix should be indexed in the
hash index */
/*---------------------- @} */
#ifdef UNIV_SEARCH_PERF_STAT
ulint n_hash_succ; /*!< number of successful hash searches thus
far */
ulint n_hash_fail; /*!< number of failed hash searches */
ulint n_patt_succ; /*!< number of successful pattern searches thus
far */
ulint n_searches; /*!< number of searches */
#endif /* UNIV_SEARCH_PERF_STAT */
#ifdef UNIV_DEBUG
ulint magic_n; /*!< magic number @see BTR_SEARCH_MAGIC_N */
/** value of btr_search_t::magic_n, used in assertions */
#define BTR_SEARCH_MAGIC_N 1112765
#endif /* UNIV_DEBUG */
};
//storage/innobase/include/dict0mem.h
/** Data structure for an index. Most fields will be
initialized to 0, NULL or FALSE in dict_mem_index_create(). */
struct dict_index_t {
space_index_t id; /*!< id of the index */
mem_heap_t *heap; /*!< memory heap */
id_name_t name; /*!< index name */
const char *table_name; /*!< table name */
dict_table_t *table; /*!< back pointer to table */
unsigned space : 32;
/*!< space where the index tree is placed */
unsigned page : 32; /*!< index tree root page number */
unsigned merge_threshold : 6;
/*!< In the pessimistic delete, if the page
data size drops below this limit in percent,
merging it to a neighbor is tried */
#define DICT_INDEX_MERGE_THRESHOLD_DEFAULT 50
unsigned type : DICT_IT_BITS;
/*!< index type (DICT_CLUSTERED, DICT_UNIQUE,
DICT_IBUF, DICT_CORRUPT) */
#define MAX_KEY_LENGTH_BITS 12
unsigned trx_id_offset : MAX_KEY_LENGTH_BITS;
/*!< position of the trx id column
in a clustered index record, if the fields
before it are known to be of a fixed size,
0 otherwise */
#if (1 << MAX_KEY_LENGTH_BITS) < MAX_KEY_LENGTH
#error(1<<MAX_KEY_LENGTH_BITS) < MAX_KEY_LENGTH
#endif
unsigned n_user_defined_cols : 10;
/*!< number of columns the user defined to
be in the index: in the internal
representation we add more columns */
unsigned allow_duplicates : 1;
/*!< if true, allow duplicate values
even if index is created with unique
constraint */
unsigned nulls_equal : 1;
/*!< if true, SQL NULL == SQL NULL */
unsigned disable_ahi : 1;
/*!< if true, then disable AHI. Currently
limited to intrinsic temporary table and SDI
table as index id is not unique for such table
which is one of the validation criterion for
ahi. */
unsigned n_uniq : 10; /*!< number of fields from the beginning
which are enough to determine an index
entry uniquely */
unsigned n_def : 10; /*!< number of fields defined so far */
unsigned n_fields : 10; /*!< number of fields in the index */
unsigned n_nullable : 10; /*!< number of nullable fields */
unsigned n_instant_nullable : 10;
/*!< number of nullable fields before first
instant ADD COLUMN applied to this table.
This is valid only when has_instant_cols() is true */
unsigned cached : 1; /*!< TRUE if the index object is in the
dictionary cache */
unsigned to_be_dropped : 1;
/*!< TRUE if the index is to be dropped;
protected by dict_operation_lock */
unsigned online_status : 2;
/*!< enum online_index_status.
Transitions from ONLINE_INDEX_COMPLETE (to
ONLINE_INDEX_CREATION) are protected
by dict_operation_lock and
dict_sys->mutex. Other changes are
protected by index->lock. */
unsigned uncommitted : 1;
/*!< a flag that is set for secondary indexes
that have not been committed to the
data dictionary yet */
unsigned instant_cols : 1;
/*!< TRUE if the index is clustered index and it has some
instant columns */
uint32_t srid; /* spatial reference id */
bool srid_is_valid;
/* says whether SRID is valid - it can be
undefined */
std::unique_ptr<dd::Spatial_reference_system> rtr_srs;
/*!< Cached spatial reference system dictionary
entry used by R-tree indexes. */
#ifdef UNIV_DEBUG
uint32_t magic_n; /*!< magic number */
/** Value of dict_index_t::magic_n */
#define DICT_INDEX_MAGIC_N 76789786
#endif
dict_field_t *fields; /*!< array of field descriptions */
#ifndef UNIV_HOTBACKUP
st_mysql_ftparser *parser; /*!< fulltext parser plugin */
bool is_ngram;
/*!< true if it's ngram parser */
bool has_new_v_col;
/*!< whether it has a newly added virtual
column in ALTER */
bool hidden; /*!< if the index is an hidden index */
#endif /* !UNIV_HOTBACKUP */
UT_LIST_NODE_T(dict_index_t)
indexes; /*!< list of indexes of the table */
btr_search_t *search_info;
/*!< info used in optimistic searches */
#ifndef UNIV_HOTBACKUP
row_log_t *online_log;
/*!< the log of modifications
during online index creation;
valid when online_status is
ONLINE_INDEX_CREATION */
/*----------------------*/
/** Statistics for query optimization */
/** @{ */
ib_uint64_t *stat_n_diff_key_vals;
/*!< approximate number of different
key values for this index, for each
n-column prefix where 1 <= n <=
dict_get_n_unique(index) (the array is
indexed from 0 to n_uniq-1); we
periodically calculate new
estimates */
ib_uint64_t *stat_n_sample_sizes;
/*!< number of pages that were sampled
to calculate each of stat_n_diff_key_vals[],
e.g. stat_n_sample_sizes[3] pages were sampled
to get the number stat_n_diff_key_vals[3]. */
ib_uint64_t *stat_n_non_null_key_vals;
/* approximate number of non-null key values
for this index, for each column where
1 <= n <= dict_get_n_unique(index) (the array
is indexed from 0 to n_uniq-1); This
is used when innodb_stats_method is
"nulls_ignored". */
ulint stat_index_size;
/*!< approximate index size in
database pages */
#endif /* !UNIV_HOTBACKUP */
ulint stat_n_leaf_pages;
/*!< approximate number of leaf pages in the
index tree */
/** @} */
last_ops_cur_t *last_ins_cur;
/*!< cache the last insert position.
Currently limited to auto-generated
clustered index on intrinsic table only. */
last_ops_cur_t *last_sel_cur;
/*!< cache the last selected position
Currently limited to intrinsic table only. */
rec_cache_t rec_cache;
/*!< cache the field that needs to be
re-computed on each insert.
Limited to intrinsic table as this is common
share and can't be used without protection
if table is accessible to multiple-threads. */
rtr_ssn_t rtr_ssn; /*!< Node sequence number for RTree */
rtr_info_track_t *rtr_track; /*!< tracking all R-Tree search cursors */
trx_id_t trx_id; /*!< id of the transaction that created this
index, or 0 if the index existed
when InnoDB was started up */
zip_pad_info_t zip_pad; /*!< Information about state of
compression failures and successes */
rw_lock_t lock; /*!< read-write lock protecting the
upper levels of the index tree */
bool fill_dd; /*!< Flag whether need to fill dd tables
when it's a fulltext index. */
/** Determine if the index has been committed to the
data dictionary.
@return whether the index definition has been committed */
bool is_committed() const {
ut_ad(!uncommitted || !(type & DICT_CLUSTERED));
return (UNIV_LIKELY(!uncommitted));
}
/** Flag an index committed or uncommitted.
@param[in] committed whether the index is committed */
void set_committed(bool committed) {
ut_ad(!to_be_dropped);
ut_ad(committed || !(type & DICT_CLUSTERED));
uncommitted = !committed;
}
/** Get the next index.
@return next index
@retval NULL if this was the last index */
const dict_index_t *next() const {
const dict_index_t *next = UT_LIST_GET_NEXT(indexes, this);
ut_ad(magic_n == DICT_INDEX_MAGIC_N);
return (next);
}
/** Get the next index.
@return next index
@retval NULL if this was the last index */
dict_index_t *next() {
return (const_cast<dict_index_t *>(
const_cast<const dict_index_t *>(this)->next()));
}
/** Check whether the index is corrupted.
@return true if index is corrupted, otherwise false */
bool is_corrupted() const {
ut_ad(magic_n == DICT_INDEX_MAGIC_N);
return (type & DICT_CORRUPT);
}
/* Check whether the index is the clustered index
@return nonzero for clustered index, zero for other indexes */
bool is_clustered() const {
ut_ad(magic_n == DICT_INDEX_MAGIC_N);
return (type & DICT_CLUSTERED);
}
/** Check whether the index is the multi-value index
@return nonzero for multi-value index, zero for other indexes */
bool is_multi_value() const {
ut_ad(magic_n == DICT_INDEX_MAGIC_N);
return (type & DICT_MULTI_VALUE);
}
/** Returns the minimum data size of an index record.
@return minimum data size in bytes */
ulint get_min_size() const {
ulint size = 0;
for (unsigned i = 0; i < n_fields; i++) {
size += get_col(i)->get_min_size();
}
return (size);
}
/** Check whether index can be used by transaction
@param[in] trx transaction*/
bool is_usable(const trx_t *trx) const;
/** Check whether index has any instantly added columns
@return true if this is instant affected, otherwise false */
bool has_instant_cols() const { return (instant_cols); }
/** Check if tuple is having instant format.
@param[in] n_fields_in_tuple number of fields in tuple
@return true if yes, false otherwise. */
bool is_tuple_instant_format(const uint16_t n_fields_in_tuple) const;
/** Returns the number of nullable fields before specified
nth field
@param[in] nth nth field to check */
uint32_t get_n_nullable_before(uint32_t nth) const {
uint32_t nullable = n_nullable;
ut_ad(nth <= n_fields);
for (uint32_t i = nth; i < n_fields; ++i) {
if (get_field(i)->col->is_nullable()) {
--nullable;
}
}
return (nullable);
}
/** Returns the number of fields before first instant ADD COLUMN */
uint32_t get_instant_fields() const;
/** Adds a field definition to an index. NOTE: does not take a copy
of the column name if the field is a column. The memory occupied
by the column name may be released only after publishing the index.
@param[in] name_arg column name
@param[in] prefix_len 0 or the column prefix length in a MySQL index
like INDEX (textcol(25))
@param[in] is_ascending true=ASC, false=DESC */
void add_field(const char *name_arg, ulint prefix_len, bool is_ascending) {
dict_field_t *field;
ut_ad(magic_n == DICT_INDEX_MAGIC_N);
n_def++;
field = get_field(n_def - 1);
field->name = name_arg;
field->prefix_len = (unsigned int)prefix_len;
field->is_ascending = is_ascending;
}
/** Gets the nth field of an index.
@param[in] pos position of field
@return pointer to field object */
dict_field_t *get_field(ulint pos) const {
ut_ad(pos < n_def);
ut_ad(magic_n == DICT_INDEX_MAGIC_N);
return (fields + pos);
}
/** Gets pointer to the nth column in an index.
@param[in] pos position of the field
@return column */
const dict_col_t *get_col(ulint pos) const { return (get_field(pos)->col); }
/** Gets the column number the nth field in an index.
@param[in] pos position of the field
@return column number */
ulint get_col_no(ulint pos) const;
/** Returns the position of a system column in an index.
@param[in] type DATA_ROW_ID, ...
@return position, ULINT_UNDEFINED if not contained */
ulint get_sys_col_pos(ulint type) const;
/** Looks for column n in an index.
@param[in] n column number
@param[in] inc_prefix true=consider column prefixes too
@param[in] is_virtual true==virtual column
@return position in internal representation of the index;
ULINT_UNDEFINED if not contained */
ulint get_col_pos(ulint n, bool inc_prefix = false,
bool is_virtual = false) const;
/** Get the default value of nth field and its length if exists.
If not exists, both the return value is nullptr and length is 0.
@param[in] nth nth field to get
@param[in,out] length length of the default value
@return the default value data of nth field */
const byte *get_nth_default(ulint nth, ulint *length) const {
ut_ad(nth < n_fields);
ut_ad(get_instant_fields() <= nth);
const dict_col_t *col = get_col(nth);
if (col->instant_default == nullptr) {
*length = 0;
return (nullptr);
}
*length = col->instant_default->len;
ut_ad(*length == 0 || *length == UNIV_SQL_NULL ||
col->instant_default->value != nullptr);
return (col->instant_default->value);
}
/** Sets srid and srid_is_valid values
@param[in] srid_value value of SRID, may be garbage
if srid_is_valid_value = false
@param[in] srid_is_valid_value value of srid_is_valid */
void fill_srid_value(uint32_t srid_value, bool srid_is_valid_value) {
srid_is_valid = srid_is_valid_value;
srid = srid_value;
}
/** Check if the underlying table is compressed.
@return true if compressed, false otherwise. */
bool is_compressed() const;
/** Check if a multi-value index is built on specified multi-value
virtual column. Please note that there could be only one multi-value
virtual column on the multi-value index, but not necessary the first
field of the index.
@param[in] mv_col multi-value virtual column
@return non-zero means the column is on the index and this is the
nth position of the column, zero means it's not on the index */
uint32_t has_multi_value_col(const dict_v_col_t *mv_col) const {
ut_ad(is_multi_value());
for (uint32_t i = 0; i < n_fields; ++i) {
const dict_col_t *col = get_col(i);
if (mv_col->m_col.ind == col->ind) {
return (i + 1);
}
/* Only one multi-value field, if not match then no match. */
if (col->is_multi_value()) {
break;
}
}
return (0);
}
public:
/** Get the page size of the tablespace to which this index belongs.
@return the page size. */
page_size_t get_page_size() const;
/** Get the space id of the tablespace to which this index belongs.
@return the space id. */
space_id_t space_id() const { return space; }
};
The comment on the last structure, dict_index_t, says explicitly that this is the data structure describing an index. InnoDB uses a clustered index which, as mentioned earlier, stores the index and the row data in the same physical space.
Beyond the B+ tree there are also the B-tree and the B* tree, and from data-structure courses the AVL tree and the red-black tree; all of them turn up somewhere in database technology, and comparing them is worthwhile if you are interested. Once you have really mastered any one of these trees, picking up the others is easy: you only need to understand how they differ and what their respective strengths and weaknesses are. A toy B+ tree lookup sketch follows below.
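Here is a minimal B+ tree point-lookup sketch (a toy of my own, not InnoDB's page format): internal nodes only route the search by key, while leaf nodes hold the actual entries. In a clustered index the leaf carries the full record; in a secondary index it typically carries the primary-key value instead.
#include <cstddef>
#include <cstdint>
#include <vector>

struct BPlusNode {
  bool is_leaf;
  std::vector<int64_t> keys;         /* sorted separator keys / leaf keys */
  std::vector<BPlusNode *> children; /* internal node: keys.size() + 1 children */
  std::vector<const char *> rows;    /* leaf node: one "row" per key */
};

const char *bplus_search(const BPlusNode *node, int64_t key) {
  while (!node->is_leaf) {
    /* Find the first separator greater than the key and descend to its left. */
    std::size_t i = 0;
    while (i < node->keys.size() && key >= node->keys[i]) ++i;
    node = node->children[i];
  }
  for (std::size_t i = 0; i < node->keys.size(); ++i) {
    if (node->keys[i] == key) return node->rows[i];
  }
  return nullptr; /* not found */
}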
The hash part is fairly simple, so only the B+ tree index code is analyzed here:
/** Creates an index memory object.
@return own: index object */
dict_index_t *dict_mem_index_create(
const char *table_name, /*!< in: table name */
const char *index_name, /*!< in: index name */
ulint space, /*!< in: space where the index tree is
placed, ignored if the index is of
the clustered type */
ulint type, /*!< in: DICT_UNIQUE,
DICT_CLUSTERED, ... ORed */
ulint n_fields) /*!< in: number of fields */
{
dict_index_t *index;
mem_heap_t *heap;
ut_ad(table_name && index_name);
heap = mem_heap_create(DICT_HEAP_SIZE);
index = static_cast<dict_index_t *>(mem_heap_zalloc(heap, sizeof(*index)));
dict_mem_fill_index_struct(index, heap, table_name, index_name, space, type,
n_fields);
#ifndef UNIV_HOTBACKUP
#ifndef UNIV_LIBRARY
dict_index_zip_pad_mutex_create_lazy(index);
if (type & DICT_SPATIAL) {
mutex_create(LATCH_ID_RTR_SSN_MUTEX, &index->rtr_ssn.mutex);
index->rtr_track = static_cast<rtr_info_track_t *>(
mem_heap_alloc(heap, sizeof(*index->rtr_track)));
mutex_create(LATCH_ID_RTR_ACTIVE_MUTEX,
&index->rtr_track->rtr_active_mutex);
index->rtr_track->rtr_active = UT_NEW_NOKEY(rtr_info_active());
}
#endif /* !UNIV_LIBRARY */
#endif /* !UNIV_HOTBACKUP */
return (index);
}
/** This function populates a dict_index_t index memory structure with
supplied information. */
UNIV_INLINE
void dict_mem_fill_index_struct(
dict_index_t *index, /*!< out: index to be filled */
mem_heap_t *heap, /*!< in: memory heap */
const char *table_name, /*!< in: table name */
const char *index_name, /*!< in: index name */
ulint space, /*!< in: space where the index tree is
placed, ignored if the index is of
the clustered type */
ulint type, /*!< in: DICT_UNIQUE,
DICT_CLUSTERED, ... ORed */
ulint n_fields) /*!< in: number of fields */
{
if (heap) {
index->heap = heap;
index->name = mem_heap_strdup(heap, index_name);
index->fields = (dict_field_t *)mem_heap_alloc(
heap, 1 + n_fields * sizeof(dict_field_t));
} else {
index->name = index_name;
index->heap = nullptr;
index->fields = nullptr;
}
/* Assign a ulint to a 4-bit-mapped field.
Only the low-order 4 bits are assigned. */
index->type = type;
#ifndef UNIV_HOTBACKUP
index->space = (unsigned int)space;
index->page = FIL_NULL;
index->merge_threshold = DICT_INDEX_MERGE_THRESHOLD_DEFAULT;
#endif /* !UNIV_HOTBACKUP */
index->table_name = table_name;
index->n_fields = (unsigned int)n_fields;
/* The '1 +' above prevents allocation
of an empty mem block */
index->allow_duplicates = false;
index->nulls_equal = false;
index->disable_ahi = false;
index->last_ins_cur = nullptr;
index->last_sel_cur = nullptr;
#ifndef UNIV_HOTBACKUP
new (&index->rec_cache) rec_cache_t();
#endif /* !UNIV_HOTBACKUP */
#ifdef UNIV_DEBUG
index->magic_n = DICT_INDEX_MAGIC_N;
#endif /* UNIV_DEBUG */
}
/** Returns the number of fields before first instant ADD COLUMN */
inline uint32_t dict_index_t::get_instant_fields() const {
ut_ad(has_instant_cols());
return (n_fields - (table->n_cols - table->n_instant_cols));
}
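To tie these pieces together, here is a hedged usage sketch. It assumes it is compiled inside InnoDB where dict0mem.h and the DICT_* type flags are visible, it only exercises the signatures shown above, and the table/index/column names are made up for illustration.
dict_index_t *idx = dict_mem_index_create(
    "test/t1",                    /* table name (illustrative) */
    "PRIMARY",                    /* index name (illustrative) */
    0,                            /* space id, ignored for a clustered index */
    DICT_CLUSTERED | DICT_UNIQUE, /* index type flags, ORed together */
    2);                           /* number of fields */
/* add_field() is the dict_index_t member shown above; note it does not copy
   the name strings. */
idx->add_field("id", 0 /* no column prefix */, true /* ascending */);
idx->add_field("created_at", 0, true);
/* Later the index is attached to a table and put into the dictionary cache. */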
The code above creates the index object and fills in its information; next, how a column is added to an index:
/** Adds a column to index.
@param[in,out] index index
@param[in] table table
@param[in] col column
@param[in] prefix_len column prefix length
@param[in] is_ascending true=ASC, false=DESC */
void dict_index_add_col(dict_index_t *index, const dict_table_t *table,
dict_col_t *col, ulint prefix_len, bool is_ascending) {
dict_field_t *field;
const char *col_name;
#ifndef UNIV_LIBRARY
if (col->is_virtual()) {
#ifndef UNIV_HOTBACKUP
dict_v_col_t *v_col = reinterpret_cast<dict_v_col_t *>(col);
/* When v_col->v_indexes==NULL,
ha_innobase::commit_inplace_alter_table(commit=true)
will evict and reload the table definition, and
v_col->v_indexes will not be NULL for the new table. */
if (v_col->v_indexes != nullptr) {
/* Register the index with the virtual column index
list */
struct dict_v_idx_t new_idx = {index, index->n_def};
v_col->v_indexes->push_back(new_idx);
}
col_name = dict_table_get_v_col_name_mysql(table, dict_col_get_no(col));
#else /* !UNIV_HOTBACKUP */
/* PRELIMINARY TEMPORARY WORKAROUND: is this ever used? */
bool not_hotbackup = false;
ut_a(not_hotbackup);
#endif /* !UNIV_HOTBACKUP */
} else
#endif /* !UNIV_LIBRARY */
{
col_name = table->get_col_name(dict_col_get_no(col));
}
index->add_field(col_name, prefix_len, is_ascending);
field = index->get_field(index->n_def - 1);
field->col = col;
/* DATA_POINT is a special type, whose fixed_len should be:
1) DATA_MBR_LEN, when it's indexed in R-TREE. In this case,
it must be the first col to be added.
2) DATA_POINT_LEN(be equal to fixed size of column), when it's
indexed in B-TREE,
3) DATA_POINT_LEN, if a POINT col is the PRIMARY KEY, and we are
adding the PK col to other B-TREE/R-TREE. */
/* TODO: We suppose the dimension is 2 now. */
if (dict_index_is_spatial(index) && DATA_POINT_MTYPE(col->mtype) &&
index->n_def == 1) {
field->fixed_len = DATA_MBR_LEN;
} else {
field->fixed_len = static_cast<unsigned int>(
col->get_fixed_size(dict_table_is_comp(table)));
}
if (prefix_len && field->fixed_len > prefix_len) {
field->fixed_len = (unsigned int)prefix_len;
}
/* Long fixed-length fields that need external storage are treated as
variable-length fields, so that the extern flag can be embedded in
the length word. */
if (field->fixed_len > DICT_MAX_FIXED_COL_LEN) {
field->fixed_len = 0;
}
#if DICT_MAX_FIXED_COL_LEN != 768
/* The comparison limit above must be constant. If it were
changed, the disk format of some fixed-length columns would
change, which would be a disaster. */
#error "DICT_MAX_FIXED_COL_LEN != 768"
#endif
if (!(col->prtype & DATA_NOT_NULL)) {
index->n_nullable++;
}
}
Next, loading a table's full set of index definitions into the dictionary cache:
/** Loads definitions for table indexes. Adds them to the data dictionary
cache.
@return DB_SUCCESS if ok, DB_CORRUPTION if corruption of dictionary
table or DB_UNSUPPORTED if table has unknown index type */
static dberr_t dict_load_indexes(
dict_table_t *table, /*!< in/out: table */
mem_heap_t *heap, /*!< in: memory heap for temporary storage */
dict_err_ignore_t ignore_err)
/*!< in: error to be ignored when
loading the index definition */
{
dict_table_t *sys_indexes;
dict_index_t *sys_index;
btr_pcur_t pcur;
dtuple_t *tuple;
dfield_t *dfield;
const rec_t *rec;
byte *buf;
mtr_t mtr;
dberr_t error = DB_SUCCESS;
ut_ad(mutex_own(&dict_sys->mutex));
mtr_start(&mtr);
sys_indexes = dict_table_get_low("SYS_INDEXES");
sys_index = UT_LIST_GET_FIRST(sys_indexes->indexes);
ut_ad(!dict_table_is_comp(sys_indexes));
ut_ad(name_of_col_is(sys_indexes, sys_index, DICT_FLD__SYS_INDEXES__NAME,
"NAME"));
ut_ad(name_of_col_is(sys_indexes, sys_index, DICT_FLD__SYS_INDEXES__PAGE_NO,
"PAGE_NO"));
tuple = dtuple_create(heap, 1);
dfield = dtuple_get_nth_field(tuple, 0);
buf = static_cast<byte *>(mem_heap_alloc(heap, 8));
mach_write_to_8(buf, table->id);
dfield_set_data(dfield, buf, 8);
dict_index_copy_types(tuple, sys_index, 1);
btr_pcur_open_on_user_rec(sys_index, tuple, PAGE_CUR_GE, BTR_SEARCH_LEAF,
&pcur, &mtr);
for (;;) {
dict_index_t *index = nullptr;
const char *err_msg;
if (!btr_pcur_is_on_user_rec(&pcur)) {
/* We should allow the table to open even
without index when DICT_ERR_IGNORE_CORRUPT is set.
DICT_ERR_IGNORE_CORRUPT is currently only set
for drop table */
if (table->first_index() == nullptr &&
!(ignore_err & DICT_ERR_IGNORE_CORRUPT)) {
ib::warn(ER_IB_MSG_197) << "Cannot load table " << table->name
<< " because it has no indexes in"
" InnoDB internal data dictionary.";
error = DB_CORRUPTION;
goto func_exit;
}
break;
}
rec = btr_pcur_get_rec(&pcur);
if ((ignore_err & DICT_ERR_IGNORE_RECOVER_LOCK) &&
(rec_get_n_fields_old_raw(rec) == DICT_NUM_FIELDS__SYS_INDEXES
/* a record for older SYS_INDEXES table
(missing merge_threshold column) is acceptable. */
||
rec_get_n_fields_old_raw(rec) == DICT_NUM_FIELDS__SYS_INDEXES - 1)) {
const byte *field;
ulint len;
field = rec_get_nth_field_old(rec, DICT_FLD__SYS_INDEXES__NAME, &len);
if (len != UNIV_SQL_NULL &&
static_cast<char>(*field) ==
static_cast<char>(*TEMP_INDEX_PREFIX_STR)) {
/* Skip indexes whose name starts with
TEMP_INDEX_PREFIX, because they will
be dropped during crash recovery. */
goto next_rec;
}
}
err_msg =
dict_load_index_low(buf, table->name.m_name, heap, rec, TRUE, &index);
ut_ad((index == nullptr && err_msg != nullptr) ||
(index != nullptr && err_msg == nullptr));
if (err_msg == dict_load_index_id_err) {
/* TABLE_ID mismatch means that we have
run out of index definitions for the table. */
if (table->first_index() == nullptr &&
!(ignore_err & DICT_ERR_IGNORE_CORRUPT)) {
ib::warn(ER_IB_MSG_198)
<< "Failed to load the"
" clustered index for table "
<< table->name << " because of the following error: " << err_msg
<< "."
" Refusing to load the rest of the"
" indexes (if any) and the whole table"
" altogether.";
error = DB_CORRUPTION;
goto func_exit;
}
break;
} else if (err_msg == dict_load_index_del) {
/* Skip delete-marked records. */
goto next_rec;
} else if (err_msg) {
ib::error(ER_IB_MSG_199) << err_msg;
if (ignore_err & DICT_ERR_IGNORE_CORRUPT) {
goto next_rec;
}
error = DB_CORRUPTION;
goto func_exit;
}
ut_ad(index);
/* Check whether the index is corrupted */
if (index->is_corrupted()) {
ib::error(ER_IB_MSG_200) << "Index " << index->name << " of table "
<< table->name << " is corrupted";
if (!srv_load_corrupted && !(ignore_err & DICT_ERR_IGNORE_CORRUPT) &&
index->is_clustered()) {
dict_mem_index_free(index);
error = DB_INDEX_CORRUPT;
goto func_exit;
} else {
/* We will load the index if
1) srv_load_corrupted is TRUE
2) ignore_err is set with
DICT_ERR_IGNORE_CORRUPT
3) if the index corrupted is a secondary
index */
ib::info(ER_IB_MSG_201) << "Load corrupted index " << index->name
<< " of table " << table->name;
}
}
if (index->type & DICT_FTS && !dict_table_has_fts_index(table)) {
/* This should have been created by now. */
ut_a(table->fts != nullptr);
DICT_TF2_FLAG_SET(table, DICT_TF2_FTS);
}
/* We check for unsupported types first, so that the
subsequent checks are relevant for the supported types. */
if (index->type & ~(DICT_CLUSTERED | DICT_UNIQUE | DICT_CORRUPT | DICT_FTS |
DICT_SPATIAL | DICT_VIRTUAL)) {
ib::error(ER_IB_MSG_202) << "Unknown type " << index->type << " of index "
<< index->name << " of table " << table->name;
error = DB_UNSUPPORTED;
dict_mem_index_free(index);
goto func_exit;
} else if (!index->is_clustered() && nullptr == table->first_index()) {
ib::error(ER_IB_MSG_203)
<< "Trying to load index " << index->name << " for table "
<< table->name << ", but the first index is not clustered!";
dict_mem_index_free(index);
error = DB_CORRUPTION;
goto func_exit;
} else if (dict_is_old_sys_table(table->id) &&
(index->is_clustered() || ((table == dict_sys->sys_tables) &&
!strcmp("ID_IND", index->name)))) {
/* The index was created in memory already at booting
of the database server */
dict_mem_index_free(index);
} else {
dict_load_fields(index, heap);
mutex_exit(&dict_sys->mutex);
error = dict_index_add_to_cache(table, index, index->page, FALSE);
mutex_enter(&dict_sys->mutex);
/* The data dictionary tables should never contain
invalid index definitions. */
if (UNIV_UNLIKELY(error != DB_SUCCESS)) {
goto func_exit;
}
}
next_rec:
btr_pcur_move_to_next_user_rec(&pcur, &mtr);
}
ut_ad(table->fts_doc_id_index == nullptr);
if (table->fts != nullptr) {
table->fts_doc_id_index =
dict_table_get_index_on_name(table, FTS_DOC_ID_INDEX_NAME);
}
/* If the table contains FTS indexes, populate table->fts->indexes */
if (dict_table_has_fts_index(table)) {
ut_ad(table->fts_doc_id_index != nullptr);
/* table->fts->indexes should have been created. */
ut_a(table->fts->indexes != nullptr);
dict_table_get_all_fts_indexes(table, table->fts->indexes);
}
func_exit:
btr_pcur_close(&pcur);
mtr_commit(&mtr);
return (error);
}
Now a few of the helpers used when locating data through an index:
/** Gets the column number.
@return col->ind, table column position (starting from 0) */
UNIV_INLINE
ulint dict_col_get_no(const dict_col_t *col) /*!< in: column */
{
ut_ad(col);
return (col->ind);
}
/** Gets the column position in the clustered index. */
UNIV_INLINE
ulint dict_col_get_clust_pos(
const dict_col_t *col, /*!< in: table column */
const dict_index_t *clust_index) /*!< in: clustered index */
{
ulint i;
ut_ad(col);
ut_ad(clust_index);
ut_ad(clust_index->is_clustered());
for (i = 0; i < clust_index->n_def; i++) {
const dict_field_t *field = &clust_index->fields[i];
if (!field->prefix_len && field->col == col) {
return (i);
}
}
return (ULINT_UNDEFINED);
}
/** Gets the column position in the given index.
@param[in] col table column
@param[in] index index to be searched for column
@return position of column in the given index. */
UNIV_INLINE
ulint dict_col_get_index_pos(const dict_col_t *col, const dict_index_t *index) {
ulint i;
for (i = 0; i < index->n_def; i++) {
const dict_field_t *field = &index->fields[i];
if (!field->prefix_len && field->col == col) {
return (i);
}
}
return (ULINT_UNDEFINED);
}
/** Check whether the index consists of descending columns only.
@param[in] index index tree
@retval true if index has any descending column
@retval false if index has only ascending columns */
UNIV_INLINE
bool dict_index_has_desc(const dict_index_t *index) {
ut_ad(index->magic_n == DICT_INDEX_MAGIC_N);
for (ulint i = 0; i < index->n_def; i++) {
const dict_field_t *field = &index->fields[i];
if (!field->is_ascending) {
return (true);
}
}
return (false);
}
/** Check if index is auto-generated clustered index.
@param[in] index index
@return true if index is auto-generated clustered index. */
UNIV_INLINE
bool dict_index_is_auto_gen_clust(const dict_index_t *index) {
return (index->type == DICT_CLUSTERED);
}
/** Check whether the index is unique.
@return nonzero for unique index, zero for other indexes */
UNIV_INLINE
ulint dict_index_is_unique(const dict_index_t *index) /*!< in: index */
{
ut_ad(index);
ut_ad(index->magic_n == DICT_INDEX_MAGIC_N);
return (index->type & DICT_UNIQUE);
}
/** Check whether the index is a Spatial Index.
@return nonzero for Spatial Index, zero for other indexes */
UNIV_INLINE
ulint dict_index_is_spatial(const dict_index_t *index) /*!< in: index */
{
ut_ad(index);
ut_ad(index->magic_n == DICT_INDEX_MAGIC_N);
return (index->type & DICT_SPATIAL);
}
/** Check whether the index contains a virtual column
@param[in] index index
@return nonzero for the index has virtual column, zero for other indexes */
UNIV_INLINE
ulint dict_index_has_virtual(const dict_index_t *index) {
ut_ad(index);
ut_ad(index->magic_n == DICT_INDEX_MAGIC_N);
return (index->type & DICT_VIRTUAL);
}
/** Check whether the index is the insert buffer tree.
@return nonzero for insert buffer, zero for other indexes */
UNIV_INLINE
ulint dict_index_is_ibuf(const dict_index_t *index) /*!< in: index */
{
ut_ad(index);
ut_ad(index->magic_n == DICT_INDEX_MAGIC_N);
return (index->type & DICT_IBUF);
}
/** Check whether the index is a secondary index or the insert buffer tree.
@return nonzero for insert buffer, zero for other indexes */
UNIV_INLINE
ulint dict_index_is_sec_or_ibuf(const dict_index_t *index) /*!< in: index */
{
ulint type;
ut_ad(index);
ut_ad(index->magic_n == DICT_INDEX_MAGIC_N);
type = index->type;
return (!(type & DICT_CLUSTERED) || (type & DICT_IBUF));
}
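These predicates are typically applied while walking a table's index list. A hedged sketch (again assuming it lives inside InnoDB, and using only first_index(), next() and the predicates shown above):
/* Hedged sketch: count the unique secondary indexes of a table by walking its
   index list and applying the predicates shown above. */
ulint count_unique_secondary(const dict_table_t *table) {
  ulint n = 0;
  for (const dict_index_t *index = table->first_index(); index != nullptr;
       index = index->next()) {
    if (!index->is_clustered() && dict_index_is_unique(index)) {
      ++n; /* a unique secondary index */
    }
  }
  return n;
}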
Finally, how a page split is handled when an insert does not fit:
/** Splits an index page to halves and inserts the tuple. It is assumed
that mtr holds an x-latch to the index tree. NOTE: the tree x-latch is
released within this function! NOTE that the operation of this
function must always succeed, we cannot reverse it: therefore enough
free disk space (2 pages) must be guaranteed to be available before
this function is called.
@return inserted record */
rec_t *btr_page_split_and_insert(
uint32_t flags, /*!< in: undo logging and locking flags */
btr_cur_t *cursor, /*!< in: cursor at which to insert; when the
function returns, the cursor is positioned
on the predecessor of the inserted record */
ulint **offsets, /*!< out: offsets on inserted record */
mem_heap_t **heap, /*!< in/out: pointer to memory heap, or NULL */
const dtuple_t *tuple, /*!< in: tuple to insert */
mtr_t *mtr) /*!< in: mtr */
{
buf_block_t *block;
page_t *page;
page_zip_des_t *page_zip;
page_no_t page_no;
byte direction;
page_no_t hint_page_no;
buf_block_t *new_block;
page_t *new_page;
page_zip_des_t *new_page_zip;
rec_t *split_rec;
buf_block_t *left_block;
buf_block_t *right_block;
buf_block_t *insert_block;
page_cur_t *page_cursor;
rec_t *first_rec;
byte *buf = nullptr; /* remove warning */
rec_t *move_limit;
ibool insert_will_fit;
ibool insert_left;
ulint n_iterations = 0;
rec_t *rec;
ulint n_uniq;
dict_index_t *index;
index = btr_cur_get_index(cursor);
if (dict_index_is_spatial(index)) {
/* Split rtree page and update parent */
return (
rtr_page_split_and_insert(flags, cursor, offsets, heap, tuple, mtr));
}
if (!*heap) {
*heap = mem_heap_create(1024);
}
n_uniq = dict_index_get_n_unique_in_tree(cursor->index);
func_start:
ut_ad(tuple->m_heap != *heap);
mem_heap_empty(*heap);
*offsets = nullptr;
ut_ad(mtr_memo_contains_flagged(mtr, dict_index_get_lock(cursor->index),
MTR_MEMO_X_LOCK | MTR_MEMO_SX_LOCK) ||
cursor->index->table->is_intrinsic());
ut_ad(!dict_index_is_online_ddl(cursor->index) || (flags & BTR_CREATE_FLAG) ||
cursor->index->is_clustered());
ut_ad(rw_lock_own_flagged(dict_index_get_lock(cursor->index),
RW_LOCK_FLAG_X | RW_LOCK_FLAG_SX) ||
cursor->index->table->is_intrinsic());
block = btr_cur_get_block(cursor);
page = buf_block_get_frame(block);
page_zip = buf_block_get_page_zip(block);
ut_ad(
mtr_is_block_fix(mtr, block, MTR_MEMO_PAGE_X_FIX, cursor->index->table));
ut_ad(!page_is_empty(page));
/* try to insert to the next page if possible before split */
rec =
btr_insert_into_right_sibling(flags, cursor, offsets, *heap, tuple, mtr);
if (rec != nullptr) {
return (rec);
}
page_no = block->page.id.page_no();
/* 1. Decide the split record; split_rec == NULL means that the
tuple to be inserted should be the first record on the upper
half-page */
insert_left = FALSE;
if (n_iterations > 0) {
direction = FSP_UP;
hint_page_no = page_no + 1;
split_rec = btr_page_get_split_rec(cursor, tuple);
if (split_rec == nullptr) {
insert_left =
btr_page_tuple_smaller(cursor, tuple, offsets, n_uniq, heap);
}
} else if (btr_page_get_split_rec_to_right(cursor, &split_rec)) {
direction = FSP_UP;
hint_page_no = page_no + 1;
} else if (btr_page_get_split_rec_to_left(cursor, &split_rec)) {
direction = FSP_DOWN;
hint_page_no = page_no - 1;
ut_ad(split_rec);
} else {
direction = FSP_UP;
hint_page_no = page_no + 1;
/* If there is only one record in the index page, we
can't split the node in the middle by default. We need
to determine whether the new record will be inserted
to the left or right. */
if (page_get_n_recs(page) > 1) {
split_rec = page_get_middle_rec(page);
} else if (btr_page_tuple_smaller(cursor, tuple, offsets, n_uniq, heap)) {
split_rec = page_rec_get_next(page_get_infimum_rec(page));
} else {
split_rec = nullptr;
}
}
/* 2. Allocate a new page to the index */
new_block = btr_page_alloc(cursor->index, hint_page_no, direction,
btr_page_get_level(page, mtr), mtr, mtr);
/* New page could not be allocated */
if (!new_block) {
return nullptr;
}
new_page = buf_block_get_frame(new_block);
new_page_zip = buf_block_get_page_zip(new_block);
btr_page_create(new_block, new_page_zip, cursor->index,
btr_page_get_level(page, mtr), mtr);
/* 3. Calculate the first record on the upper half-page, and the
first record (move_limit) on original page which ends up on the
upper half */
if (split_rec) {
first_rec = move_limit = split_rec;
*offsets =
rec_get_offsets(split_rec, cursor->index, *offsets, n_uniq, heap);
insert_left = cmp_dtuple_rec(tuple, split_rec, cursor->index, *offsets) < 0;
if (!insert_left && new_page_zip && n_iterations > 0) {
/* If a compressed page has already been split,
avoid further splits by inserting the record
to an empty page. */
split_rec = nullptr;
goto insert_empty;
}
} else if (insert_left) {
ut_a(n_iterations > 0);
first_rec = page_rec_get_next(page_get_infimum_rec(page));
move_limit = page_rec_get_next(btr_cur_get_rec(cursor));
} else {
insert_empty:
ut_ad(!split_rec);
ut_ad(!insert_left);
buf =
UT_NEW_ARRAY_NOKEY(byte, rec_get_converted_size(cursor->index, tuple));
first_rec = rec_convert_dtuple_to_rec(buf, cursor->index, tuple);
move_limit = page_rec_get_next(btr_cur_get_rec(cursor));
}
/* 4. Do first the modifications in the tree structure */
btr_attach_half_pages(flags, cursor->index, block, first_rec, new_block,
direction, mtr);
/* If the split is made on the leaf level and the insert will fit
on the appropriate half-page, we may release the tree x-latch.
We can then move the records after releasing the tree latch,
thus reducing the tree latch contention. */
if (split_rec) {
insert_will_fit =
!new_page_zip &&
btr_page_insert_fits(cursor, split_rec, offsets, tuple, heap);
} else {
if (!insert_left) {
UT_DELETE_ARRAY(buf);
buf = nullptr;
}
insert_will_fit =
!new_page_zip &&
btr_page_insert_fits(cursor, nullptr, offsets, tuple, heap);
}
if (!srv_read_only_mode && !cursor->index->table->is_intrinsic() &&
insert_will_fit && page_is_leaf(page) &&
!dict_index_is_online_ddl(cursor->index)) {
mtr->memo_release(dict_index_get_lock(cursor->index),
MTR_MEMO_X_LOCK | MTR_MEMO_SX_LOCK);
/* NOTE: We cannot release root block latch here, because it
has segment header and already modified in most of cases.*/
}
/* 5. Move then the records to the new page */
if (direction == FSP_DOWN) {
/* fputs("Split left\n", stderr); */
if (false
#ifdef UNIV_ZIP_COPY
|| page_zip
#endif /* UNIV_ZIP_COPY */
|| !page_move_rec_list_start(new_block, block, move_limit,
cursor->index, mtr)) {
/* For some reason, compressing new_page failed,
even though it should contain fewer records than
the original page. Copy the page byte for byte
and then delete the records from both pages
as appropriate. Deleting will always succeed. */
ut_a(new_page_zip);
page_zip_copy_recs(new_page_zip, new_page, page_zip, page, cursor->index,
mtr);
page_delete_rec_list_end(move_limit - page + new_page, new_block,
cursor->index, ULINT_UNDEFINED, ULINT_UNDEFINED,
mtr);
/* Update the lock table and possible hash index. */
if (!dict_table_is_locking_disabled(cursor->index->table)) {
lock_move_rec_list_start(new_block, block, move_limit,
new_page + PAGE_NEW_INFIMUM);
}
btr_search_move_or_delete_hash_entries(new_block, block, cursor->index);
/* Delete the records from the source page. */
page_delete_rec_list_start(move_limit, block, cursor->index, mtr);
}
left_block = new_block;
right_block = block;
if (!dict_table_is_locking_disabled(cursor->index->table)) {
lock_update_split_left(right_block, left_block);
}
} else {
/* fputs("Split right\n", stderr); */
if (false
#ifdef UNIV_ZIP_COPY
|| page_zip
#endif /* UNIV_ZIP_COPY */
|| !page_move_rec_list_end(new_block, block, move_limit, cursor->index,
mtr)) {
/* For some reason, compressing new_page failed,
even though it should contain fewer records than
the original page. Copy the page byte for byte
and then delete the records from both pages
as appropriate. Deleting will always succeed. */
ut_a(new_page_zip);
page_zip_copy_recs(new_page_zip, new_page, page_zip, page, cursor->index,
mtr);
page_delete_rec_list_start(move_limit - page + new_page, new_block,
cursor->index, mtr);
/* Update the lock table and possible hash index. */
if (!dict_table_is_locking_disabled(cursor->index->table)) {
lock_move_rec_list_end(new_block, block, move_limit);
}
ut_ad(!dict_index_is_spatial(index));
btr_search_move_or_delete_hash_entries(new_block, block, cursor->index);
/* Delete the records from the source page. */
page_delete_rec_list_end(move_limit, block, cursor->index,
ULINT_UNDEFINED, ULINT_UNDEFINED, mtr);
}
left_block = block;
right_block = new_block;
if (!dict_table_is_locking_disabled(cursor->index->table)) {
lock_update_split_right(right_block, left_block);
}
}
#ifdef UNIV_ZIP_DEBUG
if (page_zip) {
ut_a(page_zip_validate(page_zip, page, cursor->index));
ut_a(page_zip_validate(new_page_zip, new_page, cursor->index));
}
#endif /* UNIV_ZIP_DEBUG */
/* At this point, split_rec, move_limit and first_rec may point
to garbage on the old page. */
/* 6. The split and the tree modification is now completed. Decide the
page where the tuple should be inserted */
if (insert_left) {
insert_block = left_block;
} else {
insert_block = right_block;
}
/* 7. Reposition the cursor for insert and try insertion */
page_cursor = btr_cur_get_page_cur(cursor);
page_cur_search(insert_block, cursor->index, tuple, page_cursor);
rec = page_cur_tuple_insert(page_cursor, tuple, cursor->index, offsets, heap,
mtr);
#ifdef UNIV_ZIP_DEBUG
{
page_t *insert_page = buf_block_get_frame(insert_block);
page_zip_des_t *insert_page_zip = buf_block_get_page_zip(insert_block);
ut_a(!insert_page_zip ||
page_zip_validate(insert_page_zip, insert_page, cursor->index));
}
#endif /* UNIV_ZIP_DEBUG */
if (rec != nullptr) {
goto func_exit;
}
/* 8. If insert did not fit, try page reorganization.
For compressed pages, page_cur_tuple_insert() will have
attempted this already. */
if (page_cur_get_page_zip(page_cursor) ||
!btr_page_reorganize(page_cursor, cursor->index, mtr)) {
goto insert_failed;
}
rec = page_cur_tuple_insert(page_cursor, tuple, cursor->index, offsets, heap,
mtr);
if (rec == nullptr) {
/* The insert did not fit on the page: loop back to the
start of the function for a new split */
insert_failed:
/* We play safe and reset the free bits for new_page */
if (!cursor->index->is_clustered() &&
!cursor->index->table->is_temporary()) {
ibuf_reset_free_bits(new_block);
ibuf_reset_free_bits(block);
}
n_iterations++;
ut_ad(n_iterations < 2 || buf_block_get_page_zip(insert_block));
ut_ad(!insert_will_fit);
goto func_start;
}
func_exit:
/* Insert fit on the page: update the free bits for the
left and right pages in the same mtr */
if (!cursor->index->is_clustered() && !cursor->index->table->is_temporary() &&
page_is_leaf(page)) {
ibuf_update_free_bits_for_two_pages_low(left_block, right_block, mtr);
}
MONITOR_INC(MONITOR_INDEX_SPLIT);
ut_ad(page_validate(buf_block_get_frame(left_block), cursor->index));
ut_ad(page_validate(buf_block_get_frame(right_block), cursor->index));
ut_ad(!rec || rec_offs_validate(rec, cursor->index, *offsets));
return (rec);
}
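The heart of the function is step 1, deciding where to split. Below is a heavily simplified toy sketch of that decision only (my own caricature, not the real logic: the function above also retries via n_iterations, handles compressed pages, attaches the half-pages and releases latches). It assumes the page is not empty, just as the real code asserts.
#include <cstddef>
#include <vector>

enum class SplitDirection { kUp, kDown }; /* mirrors FSP_UP / FSP_DOWN above */

/* keys: the sorted keys already on the page; new_key: the key that did not fit.
   Returns the position of the first key that moves to the new page, and sets
   the direction in which the new page is allocated. */
std::size_t choose_split_point(const std::vector<long> &keys, long new_key,
                               SplitDirection *dir) {
  if (new_key > keys.back()) {     /* monotonically increasing inserts */
    *dir = SplitDirection::kUp;    /* allocate the new page to the right */
    return keys.size();            /* keep the old page full, move nothing */
  }
  if (new_key < keys.front()) {    /* monotonically decreasing inserts */
    *dir = SplitDirection::kDown;  /* allocate the new page to the left */
    return 1;                      /* keep only the first record */
  }
  *dir = SplitDirection::kUp;
  return keys.size() / 2;          /* random inserts: split at the middle record */
}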
Basically, in InnoDB both the index and the data live in the dictionary and page-handling code; for more detail, look at the code under storage/innobase/include and the related dict* paths. In particular, gis0rtree.h and gis0rtree.ic under include contain functions for some of the finer-grained data handling. In this engine, though, the description of the tree is relatively scattered: a clustered index has to describe the data while it describes the index, which is worth keeping in mind when reading the code.
Since MySQL 5.5 the default storage engine has been InnoDB; before that it was MyISAM. MyISAM keeps its index and its data separate by design, which makes its layout much easier to see.
To be honest, I have not written serious SQL for years; my attention has gone to the low-level details of data storage, and in recent years quite a lot of effort went into various NoSQL databases. Looking back, though, every database is solving essentially two problems: storing a large volume of data, and making CRUD fast. Once you get to a concrete implementation you must consider all kinds of safety and concurrency, plus transactions and consistency, plus support for distribution, and so on. Database technology is a genuinely complex field; from top to bottom, from theory to practice, the two keep pushing each other forward.
An index is just one important link in that chain. To use indexes well, knowing how they are implemented underneath makes it much easier to solve the index problems you actually hit. Different databases may differ slightly in mechanism, but the principles are basically the same. Study hard and make progress every day.