Some people probably don’t know, but there is a difference between how indexes work in MyISAM and how they work in InnoDB, particularly from the point of view of performance. Now that InnoDB is becoming widely used, it is important to understand how indexing works in InnoDB. Hence this post!
The first and foremost thing to know is that InnoDB uses a clustered index to store the data in a table. So what does clustered index mean?
A clustered index determines the physical order of data in a table. When thinking of a clustered index, think of a telephone directory, where entries are physically arranged by last name. Because the clustered index decides the physical storage order of the table's data, a table can have only one clustered index. A clustered index can, however, consist of multiple columns (a composite index), in the same way that a telephone directory is ordered by last name and then, among people who share a last name, by first name.
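To make the composite-key idea concrete, here is a small Python sketch (the names and numbers are made up) that orders directory entries the way a clustered index on (last_name, first_name) would order rows: by last name, with first name only breaking ties. It models the ordering only, not how InnoDB actually stores pages.

```python
# Hypothetical phone-directory entries: (last_name, first_name, number).
entries = [
    ("Smith", "John", "555-0101"),
    ("Brown", "Alice", "555-0102"),
    ("Smith", "Anna", "555-0103"),
    ("Adams", "Zoe", "555-0104"),
]

# A composite key sorts by last name first, then by first name to break ties,
# just as a clustered index on (last_name, first_name) would order its rows.
directory = sorted(entries, key=lambda e: (e[0], e[1]))

for last, first, number in directory:
    print(f"{last}, {first}: {number}")
```

The two Smiths end up adjacent and ordered by first name, which is exactly why range scans on a clustered key prefix (all the Smiths) touch neighbouring data.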
InnoDB stores indexes as B+tree data structures, and the clustered index is no exception. The difference is that for the clustered index, InnoDB stores the index and the rows together in the same structure: the table's rows are actually stored in the index’s leaf pages. This is why InnoDB tables are also called index-organized tables.
Now let's consider how InnoDB decides which index to use as the clustered index!
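A real B+tree has internal pages and fixed-size leaf pages, but the key property of an index-organized table can be sketched in Python with a sorted list: the entry found by a primary-key search is the row itself, not a pointer to it. The `ClusteredIndex` class below is a hypothetical toy for illustration, not InnoDB's implementation.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class ClusteredIndex:
    """Toy index-organized table: keys are kept sorted, and each leaf
    entry holds the full row rather than a pointer to it."""
    keys: list = field(default_factory=list)
    rows: list = field(default_factory=list)

    def insert(self, pk, row):
        # Keep entries in primary-key order, as the B+tree leaf level does.
        pos = bisect.bisect_left(self.keys, pk)
        if pos < len(self.keys) and self.keys[pos] == pk:
            raise KeyError(f"duplicate primary key {pk!r}")
        self.keys.insert(pos, pk)
        self.rows.insert(pos, row)

    def lookup(self, pk):
        # One search by primary key returns the row itself: no second hop.
        pos = bisect.bisect_left(self.keys, pk)
        if pos < len(self.keys) and self.keys[pos] == pk:
            return self.rows[pos]
        return None


table = ClusteredIndex()
table.insert(3, {"id": 3, "name": "Carol"})
table.insert(1, {"id": 1, "name": "Alice"})
table.insert(2, {"id": 2, "name": "Bob"})
print(table.keys)       # [1, 2, 3]
print(table.lookup(2))  # {'id': 2, 'name': 'Bob'}
```

Rows inserted out of order still end up stored in primary-key order, which is the "physical order" the article describes.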
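With InnoDB, the PRIMARY KEY is typically synonymous with the clustered index. But what if a table has no PRIMARY KEY, or no index is defined on the table at all? InnoDB decides what to use as the clustered index as follows: if a PRIMARY KEY is defined, it is used as the clustered index; otherwise, InnoDB uses the first UNIQUE index whose key columns are all NOT NULL; and if there is no such index either, InnoDB generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing a 6-byte row ID.
Hence, my advice is to always define a PRIMARY KEY for every table you create. If there is no natural key that can be used, add an auto-increment column and make it the PRIMARY KEY.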
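InnoDB's documented selection order (an explicit PRIMARY KEY, else the first UNIQUE index whose key columns are all NOT NULL, else a hidden GEN_CLUST_INDEX) can be written as a small Python function. This is a simplified model: the function name and the way a table definition is passed in are assumptions made for illustration, not a MySQL API.

```python
from typing import Optional


def choose_clustered_index(
    primary_key: Optional[list],
    unique_indexes: list,
    not_null_columns: set,
) -> str:
    """Hypothetical model of InnoDB's clustered-index choice.

    unique_indexes is a list of (index_name, [columns]) in definition order.
    """
    # Rule 1: an explicit PRIMARY KEY always wins.
    if primary_key:
        return "PRIMARY"
    # Rule 2: the first UNIQUE index whose key columns are all NOT NULL.
    for name, columns in unique_indexes:
        if all(col in not_null_columns for col in columns):
            return name
    # Rule 3: fall back to a hidden index on an internal row ID.
    return "GEN_CLUST_INDEX"


# uk_email is skipped because email is nullable; uk_login qualifies.
print(choose_clustered_index(None, [("uk_email", ["email"]), ("uk_login", ["login"])], {"login"}))
# No keys at all: InnoDB falls back to its hidden row-ID index.
print(choose_clustered_index(None, [], set()))
```

The fallback case is the one the advice below tries to avoid: a hidden row ID cannot be used by your queries.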
In InnoDB, every SECONDARY INDEX automatically contains the PRIMARY KEY column(s) alongside the secondary index's own column(s). That follows from the way InnoDB stores data: as explained above, a leaf page of the clustered index does not store a pointer to the row’s physical location, it stores the row’s data itself. So a secondary index cannot point at a physical row address; instead, the PRIMARY KEY value acts as the pointer to the row data.
This leads us to another interesting conclusion:
Reading a row through a secondary index requires two lookups! First a search of the secondary index itself, which yields the PRIMARY KEY value, then a search of the clustered index for that PRIMARY KEY.
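The two-step lookup can be sketched in Python, with a dict standing in for the clustered index and a sorted list of (city, primary_key) tuples standing in for a secondary index on a hypothetical `city` column. The table, columns and function names are invented for illustration. The second function shows the shortcut when a query needs only the indexed column and the primary key: the secondary entry already holds both, so the clustered index is never touched (a covering index).

```python
import bisect

# Clustered index stand-in: primary key -> full row.
clustered = {
    1: {"id": 1, "email": "alice@example.com", "city": "Oslo"},
    2: {"id": 2, "email": "bob@example.com", "city": "Lima"},
    3: {"id": 3, "email": "carol@example.com", "city": "Oslo"},
}

# Secondary index on city: sorted (city, primary_key) entries.
# The primary key is appended automatically and acts as the row "pointer".
city_index = sorted((row["city"], pk) for pk, row in clustered.items())


def rows_by_city(city):
    """Two lookups: scan the secondary index, then fetch each row by PK."""
    start = bisect.bisect_left(city_index, (city,))
    result = []
    for entry_city, pk in city_index[start:]:
        if entry_city != city:
            break
        result.append(clustered[pk])  # second lookup, into the clustered index
    return result


def ids_by_city(city):
    """Covering case: the secondary entries already hold the PK values."""
    start = bisect.bisect_left(city_index, (city,))
    ids = []
    for entry_city, pk in city_index[start:]:
        if entry_city != city:
            break
        ids.append(pk)  # no clustered-index access needed
    return ids


print([r["email"] for r in rows_by_city("Oslo")])
print(ids_by_city("Oslo"))
```

Because the secondary index stores the primary key, a wide primary key also makes every secondary index larger.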
Clustering provided by InnoDB has very significant performance benefits, some of which are mentioned below:
The benefits I have mentioned can boost performance drastically if you design your tables and queries accordingly. But clustered indexes have disadvantages as well.
Following are some of the disadvantages of clustering:
Following is another thing that one should know regarding secondary indexes:
Records in InnoDB secondary indexes are never updated in place. This means that an UPDATE of a secondary index column is carried out as a delete of the old index record (which is delete-marked and purged later) plus the insertion of a new one.
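The delete-plus-insert behaviour can be modelled with a toy Python class. `SecondaryIndex` and its methods are hypothetical names; the separate `purge()` step loosely mirrors InnoDB removing delete-marked records later in the background, but this is a simplified sketch, not how InnoDB's purge actually works.

```python
import bisect


class SecondaryIndex:
    """Toy secondary index: sorted (key, pk) entries, never updated in place."""

    def __init__(self):
        self.entries = []           # sorted list of (key, pk) tuples
        self.delete_marked = set()  # logically deleted entries awaiting purge

    def insert(self, key, pk):
        bisect.insort(self.entries, (key, pk))

    def update_key(self, old_key, new_key, pk):
        # An UPDATE of the indexed column is a delete plus an insert:
        # the old entry is only delete-marked, and a new entry is inserted
        # at the sorted position of the new value.
        self.delete_marked.add((old_key, pk))
        self.insert(new_key, pk)

    def visible(self):
        return [e for e in self.entries if e not in self.delete_marked]

    def purge(self):
        # Physically remove delete-marked entries at a later time.
        self.entries = self.visible()
        self.delete_marked.clear()


idx = SecondaryIndex()
idx.insert("alice@example.com", 1)
idx.insert("bob@example.com", 2)
idx.update_key("alice@example.com", "zoe@example.com", 1)
print(len(idx.entries))  # 3: the old entry is still present, but delete-marked
print(idx.visible())     # [('bob@example.com', 2), ('zoe@example.com', 1)]
idx.purge()
print(idx.entries)       # [('bob@example.com', 2), ('zoe@example.com', 1)]
```

Note that the new entry lands at a different position in the index than the old one, which is why an in-place update would not work for a sorted structure anyway.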
Although I did point out some disadvantages, the fact is that they are far outweighed by the tremendous benefits that come with clustering in InnoDB. If you study and understand the aspects I have covered in this article and apply them accordingly, you are going to see great performance enhancements. After all, clustering is another important step in bringing MySQL closer to MSSQL and Oracle.
Ref: http://www.ovaistariq.net/521/understanding-innodb-clustered-indexes/