Introduction to Phoenix indexes: http://phoenix.apache.org/secondary_indexing.html
Global indexes suit read-heavy, write-light workloads. The official documentation describes them as follows:
Global indexing targets read-heavy use cases. With global indexes, all the performance penalties for indexes occur at write time. We intercept the data table updates on write (DELETE, UPSERT VALUES and UPSERT SELECT), build the index update and then send any necessary updates to all interested index tables. At read time, Phoenix will select the index table to use that will produce the fastest query time and directly scan it just like any other HBase table. By default, unless hinted, an index will not be used for a query that references a column that isn't part of the index.
Preliminary setup:
CREATE TABLE test.test_salt (id INTEGER PRIMARY KEY, name VARCHAR, age INTEGER, address VARCHAR) SALT_BUCKETS = 20;
UPSERT INTO TEST.TEST_SALT VALUES(1, 'test01', 22, 'PEK');
UPSERT INTO TEST.TEST_SALT VALUES(2, 'test02', 18, 'SHA');
Create a global index:
CREATE INDEX testsalt_idx ON test.TEST_SALT(name, age);
Query the index table directly:
0: jdbc:phoenix:hadoop002:2181> select * from TEST.testsalt_idx;
+---------+--------+------+
| 0:NAME | 0:AGE | :ID |
+---------+--------+------+
| test02 | 18 | 2 |
| test01 | 22 | 1 |
+---------+--------+------+
2 rows selected (0.862 seconds)
The 0 prefix on the result columns shows that, by default, non-primary-key columns are placed in column family 0:
hbase(main):001:0> desc 'TEST:TEST_SALT'
Table TEST:TEST_SALT is ENABLED
TEST:TEST_SALT, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'}
COLUMN FAMILIES DESCRIPTION
{NAME => '0', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.2980 seconds
Drop a global index:
DROP INDEX testsalt_idx ON TEST.TEST_SALT;
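An index can also be taken offline and later rebuilt instead of being dropped outright; a minimal sketch of this lifecycle, reusing the names above (run before the DROP, since the index must still exist):
-- Take the index offline; the optimizer will stop considering it.
ALTER INDEX testsalt_idx ON test.test_salt DISABLE;
-- Rebuild it from the data table and bring it back online.
ALTER INDEX testsalt_idx ON test.test_salt REBUILD;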
How to use the index effectively: the queries below all result in a FULL SCAN.
0: jdbc:phoenix:hadoop002:2181> explain SELECT * FROM TEST.TEST_SALT;
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_SALT | null | null | null |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.03 seconds)
Note: the hint below is malformed (there is a space after /* and the index name is schema-qualified), so Phoenix treats it as a plain comment and still performs a full scan:
0: jdbc:phoenix:hadoop002:2181> explain SELECT /* + INDEX(TEST.TEST_SALT TEST.TESTSALT_IDX) */ * FROM TEST.TEST_SALT ;
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_SALT | null | null | null |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.025 seconds)
0: jdbc:phoenix:hadoop002:2181> explain SELECT ADDRESS FROM TEST.TEST_SALT WHERE AGE = 18;
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_SALT | null | null | null |
| SERVER FILTER BY AGE = 18 | null | null | null |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
0: jdbc:phoenix:hadoop002:2181> explain SELECT name, ADDRESS FROM TEST.TEST_SALT WHERE AGE = 18;
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_SALT | null | null | null |
| SERVER FILTER BY AGE = 18 | null | null | null |
+----------------------------------------------------------------------------+-----------------+----------------+--------------+
The index is used only when every column referenced by the query is contained in the index:
0: jdbc:phoenix:hadoop002:2181> explain SELECT name FROM TEST.TEST_SALT WHERE AGE = 18;
+-------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER TEST:TESTSALT_IDX [0] - [19] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18 | null | null | null |
+-------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.037 seconds)
0: jdbc:phoenix:hadoop002:2181>
Force the use of the index with a hint:
0: jdbc:phoenix:hadoop002:2181> explain SELECT /*+ INDEX(TEST.TEST_SALT TESTSALT_IDX) */ ADDRESS FROM TEST.TEST_SALT WHERE AGE = 18;
+---------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_SALT | null | null | null |
| SKIP-SCAN-JOIN TABLE 0 | null | null | null |
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER TEST:TESTSALT_IDX [0] - [19] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18 | null | null | null |
| DYNAMIC SERVER FILTER BY "TEST.TEST_SALT.ID" IN ($12.$14) | null | null | null |
+---------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
5 rows selected (0.128 seconds)
Summary: a global index is essentially a separate table. For a query to use it, the WHERE and SELECT columns must all be index columns; if the WHERE columns are indexed but the SELECT list is not fully covered, you must force the index with a hint.
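One way to avoid the hint is a covered global index that carries the extra column; a sketch, with the index name testsalt_cover_idx made up for illustration:
-- INCLUDE stores ADDRESS in the index rows themselves.
CREATE INDEX testsalt_cover_idx ON test.test_salt(name, age) INCLUDE (address);
-- Now this query can be answered from the index alone, no hint needed:
SELECT address FROM test.test_salt WHERE age = 18;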
Local indexes suit write-heavy, read-light workloads. The official documentation describes them as follows:
Local indexing targets write-heavy, space-constrained use cases. Just like with global indexes, Phoenix will automatically select whether or not to use a local index at query-time. With local indexes, index data and table data co-reside on the same server, preventing any network overhead during writes. Local indexes can be used even when the query isn't fully covered (i.e. Phoenix automatically retrieves the columns not in the index through point gets against the data table). Unlike global indexes, all local indexes of a table are stored in a single, separate shared table prior to version 4.8.0. From 4.8.0 onwards all local index data is stored in separate shadow column families in the same data table. At read time when the local index is used, every region must be examined for the data as the exact region location of index data cannot be predetermined. Thus some overhead occurs at read-time.
Prepare the data:
CREATE TABLE test.test_local (id INTEGER PRIMARY KEY, name VARCHAR, age INTEGER, address VARCHAR) split ON (1,2,3);
UPSERT INTO TEST.test_local VALUES(1, 'test01', 22, 'PEK');
UPSERT INTO TEST.test_local VALUES(2, 'test02', 18, 'SHA');
Create a local index:
CREATE LOCAL INDEX testlocal_idx ON test.TEST_LOCAL(name, age);
Before the local index is created:
hbase(main):002:0> desc 'TEST:TEST_LOCAL'
Table TEST:TEST_LOCAL is ENABLED
TEST:TEST_LOCAL, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'}
COLUMN FAMILIES DESCRIPTION
{NAME => '0', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.2970 seconds
After the local index is created:
hbase(main):003:0> desc 'TEST:TEST_LOCAL'
Table TEST:TEST_LOCAL is ENABLED
TEST:TEST_LOCAL, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder', coprocessor$6 => '|org.apache.hadoop.hbase.regionserver.IndexHalfStoreFileReaderGenerator|805306366|', METADATA => {'DATA_TABLE_NAME' => 'TEST.TEST_LOCAL', 'SPLIT_POLICY' => 'org.apache.phoenix.hbase.index.IndexRegionSplitPolicy'}}
COLUMN FAMILIES DESCRIPTION
{NAME => '0', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'L#0', BLOOMFILTER => 'NONE', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0310 seconds
This shows that a local index adds an extra column family, L#0, which is the underlying difference between local and global indexes. A local index is not a separate table but a column family of the base table, so it does not show up in the HBase shell list command. The index data is encoded directly into row keys of the base table, and the column family itself stores no actual payload:
hbase(main):005:0> scan 'TEST:TEST_LOCAL'
ROW COLUMN+CELL
\x80\x00\x00\x01 column=0:\x00\x00\x00\x00, timestamp=1585991153644, value=x
\x80\x00\x00\x01 column=0:\x80\x0B, timestamp=1585991153644, value=test01
\x80\x00\x00\x01 column=0:\x80\x0C, timestamp=1585991153644, value=\x80\x00\x00\x16
\x80\x00\x00\x01 column=0:\x80\x0D, timestamp=1585991153644, value=PEK
\x80\x00\x00\x01\x00\x00test01\x00\xC1\x17\x00\x80\x00\x00\x01 column=L#0:\x00\x00\x00\x00, timestamp=1585991153644, value=\x00\x00\x00\x00
\x80\x00\x00\x02 column=0:\x00\x00\x00\x00, timestamp=1585991167298, value=x
\x80\x00\x00\x02 column=0:\x80\x0B, timestamp=1585991167298, value=test02
\x80\x00\x00\x02 column=0:\x80\x0C, timestamp=1585991167298, value=\x80\x00\x00\x12
\x80\x00\x00\x02 column=0:\x80\x0D, timestamp=1585991167298, value=SHA
\x80\x00\x00\x02\x00\x00test02\x00\xC1\x13\x00\x80\x00\x00\x02 column=L#0:\x00\x00\x00\x00, timestamp=1585991167298, value=\x00\x00\x00\x00
4 row(s) in 0.0970 seconds
hbase(main):006:0>
Reading the table data, we can work out the row key layout of the local index (taking the second index row above as an example):
\x80\x00\x00\x02 + \x00\x00 + test02 + \x00 + \xC1\x13 + \x00 + \x80\x00\x00\x02
That is: start key of the data row's region + separator + secondary index column 1 + \x00 + secondary index column 2 (for a composite index) + … + \x00 + the original row key.
1. The leading component is the start key of the data region. This guarantees the index data and the original data live in the same region, so once an entry is located through the secondary index columns, the rest of the row can be fetched within the same region via the original row key, cutting network overhead and lookup cost.
2. At query time, every region is probed on the secondary index columns; once the original row key is found, the remaining columns are read from the same region.
3. Because this design writes only the index entry itself, it saves considerable space; the original row key recovered from the index is used to fetch the other columns from the data row. This is why local indexes target write-heavy, read-light workloads.
Examine index usage through the query plans:
0: jdbc:phoenix:hadoop002:2181> explain SELECT * FROM TEST.TEST_LOCAL ;
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_LOCAL | null | null | null |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.039 seconds)
0: jdbc:phoenix:hadoop002:2181>
The hint below names TEST_SALT's index rather than an index on TEST_LOCAL, so it has no effect:
0: jdbc:phoenix:hadoop002:2181> explain SELECT /*+ INDEX(TEST.TEST_SALT TEST.TESTSALT_IDX) */ * FROM TEST.TEST_LOCAL ;
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_LOCAL | null | null | null |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.035 seconds)
0: jdbc:phoenix:hadoop002:2181>
0: jdbc:phoenix:hadoop002:2181> explain SELECT ADDRESS FROM TEST.TEST_LOCAL WHERE AGE = 18;
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_LOCAL [1] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18 | null | null | null |
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.039 seconds)
0: jdbc:phoenix:hadoop002:2181>
0: jdbc:phoenix:hadoop002:2181> explain SELECT name, ADDRESS FROM TEST.TEST_LOCAL WHERE AGE = 18;
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_LOCAL [1] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18 | null | null | null |
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.03 seconds)
0: jdbc:phoenix:hadoop002:2181>
The hint below again names the wrong table's index, but the local index is selected automatically anyway:
0: jdbc:phoenix:hadoop002:2181> explain SELECT /*+ INDEX(TEST.TEST_SALT TESTSALT_IDX) */ ADDRESS FROM TEST.TEST_LOCAL WHERE AGE = 18;
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_LOCAL [1] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18 | null | null | null |
+--------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.033 seconds)
0: jdbc:phoenix:hadoop002:2181>
Summary: the query needs a WHERE condition on the indexed columns and must not select *. Since there is no separate index table, any query that satisfies these conditions will use the local index.
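Because a local index is invisible to the HBase shell's list command, its existence can be confirmed through Phoenix's own metadata instead; a sketch against SYSTEM.CATALOG (TABLE_TYPE = 'i' marks index tables; exact columns may vary across Phoenix versions):
-- List index names and their current state.
SELECT DISTINCT TABLE_SCHEM, TABLE_NAME, INDEX_STATE
FROM SYSTEM.CATALOG
WHERE TABLE_TYPE = 'i';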
Covered indexes: the data of the original columns is also stored in the index, so a query does not need to scan the original table at all; results are fetched directly from the index, saving overhead.
Prepare the data:
CREATE TABLE test.test_include (id INTEGER PRIMARY KEY, name VARCHAR, age INTEGER, address VARCHAR, sal integer) split ON (1,2,3);
UPSERT INTO TEST.test_include VALUES(1, 'test01', 22, 'PEK', 3300);
UPSERT INTO TEST.test_include VALUES(2, 'test02', 18, 'SHA', 3200);
Create the index:
CREATE LOCAL INDEX testinclude_idx ON test.test_include(name, age) include(address);
Index usage at query time:
0: jdbc:phoenix:hadoop002:2181> explain SELECT * FROM TEST.TEST_INCLUDE ;
+-----------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-----------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_INCLUDE | null | null | null |
+-----------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.035 seconds)
0: jdbc:phoenix:hadoop002:2181>
0: jdbc:phoenix:hadoop002:2181> explain SELECT /*+ INDEX(TEST.TEST_INCLUDE TEST.TESTINCLUDE_IDX) */ SAL FROM TEST.TEST_INCLUDE ;
+-----------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-----------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER TEST:TEST_INCLUDE | null | null | null |
+-----------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.032 seconds)
0: jdbc:phoenix:hadoop002:2181> explain SELECT ADDRESS FROM TEST.TEST_INCLUDE WHERE AGE = 18;
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_INCLUDE [1] | null | null | null |
| SERVER FILTER BY TO_INTEGER("AGE") = 18 | null | null | null |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.026 seconds)
0: jdbc:phoenix:hadoop002:2181>
0: jdbc:phoenix:hadoop002:2181> explain SELECT NAME, ADDRESS FROM TEST.TEST_INCLUDE WHERE AGE = 18;
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_INCLUDE [1] | null | null | null |
| SERVER FILTER BY TO_INTEGER("AGE") = 18 | null | null | null |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.022 seconds)
0: jdbc:phoenix:hadoop002:2181>
0: jdbc:phoenix:hadoop002:2181> explain SELECT /*+ INDEX(TEST.TEST_INCLUDE TESTINCLUDE_IDX) */ ADDRESS FROM TEST.TEST_INCLUDE WHERE AGE = 18;
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_INCLUDE [1] | null | null | null |
| SERVER FILTER BY TO_INTEGER("AGE") = 18 | null | null | null |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.032 seconds)
0: jdbc:phoenix:hadoop002:2181>
0: jdbc:phoenix:hadoop002:2181> explain SELECT /*+ INDEX(TEST.TEST_INCLUDE TESTINCLUDE_IDX) */ SAL FROM TEST.TEST_INCLUDE WHERE AGE = 18;
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER TEST:TEST_INCLUDE [1] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18 | null | null | null |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.038 seconds)
0: jdbc:phoenix:hadoop002:2181>
Functional indexes (available in 4.3 and above) allow you to create an index not just on columns, but on arbitrary expressions. Then when a query uses that expression, the index may be used to retrieve the results instead of the data table. For example, you could create an index on UPPER(FIRST_NAME||' '||LAST_NAME) to allow you to do case insensitive searches on the combined first name and last name of a person.
For example, the following would create this functional index:
CREATE INDEX UPPER_NAME_IDX ON EMP (UPPER(FIRST_NAME||' '||LAST_NAME))
With this index in place, when the following query is issued, the index would be used instead of the data table to retrieve the results:
SELECT EMP_ID FROM EMP WHERE UPPER(FIRST_NAME||' '||LAST_NAME)='JOHN DOE'
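For completeness, a table the functional-index example could run against; this DDL is assumed for illustration and is not part of the original example:
-- Hypothetical EMP table backing UPPER_NAME_IDX above.
CREATE TABLE EMP (
    EMP_ID     INTEGER PRIMARY KEY,
    FIRST_NAME VARCHAR,
    LAST_NAME  VARCHAR
);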
Phoenix supports two types of indexing techniques: global and local indexing. Each is useful in different scenarios and has its own failure profile and performance characteristics.
For non transactional mutable tables, we maintain index update durability by adding the index updates to the Write-Ahead-Log (WAL) entry of the primary table row. Only after the WAL entry is successfully synced to disk do we attempt to make the index/primary table updates. We write the index updates in parallel by default, leading to very high throughput. If the server crashes while we are writing the index updates, we replay all the index updates to the index tables in the WAL recovery process and rely on the idempotence of the updates to ensure correctness. Therefore, indexes on non transactional mutable tables are only ever a single batch of edits behind the primary table.
For a table in which the data is only written once and never updated in-place, certain optimizations may be made to reduce the write-time overhead for incremental maintenance. This is common with time-series data such as log or event data, where once a row is written, it will never be updated. To take advantage of these optimizations, declare your table as immutable by adding the IMMUTABLE_ROWS=true property to your DDL statement:
CREATE TABLE my_table (k VARCHAR PRIMARY KEY, v VARCHAR) IMMUTABLE_ROWS=true
All indexes on a table declared with IMMUTABLE_ROWS=true are considered immutable (note that by default, tables are considered mutable). For global immutable indexes, the index is maintained entirely on the client-side, with the index table being generated as changes to the data table occur. Local immutable indexes, on the other hand, are maintained on the server-side. Note that no safeguards are in place to enforce that a table declared as immutable doesn't actually mutate data (as that would negate the performance gain achieved). If that were to occur, the index would no longer be in sync with the table.
If you have an existing table that you'd like to switch from immutable indexing to mutable indexing, use the ALTER TABLE command as shown below:
ALTER TABLE my_table SET IMMUTABLE_ROWS=false
Indexes on non transactional, immutable tables have no mechanism in place to automatically deal with a commit failure; maintaining consistency between the table and its index is left to the client. Because the updates are idempotent, the simplest solution is for the client to keep retrying the batch of mutations until it succeeds.