Data lakes are a popular concept right now. For more background, see:
https://en.wikipedia.org/wiki/Data_lake
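A data lake lets you run correlated big-data analysis across heterogeneous data sources without any ETL or data migration. This can greatly reduce cost and improve the user experience.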
AWS and Azure also publish their own explanations of data lakes:
https://amazonaws-china.com/big-data/datalakes-and-analytics/what-is-a-data-lake/
https://azure.microsoft.com/en-us/solutions/data-lake/
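Alibaba Cloud now has its own data lake analytics product as well: https://www.aliyun.com/product/datalakeanalytics
You can apply for access (the public beta is currently invitation-only) and then follow this tutorial to analyze OTS data.
Product documentation: https://help.aliyun.com/product/70174.html
For a detailed introduction to Table Store (OTS), see: https://help.aliyun.com/document_detail/27280.html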
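Concept mapping between OTS and DLA:

OTS concept | DLA concept |
---|---|
Instance | schema or database (different users call it different things) |
Table | table |
Primary key column (pk) | column, isPrimaryKey=true, isNullable=false |
Non-primary-key column | column, isPrimaryKey=false, isNullable=<as defined in the user's DDL> |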

Data type mapping:

OTS type | DLA type |
---|---|
INTEGER (8 bytes) | bigint (8 bytes) |
STRING | varchar |
BINARY | varbinary |
DOUBLE | double |
BOOLEAN | byte |
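If you generate DLA DDL from an existing OTS schema with a script, the type table above becomes a simple lookup. The following is a minimal sketch. The names `OTS_TO_DLA_TYPE` and `dla_type_for` are hypothetical and not part of any DLA or OTS SDK. The mapping values are copied directly from the table.

```python
# Hypothetical lookup table: values copied from the OTS -> DLA type table above.
OTS_TO_DLA_TYPE = {
    "INTEGER": "bigint",   # both 8 bytes
    "STRING": "varchar",
    "BINARY": "varbinary",
    "DOUBLE": "double",
    "BOOLEAN": "byte",
}


def dla_type_for(ots_type: str) -> str:
    """Return the DLA column type for an OTS column type (case-insensitive)."""
    try:
        return OTS_TO_DLA_TYPE[ots_type.upper()]
    except KeyError:
        # Fail loudly instead of guessing a type that DLA may not accept.
        raise ValueError(f"no DLA mapping for OTS type {ots_type!r}") from None
```

For example, `dla_type_for("integer")` returns `"bigint"`.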
Now for the hands-on part. I will walk through the whole process with our test data, skipping the access-application steps:
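MySQL command line:
mysql -h<your DLA classic-network endpoint, shown in the DLA console> -P10000 -u<your DLA username> -p -c -A
JDBC URL:
jdbc:mysql://<your DLA classic-network endpoint, shown in the DLA console>:10000/
username=<your DLA username>
password=<your DLA password>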
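Both connection methods above use the same endpoint and port 10000. The following is a small sketch that assembles the two forms from one value. The helper names and the example host `dla.example.com` are hypothetical. In the standard `mysql` client, `-p` with no value makes the client prompt for the password, `-c` preserves comments in statements, and `-A` turns off auto-rehash (table-name completion), so the session starts faster.

```python
from typing import List, Optional

DLA_PORT = 10000  # port used by the commands above


def jdbc_url(endpoint: str, database: Optional[str] = None) -> str:
    """Build the MySQL-protocol JDBC URL for a DLA endpoint."""
    url = f"jdbc:mysql://{endpoint}:{DLA_PORT}/"
    return url + database if database else url


def mysql_cli_args(endpoint: str, user: str) -> List[str]:
    """Argument list for the mysql client; '-p' alone prompts for the password."""
    return ["mysql", f"-h{endpoint}", f"-P{DLA_PORT}", f"-u{user}", "-p", "-c", "-A"]
```

You could pass the list to `subprocess.run(...)`. Keeping the password out of the arguments avoids exposing it in the process list.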
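DLA sets up network connectivity to OTS for you through VPC policies, so you do not need to configure anything yourself. However, DLA currently cannot reach OTS over the public internet, so __be sure to use the OTS VPC endpoint!__
Note: DLA is multi-tenant, so a new user initially sees no databases or tables.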
1) Create your own DLA database (get the required values from the steps above):
mysql> create database hangzhou_ots_test with dbproperties (
catalog = 'ots',
location = 'https://hz-tpch-1x-vol.cn-hangzhou.vpc.tablestore.aliyuncs.com',
instance = 'hz-tpch-1x-vol'
);
Query OK, 0 rows affected (0.23 sec)
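#hangzhou_ots_test ---the database name; letters, digits, and underscores are allowed
#catalog = 'ots', ---set to ots to distinguish this source from others such as oss and rds
#location = 'https://xxx' ---the OTS endpoint, shown on the instance page
#instance = 'hz-tpch-1x-vol' ---the instance name, required because the endpoint does not have to include it; it maps to the DLA schema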
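The rules in these comments and the VPC note above can be checked before you run the statement. Below is a minimal sketch that builds the `create database` statement shown earlier. The function name is hypothetical. The `.vpc.` test is an assumption based only on the endpoint used in this tutorial, not on a documented naming rule. Values are interpolated without escaping, so pass only trusted input.

```python
import re


def create_ots_database_sql(db_name: str, endpoint: str, instance: str) -> str:
    """Build a DLA 'create database' statement that maps to an OTS instance."""
    # Database names may contain only letters, digits and underscores.
    if not re.fullmatch(r"[A-Za-z0-9_]+", db_name):
        raise ValueError("database name may contain only letters, digits and underscores")
    # Assumption: OTS VPC endpoints contain '.vpc.' in the host name, as in this example.
    # DLA cannot reach OTS over the public internet.
    if ".vpc." not in endpoint:
        raise ValueError("use the OTS VPC endpoint")
    # No escaping is applied: this is a sketch for trusted values only.
    return (
        f"create database {db_name} with dbproperties (\n"
        f"  catalog = 'ots',\n"
        f"  location = '{endpoint}',\n"
        f"  instance = '{instance}'\n"
        f");"
    )
```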
2) View the database you created:
mysql> show databases;
+------------------------------+
| Database |
+------------------------------+
| hangzhou_ots_test |
+------------------------------+
1 rows in set (0.22 sec)
mysql> show create database hangzhou_ots_test;
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Database | Create Database |
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| hangzhou_ots_test | CREATE DATABASE `hangzhou_ots_test`
WITH DBPROPERTIES (
CATALOG = 'ots',
LOCATION = 'https://hz-tpch-1x-vol.cn-hangzhou.vpc.tablestore.aliyuncs.com',
INSTANCE = 'hz-tpch-1x-vol'
) |
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.31 sec)
3) List your DLA tables:
mysql> use hangzhou_ots_test;
Database changed
mysql> show tables;
Empty set (0.30 sec)
4) Create a DLA table that maps to the OTS table:
mysql> CREATE EXTERNAL TABLE `nation` (
`N_NATIONKEY` int not NULL ,
`N_COMMENT` varchar(100) NULL ,
`N_NAME` varchar(100) NULL ,
`N_REGIONKEY` int NULL ,
PRIMARY KEY (`N_NATIONKEY`)
);
Query OK, 0 rows affected (0.36 sec)
## `N_NATIONKEY` int not NULL ---- a primary key column must be NOT NULL
## PRIMARY KEY (`N_NATIONKEY`) ---- must list the primary key columns in the same order as in OTS, and the names must match as well
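The two rules above can be enforced when generating the DDL. The following is a minimal sketch that emits a `CREATE EXTERNAL TABLE` in the same layout as the statement above. It rejects nullable or undefined primary key columns. `Column` and `create_external_table_sql` are hypothetical names. The caller must pass `primary_key` in the same order as the OTS table's primary key, because the function keeps that order as given.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    name: str          # must match the OTS column name
    sql_type: str      # DLA type, e.g. "int" or "varchar(100)"
    nullable: bool = True


def create_external_table_sql(table: str, columns: List[Column], primary_key: List[str]) -> str:
    """Build a DLA CREATE EXTERNAL TABLE statement for an OTS table."""
    by_name = {c.name: c for c in columns}
    for pk in primary_key:
        if pk not in by_name:
            raise ValueError(f"primary key column {pk} is not defined")
        if by_name[pk].nullable:
            raise ValueError(f"primary key column {pk} must be NOT NULL")
    lines = [
        f"  `{c.name}` {c.sql_type} {'NULL' if c.nullable else 'not NULL'} ,"
        for c in columns
    ]
    # Order is preserved: pass the key columns in the same order as in OTS.
    pk_list = ", ".join(f"`{name}`" for name in primary_key)
    lines.append(f"  PRIMARY KEY ({pk_list})")
    return f"CREATE EXTERNAL TABLE `{table}` (\n" + "\n".join(lines) + "\n);"
```

Calling it with the four `nation` columns and `["N_NATIONKEY"]` reproduces the statement used in this step.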
5) View the table you created and its DDL:
mysql> show tables;
+------------+
| Table_Name |
+------------+
| nation |
+------------+
1 row in set (0.35 sec)
mysql> show create table nation;
+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| nation | CREATE EXTERNAL TABLE `nation` (
`n_nationkey` int not NULL COMMENT '',
`n_comment` varchar(100) NULL COMMENT '',
`n_name` varchar(100) NULL COMMENT '',
`n_regionkey` int NULL COMMENT '',
PRIMARY KEY (`n_nationkey`)
)
TBLPROPERTIES (COLUMN_MAPPING = 'n_nationkey,N_NATIONKEY; n_comment,N_COMMENT; n_name,N_NAME; n_regionkey,N_REGIONKEY; ')
COMMENT '' |
+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.30 sec)
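In the `SHOW CREATE TABLE` output, DLA reports the column names in lower case. It keeps the original OTS names in the `COLUMN_MAPPING` table property. Judging from this one example, the format appears to be `dla_name,OTS_NAME;` pairs with a trailing separator. The parser below is a sketch based on that reading alone, not on a published format specification. The function name is hypothetical.

```python
from typing import Dict


def parse_column_mapping(value: str) -> Dict[str, str]:
    """Parse 'dla_col,OTS_COL; ...' into {dla_col: OTS_COL}."""
    mapping: Dict[str, str] = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # the property in the output above ends with a trailing '; '
        dla_name, ots_name = (part.strip() for part in entry.split(",", 1))
        mapping[dla_name] = ots_name
    return mapping
```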
6) Start querying and analyzing. The queries here are deliberately simple; you can analyze your own data using MySQL-compatible syntax:
mysql> select count(*) from nation;
+----------+
| count(*) |
+----------+
|       25 |
+----------+
1 row in set (1.19 sec)
mysql> select * from nation;
+-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+
| n_nationkey | n_comment | n_name | n_regionkey |
+-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+
| 0 | haggle. carefully final deposits detect slyly agai | ALGERIA | 0 |
| 1 | al foxes promise slyly according to the regular accounts. bold requests alon | ARGENTINA | 1 |
| 2 | y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special | BRAZIL | 1 |
| 3 | eas hang ironic, silent packages. slyly regular packages are furiously over the tithes. fluffily bold | CANADA | 1 |
| 4 | y above the carefully unusual theodolites. final dugouts are quickly across the furiously regular d | EGYPT | 4 |
| 5 | ven packages wake quickly. regu | ETHIOPIA | 0 |
| 6 | refully final requests. regular, ironi | FRANCE | 3 |
| 7 | l platelets. regular accounts x-ray: unusual, regular acco | GERMANY | 3 |
| 8 | ss excuses cajole slyly across the packages. deposits print aroun | INDIA | 2 |
| 9 | slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull | INDONESIA | 2 |
| 10 | efully alongside of the slyly final dependencies. | IRAN | 4 |
| 11 | nic deposits boost atop the quickly final requests? quickly regula | IRAQ | 4 |
| 12 | ously. final, express gifts cajole a | JAPAN | 2 |
| 13 | ic deposits are blithely about the carefully regular pa | JORDAN | 4 |
| 14 | pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t | KENYA | 0 |
| 15 | rns. blithely bold courts among the closely regular packages use furiously bold platelets? | MOROCCO | 0 |
| 16 | s. ironic, unusual asymptotes wake blithely r | MOZAMBIQUE | 0 |
| 17 | platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun | PERU | 1 |
| 18 | c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos | CHINA | 2 |
| 19 | ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account | ROMANIA | 3 |
| 20 | ts. silent requests haggle. closely express packages sleep across the blithely | SAUDI ARABIA | 4 |
| 21 | hely enticingly express accounts. even, final | VIETNAM | 2 |
| 22 | requests against the platelets use never according to the quickly regular pint | RUSSIA | 3 |
| 23 | eans boost carefully special requests. accounts are. carefull | UNITED KINGDOM | 3 |
| 24 | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be | UNITED STATES | 1 |
+-------------+--------------------------------------------------------------------------------------------------------------------+----------------+-------------+
25 rows in set (1.63 sec)
The n_nationkey values in the result match the data stored in OTS.
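Because DLA speaks the MySQL protocol, the same queries can also be run from application code. The following is a minimal sketch, written against any DB-API 2.0 connection. With DLA, the connection would presumably come from a MySQL driver such as PyMySQL, for example `pymysql.connect(host=<endpoint>, port=10000, user=..., password=..., database="hangzhou_ots_test")`. That call is an untested assumption. The function name is hypothetical.

```python
from typing import Any, List, Tuple


def nations_per_region(conn) -> List[Tuple[Any, ...]]:
    """Count nations per region via any DB-API 2.0 connection (e.g. PyMySQL against DLA)."""
    cur = conn.cursor()
    try:
        cur.execute(
            "select n_regionkey, count(*) from nation "
            "group by n_regionkey order by n_regionkey"
        )
        return [tuple(row) for row in cur.fetchall()]
    finally:
        cur.close()  # release the cursor even if the query fails
```

Depending only on the DB-API interface lets you exercise the function against a local database before pointing it at DLA.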