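DataX is an offline data synchronization tool/platform widely used within Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), and DRDS.
As a data synchronization framework, DataX abstracts synchronization between different data sources into Reader plugins, which read data from the source, and Writer plugins, which write data to the target. In theory, the framework can therefore support synchronization for any type of data source. The plugin system also works as an ecosystem: each newly connected data source can immediately exchange data with every data source already supported.
DataX is built on a Framework + plugin architecture: reading from and writing to a data source are abstracted as Reader/Writer plugins, which are plugged into the overall synchronization framework.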
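The Reader/Writer split can be pictured with a small Python sketch. This is only an illustration of the idea, not DataX's actual plugin API (DataX plugins are written against its own Java interfaces); `Reader`, `Writer`, `ListReader`, `ListWriter` and `sync` are hypothetical names made up for this example.

```python
from typing import Iterable, Iterator, List, Protocol, Tuple

Record = Tuple  # one row moving through the framework


class Reader(Protocol):
    def read(self) -> Iterator[Record]: ...


class Writer(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...


class ListReader:
    """Hypothetical reader plugin: yields rows from an in-memory list."""

    def __init__(self, rows: List[Record]) -> None:
        self.rows = rows

    def read(self) -> Iterator[Record]:
        yield from self.rows


class ListWriter:
    """Hypothetical writer plugin: appends rows to an in-memory list."""

    def __init__(self) -> None:
        self.sink: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        before = len(self.sink)
        self.sink.extend(records)
        return len(self.sink) - before  # number of rows written


def sync(reader: Reader, writer: Writer) -> int:
    """The 'framework' part: it knows only the two interfaces."""
    return writer.write(reader.read())


source = ListReader([(1, "alice"), (2, "bob")])
target = ListWriter()
copied = sync(source, target)
print(copied, target.sink)
```

Because `sync` depends only on the two interfaces, any new reader can be paired with every existing writer without changing the framework code, which is the property the text describes.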
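After several years of development, DataX now has a fairly complete plugin system that covers mainstream RDBMS databases, NoSQL stores, and big data computing systems. DataX currently supports the following data sources: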
| Type | Data source | Reader | Writer | Docs |
|---|---|---|---|---|
| RDBMS (relational databases) | MySQL | √ | √ | Read, Write |
| | Oracle | √ | √ | Read, Write |
| | SQLServer | √ | √ | Read, Write |
| | PostgreSQL | √ | √ | Read, Write |
| | DRDS | √ | √ | Read, Write |
| | Dameng (达梦) | √ | √ | Read, Write |
| | Generic RDBMS (supports all relational databases) | √ | √ | Read, Write |
| Alibaba Cloud data warehouse storage | ODPS | √ | √ | Read, Write |
| | ADS | | √ | Write |
| | OSS | √ | √ | Read, Write |
| | OCS | √ | √ | Read, Write |
| NoSQL data storage | OTS | √ | √ | Read, Write |
| | Hbase0.94 | √ | √ | Read, Write |
| | Hbase1.1 | √ | √ | Read, Write |
| | MongoDB | √ | √ | Read, Write |
| | Hive | √ | √ | Read, Write |
| Unstructured data storage | TxtFile | √ | √ | Read, Write |
| | FTP | √ | √ | Read, Write |
| | HDFS | √ | √ | Read, Write |
| | Elasticsearch | | √ | Write |
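The DataX Framework exposes a simple interface for interacting with plugins and a simple mechanism for adding them: once a plugin for a new data source is added, that source can exchange data with all the others. For details, see the DataX data source guide.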
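Overview: Metabase helps you present the data in your databases to more people. Analysts distill data by building a query (called a "Question" in Metabase) and then combine the results on dashboards to share with the rest of the company.
Features:
Setup takes only 5 minutes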
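Assume the following scenario in a microservice system:
with the following reporting requirements:
Create the source tables yourself according to the design diagram (user table + merchant table + country table + product table + product category table).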
Download DataX: http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
Extract it to a directory of your choice, then run a job from its bin directory:
$ cd {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}
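This task merges tables across databases. Because a DataX reader can use a SQL query as its data source, a join query can merge several tables into a new one. In typical internet applications, however, each microservice has its own database, so the data joined within each database must first be synchronized into a single database, where it is joined again to produce the wide table.
Merge the user and merchant tables: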
SELECT t.id AS tenant_id,t.uid,t.name AS tenant_name,u.name AS userName,g.code,g.name AS country_name FROM tenant t INNER JOIN USER u ON t.uid=u.id LEFT JOIN region g ON g.code=u.country
Merge into the product aggregate table:
SELECT p.tenant_id,p.id AS product_id,p.name AS product_name,p.create_time,c.id AS class_id,c.name AS class_name FROM tenant_product p INNER JOIN tenant_class c ON p.tenant_class_id=c.id
Create the landing database myreport, then create the user-merchant table and the product aggregate table:
DROP TEMPORARY TABLE IF EXISTS myreport.usertenanttmp;
CREATE TEMPORARY TABLE myreport.usertenanttmp AS SELECT t.id AS tenant_id,t.uid,t.name AS tenant_name,u.name AS userName,g.code,g.name AS country_name FROM tenant t INNER JOIN USER u ON t.uid=u.id LEFT JOIN region g ON g.code=u.country LIMIT 0;
DROP TABLE IF EXISTS myreport.usertenant;
CREATE TABLE myreport.usertenant LIKE myreport.usertenanttmp;
DROP TEMPORARY TABLE IF EXISTS myreport.tenantproductdetailtmp;
CREATE TEMPORARY TABLE myreport.tenantproductdetailtmp AS SELECT p.tenant_id,p.id AS product_id,p.name AS product_name,p.create_time,c.id AS class_id,c.name AS class_name FROM tenant_product p INNER JOIN tenant_class c ON p.tenant_class_id=c.id LIMIT 0;
DROP TABLE IF EXISTS myreport.tenantproductdetail;
CREATE TABLE myreport.tenantproductdetail LIKE myreport.tenantproductdetailtmp;
Create the job JSON for merging users and merchants (usertenant.json)
{
"job": {
"setting": {
"speed": {
"channel":1
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "Qjkj2018",
"connection": [
{
"querySql": [
"SELECT t.id AS tenant_id,t.uid,t.name AS tenant_name,u.name AS userName,g.code,g.name AS country_name FROM tenant t INNER JOIN USER u ON t.uid=u.id LEFT JOIN region g ON g.code=u.country;"
],
"jdbcUrl": [
"jdbc:mysql://192.168.1.230:3306/ums_docker_fat"
]
}
]
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"writeMode": "insert",
"username": "root",
"password": "Qjkj2018",
"column": [
"tenant_id", "uid", "tenant_name", "userName","code", "country_name"
],
"session": [
"set session sql_mode='ANSI'"
],
"preSql": [
"delete from usertenant"
],
"connection": [
{
"jdbcUrl": "jdbc:mysql://192.168.1.230:3306/myreport",
"table": [
"usertenant"
]
}
]
}
}
}
]
}
}
Note: Python 2.7 is required; Python 3 is not supported. You can create a 2.7 environment with conda and activate it.
Run the command
python datax.py usertenant.json
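The job files in this walkthrough differ only in the query, the JDBC URLs, the target table, and the column list, so they could also be generated rather than written by hand. Below is a minimal Python 3 sketch of that idea; `build_mysql_job` is a hypothetical helper (not part of DataX), the credentials are placeholders, and the script is meant to run separately from datax.py, which is launched with Python 2.7 here.

```python
import json


def build_mysql_job(query_sql, reader_url, writer_url, table, columns,
                    username, password, channel=1):
    """Return a job dict laid out like usertenant.json (hypothetical helper)."""
    return {
        "job": {
            "setting": {"speed": {"channel": channel}},
            "content": [{
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": username,
                        "password": password,
                        "connection": [{
                            "querySql": [query_sql],
                            "jdbcUrl": [reader_url],
                        }],
                    },
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": username,
                        "password": password,
                        "column": list(columns),
                        "session": ["set session sql_mode='ANSI'"],
                        # Empty the target table before each load.
                        "preSql": ["delete from " + table],
                        "connection": [{
                            "jdbcUrl": writer_url,
                            "table": [table],
                        }],
                    },
                },
            }],
        }
    }


job = build_mysql_job(
    query_sql=("SELECT t.id AS tenant_id,t.uid,t.name AS tenant_name,"
               "u.name AS userName,g.code,g.name AS country_name "
               "FROM tenant t INNER JOIN USER u ON t.uid=u.id "
               "LEFT JOIN region g ON g.code=u.country;"),
    reader_url="jdbc:mysql://192.168.1.230:3306/ums_docker_fat",
    writer_url="jdbc:mysql://192.168.1.230:3306/myreport",
    table="usertenant",
    columns=["tenant_id", "uid", "tenant_name", "userName", "code", "country_name"],
    username="root",
    password="<password>",  # placeholder, supply your own
)
job_text = json.dumps(job, indent=2)
print(job_text)
```

Writing `job_text` to usertenant.json would give a file with the same structure as the hand-written one above; the other jobs would only need different arguments.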
Create the job JSON for the product aggregate table (tenantproductdetail.json)
{
"job": {
"setting": {
"speed": {
"channel":1
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "Qjkj2018",
"connection": [
{
"querySql": [
"SELECT p.tenant_id,p.id AS product_id,p.name AS product_name,p.create_time,c.id AS class_id,c.name AS class_name FROM tenant_product p INNER JOIN tenant_class c ON p.tenant_class_id=c.id;"
],
"jdbcUrl": [
"jdbc:mysql://192.168.1.230:3306/gvtgms_test"
]
}
]
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"writeMode": "insert",
"username": "root",
"password": "Qjkj2018",
"column": [
"tenant_id","product_id", "product_name", "create_time", "class_id", "class_name"
],
"session": [
"set session sql_mode='ANSI'"
],
"preSql": [
"delete from tenantproductdetail"
],
"connection": [
{
"jdbcUrl": "jdbc:mysql://192.168.1.230:3306/myreport",
"table": [
"tenantproductdetail"
]
}
]
}
}
}
]
}
}
Run the command
python datax.py tenantproductdetail.json
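Merge the two generated tables into one large wide table, so that statistical queries run against a single table, which is faster.
Merge into the wide table: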
SELECT u.tenant_id,u.`tenant_name`,u.`uid`,u.`userName`,u.`code`,u.`country_name`,t.`class_id`,t.`class_name`,t.product_id,t.`product_name`,t.`create_time`
FROM usertenant u INNER JOIN tenantproductdetail t ON u.tenant_id=t.tenant_id
In the landing database myreport, create the wide table:
DROP TEMPORARY TABLE IF EXISTS myreport.widetabletmp;
CREATE TEMPORARY TABLE myreport.widetabletmp AS SELECT u.tenant_id,u.tenant_name,u.uid,u.userName,u.code,u.country_name,t.class_id,t.class_name,t.product_id,t.product_name,t.create_time
FROM usertenant u INNER JOIN tenantproductdetail t ON u.tenant_id=t.tenant_id LIMIT 0;
DROP TABLE IF EXISTS myreport.widetable;
CREATE TABLE myreport.widetable LIKE myreport.widetabletmp;
Create the job JSON for merging into the wide table (wide.json)
{
"job": {
"setting": {
"speed": {
"channel":1
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "Qjkj2018",
"connection": [
{
"querySql": [
"SELECT u.tenant_id,u.tenant_name,u.uid,u.userName,u.code,u.country_name,t.class_id,t.class_name,t.product_id,t.product_name,t.create_time FROM usertenant u INNER JOIN tenantproductdetail t ON u.tenant_id=t.tenant_id;"
],
"jdbcUrl": [
"jdbc:mysql://192.168.1.230:3306/myreport"
]
}
]
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"writeMode": "insert",
"username": "root",
"password": "Qjkj2018",
"column": [
"tenant_id", "tenant_name", "uid", "userName","code", "country_name", "class_id", "class_name", "product_id", "product_name","create_time"
],
"session": [
"set session sql_mode='ANSI'"
],
"preSql": [
"delete from widetable"
],
"connection": [
{
"jdbcUrl": "jdbc:mysql://192.168.1.230:3306/myreport",
"table": [
"widetable"
]
}
]
}
}
}
]
}
}
Run the command
python datax.py wide.json
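Only the report of product counts by year is demonstrated here. The other reports are exported from the wide table with SQL in the same way, so they are not shown one by one.
Write the statistics SQL
Create the table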
CREATE TABLE year_report (report_year VARCHAR(4),report_count INT)
Statistics query
SELECT YEAR(DATE_FORMAT(create_time, '%Y-%m-%d %H:%i:%s')) AS report_year,COUNT(*) AS report_count FROM widetable GROUP BY YEAR(DATE_FORMAT(create_time, '%Y-%m-%d %H:%i:%s'))
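The shape of this yearly aggregation can be tried out locally without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, so `strftime('%Y', ...)` replaces MySQL's `YEAR(...)`; the three sample rows are invented for illustration and the table is reduced to the two columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for widetable: only the columns the report touches.
conn.execute("CREATE TABLE widetable (product_id INTEGER, create_time TEXT)")
conn.executemany(
    "INSERT INTO widetable VALUES (?, ?)",
    [
        (1, "2019-03-01 10:00:00"),
        (2, "2019-11-20 08:30:00"),
        (3, "2020-01-05 12:00:00"),
    ],
)

# One output row per year, with the number of products created in that year.
rows = conn.execute(
    "SELECT strftime('%Y', create_time) AS report_year, "
    "COUNT(*) AS report_count "
    "FROM widetable GROUP BY report_year ORDER BY report_year"
).fetchall()
print(rows)
conn.close()
```

Each result row has the two columns that year_report expects, report_year and report_count.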
Create the job JSON for the yearly report (year.json)
{
"job": {
"setting": {
"speed": {
"channel":1
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "Qjkj2018",
"connection": [
{
"querySql": [
"SELECT YEAR(DATE_FORMAT(create_time, '%Y-%m-%d %H:%i:%s')) AS report_year,COUNT(*) as report_count FROM widetable GROUP BY YEAR(DATE_FORMAT(create_time, '%Y-%m-%d %H:%i:%s'));"
],
"jdbcUrl": [
"jdbc:mysql://192.168.1.230:3306/myreport"
]
}
]
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"writeMode": "insert",
"username": "root",
"password": "Qjkj2018",
"column": [
"report_year", "report_count"
],
"session": [
"set session sql_mode='ANSI'"
],
"preSql": [
"delete from year_report"
],
"connection": [
{
"jdbcUrl": "jdbc:mysql://192.168.1.230:3306/myreport",
"table": [
"year_report"
]
}
]
}
}
}
]
}
}
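Write a script that runs the jobs one after another, then schedule it to run automatically (crontab, etc.). The example below is a Windows batch file: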
python --version
@pushd %~dp0
@echo """"""""""""""""""""""""""""""""""""
@echo "------Merging user and merchant tables--------"
@echo """"""""""""""""""""""""""""""""""""
@C:\Users\liaomin\anaconda3\envs\py27\python datax.py usertenant.json
@echo """"""""""""""""""""""""""""""""""""
@echo "------Merging product and category tables--------"
@echo """"""""""""""""""""""""""""""""""""
@C:\Users\liaomin\anaconda3\envs\py27\python datax.py tenantproductdetail.json
@echo """"""""""""""""""""""""""""""""""""
@echo "------Merging into the wide table--------"
@echo """"""""""""""""""""""""""""""""""""
@C:\Users\liaomin\anaconda3\envs\py27\python datax.py wide.json
@echo """"""""""""""""""""""""""""""""""""
@echo "------Building the yearly product report from the wide table-------"
@echo """"""""""""""""""""""""""""""""""""
@C:\Users\liaomin\anaconda3\envs\py27\python datax.py year.json
@pause
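The batch file above keeps going even if one job fails, so a later job could read stale or missing data. A platform-independent alternative is sketched below in Python; `run_datax` and `run_all` are hypothetical helper names, and the sketch assumes that datax.py sits in the working directory and returns a non-zero exit code on failure. The demo at the end uses a stand-in runner instead of launching DataX.

```python
import subprocess

JOBS = ["usertenant.json", "tenantproductdetail.json", "wide.json", "year.json"]


def run_datax(job_file, python_exe="python"):
    """Run one DataX job and return its exit code (assumed non-zero on failure)."""
    return subprocess.call([python_exe, "datax.py", job_file])


def run_all(jobs, runner=run_datax):
    """Run jobs in order and stop at the first non-zero exit code.

    Returns (finished_jobs, failed_job); failed_job is None on full success.
    """
    finished = []
    for job_file in jobs:
        if runner(job_file) != 0:
            return finished, job_file
        finished.append(job_file)
    return finished, None


# Dry run with a stand-in runner that pretends wide.json fails.
done, failed = run_all(JOBS, runner=lambda job: 1 if job == "wide.json" else 0)
print(done, failed)
```

For a real run, call `run_all(JOBS)` from the DataX bin directory; stopping at the first failure means year.json is never built from a half-loaded wide table.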
Install Metabase with Docker
docker run -d -p 3000:3000 --name metabase metabase/metabase
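During setup, configure the data source to report on and the administrator account and password used to log in:
http://192.168.0.49:3000/
After logging in, you can add databases (click Admin).
Navigation bar > Databases > Add database
After leaving the admin area, choose to create a question (the UI translation is slightly off), select native query, pick your database, write the SQL, and click the query button in the lower right.
Click Visualization in the lower left and choose the pie chart to generate a chart from the data.
Choose Done and then Save. You can also create a dashboard to bring several reports together.
Click Analytics to view your own reports and dashboards.
You can also integrate your reports and dashboards into your own project.
Click Share, then choose to embed this question in an application.
The dialog includes the front-end embedding code.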
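The embedding code that Metabase shows builds a signed token on the server and places it in an iframe URL. The snippet in the sharing dialog is the authoritative version; the sketch below only illustrates the same idea with Python's standard library, assuming an HS256-signed JWT. The secret key and the question id 1 are placeholders, and `sign_hs256` and `embed_url` are hypothetical helper names.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: str) -> str:
    """Build a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return signing_input + "." + b64url(signature)


def embed_url(site_url, secret, question_id, ttl_seconds=600, now=None):
    """Return an iframe URL for one question; the token expires after ttl_seconds."""
    issued = int(time.time()) if now is None else now
    payload = {
        "resource": {"question": question_id},
        "params": {},
        "exp": issued + ttl_seconds,
    }
    token = sign_hs256(payload, secret)
    return site_url.rstrip("/") + "/embed/question/" + token + "#bordered=true&titled=true"


# Placeholder secret and question id; take the real values from your Metabase instance.
url = embed_url("http://192.168.0.49:3000", "hypothetical-secret-key",
                question_id=1, now=1700000000)
print(url)
```

The resulting URL is what the front end would put in the `src` of an iframe. Keeping the secret key on the server and giving tokens a short lifetime limits what a leaked URL can expose.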
The embedded report shows a "Powered by Metabase" footer in its lower-left corner; it can only be removed in the commercial edition.