DataX 是一个异构数据源离线同步工具,致力于实现包括关系型数据库(MySQL、Oracle等)、HDFS、Hive、ODPS、HBase、FTP等各种异构数据源之间稳定高效的数据同步功能。
也就是数据库的数据同步工具,免费版没有web页面,需在控制台进行操作
DataX 下载地址:http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
DataX 快速启动地址:https://github.com/alibaba/DataX/blob/master/userGuid.md
DataX github地址:https://github.com/alibaba/DataX
下载请点击第一个地址下载,快速使用可以选择不看,下方会详细讲解
windows或者macos安装好适配自己的环境(这里就自行百度啦)
DataX下载完后,是一个datax.tat.gz的包,解压它,得到datax文件夹
进入datax
1.数据源文档(gitHub加载较慢,用流量看github稍微快一些)
类型 | 数据源 | Reader(读) | Writer(写) | 文档 |
---|---|---|---|---|
RDBMS 关系型数据库 | MySQL | √ | √ | 读 、写 |
Oracle | √ | √ | 读 、写 | |
OceanBase | √ | √ | 读 、写 | |
SQLServer | √ | √ | 读 、写 | |
PostgreSQL | √ | √ | 读 、写 | |
DRDS | √ | √ | 读 、写 | |
通用RDBMS(支持所有关系型数据库) | √ | √ | 读 、写 | |
阿里云数仓数据存储 | ODPS | √ | √ | 读 、写 |
ADS | √ | 写 | ||
OSS | √ | √ | 读 、写 | |
OCS | √ | √ | 读 、写 | |
NoSQL数据存储 | OTS | √ | √ | 读 、写 |
Hbase0.94 | √ | √ | 读 、写 | |
Hbase1.1 | √ | √ | 读 、写 | |
Phoenix4.x | √ | √ | 读 、写 | |
Phoenix5.x | √ | √ | 读 、写 | |
MongoDB | √ | √ | 读 、写 | |
Hive | √ | √ | 读 、写 | |
Cassandra | √ | √ | 读 、写 | |
无结构化数据存储 | TxtFile | √ | √ | 读 、写 |
FTP | √ | √ | 读 、写 | |
HDFS | √ | √ | 读 、写 | |
Elasticsearch | √ | 写 | ||
时间序列数据库 | OpenTSDB | √ | 读 | |
TSDB | √ | √ | 读 、写 |
2.配置json文件
使用dataX很简单,只需要配置一个json文件即可,配置内容大致如下:
而该json文件的空模板datax也提供给我们了,使用如下命令获取(首先进入datax/bin/下):
python datax.py -r {读取数据库的插件名} -w {写入数据库的插件名}
比如我想要用txt文件作为读取的数据源,mysql数据库作为写入的数据源,获取它的模板如下:
python datax.py -r txtfilereader -w mysqlwriter
得到如下json:
{
"job": {
"content": [
{
"reader": {
"name": "txtfilereader",
"parameter": {
"column": [],
"encoding": "",
"fieldDelimiter": "",
"path": []
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"column": [],
"connection": [
{
"jdbcUrl": "",
"table": []
}
],
"password": "",
"preSql": [],
"session": [],
"username": "",
"writeMode": ""
}
}
}
],
"setting": {
"speed": {
"channel": ""
}
}
}
}
里面具体的参数应该填写什么值,就看上方的那个表格(数据源文档),作为读的数据源就去看读的文档,作为写的数据源就看写的文档,将配置填入后,新建json文件,复制该json文本到json文件中。
3.运行该json文件(首先进入datax/bin/下)
python datax.py {你的json配置文件的路径}
以txtfilereader(txt文件读取插件)和mysqlwriter(mysql写入插件)作为数据同步示例。
1.首先获取txtfilereader与mysqlwriter的空配置模板
python datax.py -r txtfilereader -w mysqlwriter
结果如下:
{
"job": {
"content": [
{
"reader": {
"name": "txtfilereader",
"parameter": {
"column": [],
"encoding": "",
"fieldDelimiter": "",
"path": []
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"column": [],
"connection": [
{
"jdbcUrl": "",
"table": []
}
],
"password": "",
"preSql": [],
"session": [],
"username": "",
"writeMode": ""
}
}
}
],
"setting": {
"speed": {
"channel": ""
}
}
}
}
2.新建json文件,复制模板内容,名字可取需要同步的表的表名,我取名为comm_attachment.json
3.新建mysql表(作为写入的表)
create table comm_attachment
(
id int auto_increment comment '主键'
primary key,
project_type varchar(100) null comment '项目名-说明该附件是属于哪个项目的',
attach_type varchar(100) null comment '附件类型',
attach_name varchar(200) null comment '附件名',
attach_url varchar(500) null comment '附件下载地址',
attach_key varchar(500) null comment '附件key',
attach_hash varchar(500) null comment '附件hash',
attach_size varchar(100) null comment '附件大小',
created_by varchar(100) null comment '创建人',
created_date timestamp null comment '创建时间',
last_updated_by varchar(100) null comment '最后修改人',
last_updated_date timestamp null comment '最后修改时间',
version int null comment '乐观锁-版本号'
)
comment '附件表';
4.新建csv文件,编造一些测试数据进去
我的mysql数据源的表comm_attachment有12列,那么csv的内容也和该表的列顺序一致(如果顺序不一致,更改配置中writer.column数组的顺序)
建好后,把csv后缀改为txt即可,txt内容如下:
1,sunnyDay,image/png,振动测量钉钉图标.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1AI?attname=%E6%8C%AF%E5%8A%A8%E6%B5%8B%E9%87%8F%E9%92%89%E9%92%89%E5%9B%BE%E6%A0%87.png,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
2,sunnyDay,image/png,振动测量钉钉图标.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1AI?attname=%E6%8C%AF%E5%8A%A8%E6%B5%8B%E9%87%8F%E9%92%89%E9%92%89%E5%9B%BE%E6%A0%87.png,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
3,sunnyDay,image/png,振动测量钉钉图标.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1AI?attname=%E6%8C%AF%E5%8A%A8%E6%B5%8B%E9%87%8F%E9%92%89%E9%92%89%E5%9B%BE%E6%A0%87.png,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
4,sunnyDay,image/png,振动测量钉钉图标.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1AI?attname=%E6%8C%AF%E5%8A%A8%E6%B5%8B%E9%87%8F%E9%92%89%E9%92%89%E5%9B%BE%E6%A0%87.png,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
5,sunnyDay,image/png,振动测量钉钉图标.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1AI?attname=%E6%8C%AF%E5%8A%A8%E6%B5%8B%E9%87%8F%E9%92%89%E9%92%89%E5%9B%BE%E6%A0%87.png,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
6,sunnyDay,image/png,振动测量钉钉图标.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1AI?attname=%E6%8C%AF%E5%8A%A8%E6%B5%8B%E9%87%8F%E9%92%89%E9%92%89%E5%9B%BE%E6%A0%87.png,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
7,sunnyDay,image/png,振动测量钉钉图标.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1AI?attname=%E6%8C%AF%E5%8A%A8%E6%B5%8B%E9%87%8F%E9%92%89%E9%92%89%E5%9B%BE%E6%A0%87.png,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
8,sunnyDay,image/png,振动测量钉钉图标.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1AI?attname=%E6%8C%AF%E5%8A%A8%E6%B5%8B%E9%87%8F%E9%92%89%E9%92%89%E5%9B%BE%E6%A0%87.png,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
9,sunnyDay,image/png,振动测量钉钉图标.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1AI?attname=%E6%8C%AF%E5%8A%A8%E6%B5%8B%E9%87%8F%E9%92%89%E9%92%89%E5%9B%BE%E6%A0%87.png,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
10,sunnyDay,image/png,振动测量钉钉图标.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1AI?attname=%E6%8C%AF%E5%8A%A8%E6%B5%8B%E9%87%8F%E9%92%89%E9%92%89%E5%9B%BE%E6%A0%87.png,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
5.查阅文档后,编写json配置
类型 | 数据源 | Reader(读) | Writer(写) | 文档 |
---|---|---|---|---|
RDBMS 关系型数据库 | MySQL | √ | √ | 读 、写 |
无结构化数据存储 | TxtFile | √ | √ | 读 、写 |
comm_attachment.json:
{
"job": {
"content": [
{
"reader": {
"name": "txtfilereader",
"parameter": {
"column": [
{
"index": 0,
"type": "long"
},
{
"index": 1,
"type": "string"
},
{
"index": 2,
"type": "string"
},
{
"index": 3,
"type": "string"
},
{
"index": 4,
"type": "string"
},
{
"index": 5,
"type": "string"
},
{
"index": 6,
"type": "string"
},
{
"index": 7,
"type": "string"
},
{
"index": 8,
"type": "string"
},
{
"index": 9,
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
{
"index": 10,
"type": "string"
},
{
"index": 11,
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
{
"index": 12,
"type": "long"
},
],
"encoding": "UTF-8",
"fieldDelimiter": ",",
"path": [
"/Users/cheng/Desktop/sunnyday_comm_attachment.txt"
]
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"column": [
"id",
"project_type",
"attach_type",
"attach_name",
"attach_url",
"attach_key",
"attach_hash",
"attach_size",
"created_by",
"created_date",
"last_updated_by",
"last_updated_date",
"version"
],
"connection": [
{
"jdbcUrl": "jdbc:mysql://localhost:3306/sunnyday?characterEncoding=utf-8&serverTimezone=Asia/Shanghai",
"table": [
"comm_attachment"
]
}
],
"password": "12345678",
"preSql": [
"delete from comm_attachment"
],
"session": [
],
"username": "root",
"writeMode": "insert"
}
}
}
],
"setting": {
"speed": {
"channel": "5"
}
}
}
}
6.执行json配置
执行(在datax/bin下):
python datax.py /Users/cheng/Desktop/comm_attachment.json
部分结果:
2021-11-08 12:04:03.357 [job-0] INFO JobContainer - PerfTrace not enable!
2021-11-08 12:04:03.358 [job-0] INFO StandAloneJobContainerCommunicator - Total 10 records, 2741 bytes | Speed 274B/s, 1 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-11-08 12:04:03.359 [job-0] INFO JobContainer -
任务启动时刻 : 2021-11-08 12:03:52
任务结束时刻 : 2021-11-08 12:04:03
任务总计耗时 : 10s
任务平均流量 : 274B/s
记录写入速度 : 1rec/s
读出记录总数 : 10
读写失败总数 : 0
7.查看写入源的数据
写入成功