1. First, get familiar with what DataX does: https://github.com/alibaba/DataX/blob/master/introduction.md
2. DataX download link (usable on Windows as well): http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
3. As that guide explains, install the JDK and Python 2 (2.7.16 here) before using DataX.
Python 2.7 installation guide: https://www.jb51.net/article/136324.htm
4. Suppose we want to sync data from MySQL to MySQL. First create two databases (test1 and test2) and two tables (channel_logs and channel_log_test); a sketch for creating the databases follows, and the channel_logs DDL comes after it.
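If the two databases do not exist yet, they can be created first (a minimal sketch; the utf8 charset is an assumption, chosen to match the table DDL below):
CREATE DATABASE IF NOT EXISTS test1 DEFAULT CHARSET utf8;
CREATE DATABASE IF NOT EXISTS test2 DEFAULT CHARSET utf8;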
/* DDL and sample data
Navicat MySQL Data Transfer
Source Server : Local
Source Server Version : 50711
Source Host : localhost:3306
Source Database : test1
Target Server Type : MYSQL
Target Server Version : 50711
File Encoding : 65001
Date: 2019-03-29 12:01:47
*/
SET FOREIGN_KEY_CHECKS=0;
-- ----------------------------
-- Table structure for `channel_logs`
-- ----------------------------
DROP TABLE IF EXISTS `channel_logs`;
CREATE TABLE `channel_logs` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
  `name` varchar(255) DEFAULT NULL COMMENT 'current name',
  `channel_id` int(11) DEFAULT NULL COMMENT 'channel id',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8;
-- ----------------------------
-- Records of channel_logs
-- ----------------------------
INSERT INTO `channel_logs` VALUES ('6', '记录1', '8');
INSERT INTO `channel_logs` VALUES ('7', '记录32', '8');
INSERT INTO `channel_logs` VALUES ('8', '记录3', '8');
INSERT INTO `channel_logs` VALUES ('9', '记录4', '8');
INSERT INTO `channel_logs` VALUES ('10', '记录5', '8');
INSERT INTO `channel_logs` VALUES ('11', '记录6', '8');
INSERT INTO `channel_logs` VALUES ('12', '记录6', '9');
INSERT INTO `channel_logs` VALUES ('13', '记录7', '10');
INSERT INTO `channel_logs` VALUES ('14', '记录8', '9');
INSERT INTO `channel_logs` VALUES ('15', '记录9', '10');
INSERT INTO `channel_logs` VALUES ('16', '记录98', '9');
INSERT INTO `channel_logs` VALUES ('17', '记录0', '10');
INSERT INTO `channel_logs` VALUES ('18', '记录67', '9');
INSERT INTO `channel_logs` VALUES ('19', '记录77', '10');
INSERT INTO `channel_logs` VALUES ('20', '记录88', '10');
INSERT INTO `channel_logs` VALUES ('21', '记录858', '10');
INSERT INTO `channel_logs` VALUES ('22', '记录96', '9');
INSERT INTO `channel_logs` VALUES ('23', '记录465', '10');
INSERT INTO `channel_logs` VALUES ('24', '记录65', '9');
Both tables share the same structure: the first is populated with data, the second starts out empty. A sketch of the target table's DDL follows.
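Since only the channel_logs DDL is shown above, here is a minimal sketch of the empty target table in test2, assuming the identical structure just described:
DROP TABLE IF EXISTS `channel_log_test`;
CREATE TABLE `channel_log_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
  `name` varchar(255) DEFAULT NULL COMMENT 'current name',
  `channel_id` int(11) DEFAULT NULL COMMENT 'channel id',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;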
5. With the tables in place, prepare the DataX job script as follows:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [
                            "id",
                            "name"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test1"],
                                "table": ["channel_logs"]
                            }
                        ],
                        "password": "123456",
                        "username": "canal"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [
                            "id",
                            "name"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/test2",
                                "table": ["channel_log_test"]
                            }
                        ],
                        "password": "123456",
                        "username": "canal"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
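One optional tweak, not part of the original script: DataX's mysqlwriter also accepts a preSql parameter, a list of SQL statements run before writing, which makes the job safely re-runnable by emptying the target table first. Added inside the writer's "parameter" object, it would look like:
"preSql": ["delete from channel_log_test"]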
6. Save the script as a .json file (this tutorial uses E:\datax\bin\test.json), open cmd.exe, and run:
python E:\datax\bin\datax.py E:\datax\bin\test.json
7. Output like the following means the job succeeded:
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2019-03-29 10:39:50.043 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2019-03-29 10:39:50.051 [main] INFO Engine - the machine info =>
osInfo: Oracle Corporation 1.8 25.0-b70
jvmInfo: Windows 8.1 amd64 6.3
cpu num: 4
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME | allocation_size | init_size
PS Eden Space | 256.00MB | 256.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 42.50MB | 42.50MB
PS Old Gen | 683.00MB | 683.00MB
Metaspace | -0.00MB | 0.00MB
2019-03-29 10:39:50.067 [main] INFO Engine -
{
    "content":[
        {
            "reader":{
                "name":"mysqlreader",
                "parameter":{
                    "column":[
                        "id",
                        "name"
                    ],
                    "connection":[
                        {
                            "jdbcUrl":[
                                "jdbc:mysql://127.0.0.1:3306/test1"
                            ],
                            "table":[
                                "channel_logs"
                            ]
                        }
                    ],
                    "password":"******",
                    "username":"canal"
                }
            },
            "writer":{
                "name":"mysqlwriter",
                "parameter":{
                    "column":[
                        "id",
                        "name"
                    ],
                    "connection":[
                        {
                            "jdbcUrl":"jdbc:mysql://127.0.0.1:3306/test2",
                            "table":[
                                "channel_log_test"
                            ]
                        }
                    ],
                    "password":"******",
                    "username":"canal"
                }
            }
        }
    ],
    "setting":{
        "speed":{
            "channel":1
        }
    }
}
2019-03-29 10:39:50.114 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2019-03-29 10:39:50.130 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2019-03-29 10:39:50.130 [main] INFO JobContainer - DataX jobContainer starts job.
2019-03-29 10:39:50.130 [main] INFO JobContainer - Set jobId = 0
2019-03-29 10:39:50.671 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://127.0.0.1:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2019-03-29 10:39:50.686 [job-0] INFO OriginalConfPretreatmentUtil - table:[channel_logs] has columns:[id,name,channel_id].
2019-03-29 10:39:51.061 [job-0] INFO OriginalConfPretreatmentUtil - table:[channel_log_test] all columns:[
id,name,channel_id].
2019-03-29 10:39:51.077 [job-0] INFO OriginalConfPretreatmentUtil - Write data [
INSERT INTO %s (id,name) VALUES(?,?)
], which jdbcUrl like:[jdbc:mysql://127.0.0.1:3306/test2?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2019-03-29 10:39:51.077 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2019-03-29 10:39:51.077 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2019-03-29 10:39:51.077 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2019-03-29 10:39:51.077 [job-0] INFO JobContainer - jobContainer starts to do split ...
2019-03-29 10:39:51.077 [job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
2019-03-29 10:39:51.077 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2019-03-29 10:39:51.092 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2019-03-29 10:39:51.148 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2019-03-29 10:39:51.164 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2019-03-29 10:39:51.164 [job-0] INFO JobContainer - Running by standalone Mode.
2019-03-29 10:39:51.179 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2019-03-29 10:39:51.179 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2019-03-29 10:39:51.179 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2019-03-29 10:39:51.211 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2019-03-29 10:39:51.211 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select id,name from channel_logs
] jdbcUrl:[jdbc:mysql://127.0.0.1:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2019-03-29 10:39:51.257 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select id,name from channel_logs
] jdbcUrl:[jdbc:mysql://127.0.0.1:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2019-03-29 10:39:51.532 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[321]ms
2019-03-29 10:39:51.532 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2019-03-29 10:40:01.196 [job-0] INFO StandAloneJobContainerCommunicator - Total 19 records, 148 bytes | Speed 14B/s, 1 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2019-03-29 10:40:01.196 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2019-03-29 10:40:01.200 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2019-03-29 10:40:01.204 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do post work.
2019-03-29 10:40:01.208 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2019-03-29 10:40:01.216 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: E:\datax\hook
2019-03-29 10:40:01.220 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
PS Scavenge | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
2019-03-29 10:40:01.220 [job-0] INFO JobContainer - PerfTrace not enable!
2019-03-29 10:40:01.224 [job-0] INFO StandAloneJobContainerCommunicator - Total 19 records, 148 bytes | Speed 14B/s, 1 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2019-03-29 10:40:01.232 [job-0] INFO JobContainer -
Job start time            : 2019-03-29 10:39:50
Job end time              : 2019-03-29 10:40:01
Total elapsed time        : 11s
Average throughput        : 14B/s
Record write speed        : 1rec/s
Total records read        : 19
Total read/write failures : 0
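As a quick sanity check (a sketch, using the databases created above), the target table's row count should now match the 19 records read:
SELECT COUNT(*) FROM test2.channel_log_test;
-- expected: 19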
Notes:
1. Do not put "#" comments in the DataX job script; the file must be valid JSON, and JSON has no comment syntax.
2. If the cmd.exe output is garbled when running the job, switch the console code page to UTF-8 first by running: CHCP 65001.
3. If Chinese data ends up garbled in the database, append ?characterEncoding=utf8 to the jdbcUrl entries in the job script, as sketched below.
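A minimal sketch of note 3 applied to this tutorial's connections (host, port, and database names are the ones used above):
"jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test1?characterEncoding=utf8"]
"jdbcUrl": "jdbc:mysql://127.0.0.1:3306/test2?characterEncoding=utf8"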