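DataX is an offline synchronization tool for heterogeneous data sources. It aims to provide stable, efficient data synchronization between sources such as relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, and FTP.
I won't repeat the concepts and design ideas here; this post records the steps I went through when using it for the first time.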
[root@my571 bin]# wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
[root@my571 bin]# tar -zxvf datax.tar.gz
[root@my571 bin]# cd ./datax/bin/
[root@my571 bin]# python datax.py -r mysqlreader -w mysqlwriter > mysql2mysql.json
[root@my571 bin]# cat mysql2mysql.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["id", "user_id"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://192.168.225.131:3306/iris"],
                                "table": ["seiki1"]
                            }
                        ],
                        "password": "root",
                        "username": "root"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["id", "user_id"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.225.130:3306/iris",
                                "table": ["seiki1"]
                            }
                        ],
                        "password": "root",
                        "username": "root",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
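Source database: IP ending in 131; target database: IP ending in 130; database name: iris; table name: seiki1; columns: id, user_id (the column lists of the writer and the reader must correspond one to one).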
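As an alternative to editing the generated template by hand, the same job file can be produced from a small script. This is only a sketch under stated assumptions: the helper name `build_mysql_job` and its parameters are made up for illustration and are not part of DataX; the JSON layout simply mirrors the job file shown above.

```python
import json


def build_mysql_job(src_url, dst_url, table, columns, username, password, channel=1):
    """Build a mysqlreader -> mysqlwriter job dict (hypothetical helper, not a DataX API)."""
    if not columns:
        raise ValueError("columns must not be empty")
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "column": list(columns),
            # In the reader, jdbcUrl is a list, as in the job file above.
            "connection": [{"jdbcUrl": [src_url], "table": [table]}],
            "username": username,
            "password": password,
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            # Same column list as the reader, so the two sides match one to one.
            "column": list(columns),
            # In the writer, jdbcUrl is a plain string, as in the job file above.
            "connection": [{"jdbcUrl": dst_url, "table": [table]}],
            "username": username,
            "password": password,
            "writeMode": "insert",
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            # The generated template stores channel as a string, so do the same here.
            "setting": {"speed": {"channel": str(channel)}},
        }
    }


job = build_mysql_job(
    "jdbc:mysql://192.168.225.131:3306/iris",
    "jdbc:mysql://192.168.225.130:3306/iris",
    "seiki1",
    ["id", "user_id"],
    "root",
    "root",
)
job_text = json.dumps(job, indent=4)
```

Writing `job_text` to `mysql2mysql.json` would give a file equivalent to the one above; building both column lists from a single argument is one way to keep them aligned.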
[root@my571 bin]# python ./datax.py ./mysql2mysql.json
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2019-12-27 15:55:08.264 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2019-12-27 15:55:08.275 [main] INFO Engine - the machine info =>
osInfo: Oracle Corporation 1.8 25.221-b11
jvmInfo: Linux amd64 3.10.0-957.el7.x86_64
cpu num: 1
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [Copy, MarkSweepCompact]
MEMORY_NAME | allocation_size | init_size
Eden Space | 273.06MB | 273.06MB
Code Cache | 240.00MB | 2.44MB
Survivor Space | 34.13MB | 34.13MB
Compressed Class Space | 1,024.00MB | 0.00MB
Metaspace | -0.00MB | 0.00MB
Tenured Gen | 682.69MB | 682.69MB
……
2019-12-27 15:55:09.312 [job-0] ERROR RetryUtil - Exception when calling callable, 异常Msg:Code:[MYSQLErrCode-02], Description:[数据库服务的IP地址或者Port错误,请检查填写的IP地址和Port或者联系DBA确认IP地址和Port是否正确。如果是同步中心用户请联系DBA确认idb上录入的IP和PORT信息和数据库的当前实际信息是一致的]. - 具体错误信息为:com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
(In English, the description says: the IP address or port of the database service is wrong; check the IP address and port you entered, or ask the DBA to confirm them.)
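A "Communications link failure" like the one above means the driver never received anything from the server, so a quick check is whether the MySQL port is reachable at all from the machine running DataX. The following is a minimal sketch using only the Python standard library; the function name `can_connect` is made up for illustration, and a successful TCP connection only shows that something is listening on the port, not that the credentials or the database name are right.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the setup in this post, one would call `can_connect("192.168.225.131", 3306)` and `can_connect("192.168.225.130", 3306)` before starting the job.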
2019-12-27 16:14:29.083 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2019-12-27 16:14:29.083 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2019-12-27 16:14:29.084 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do post work.
2019-12-27 16:14:29.084 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2019-12-27 16:14:29.084 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /data/datax/hook
2019-12-27 16:14:29.086 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
Copy | 1 | 1 | 1 | 0.098s | 0.098s | 0.098s
MarkSweepCompact | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
2019-12-27 16:14:29.086 [job-0] INFO JobContainer - PerfTrace not enable!
2019-12-27 16:14:29.086 [job-0] INFO StandAloneJobContainerCommunicator - Total 100858 records, 1295828 bytes | Speed 126.54KB/s, 10085 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 2.048s | All Task WaitReaderTime 0.437s | Percentage 100.00%
2019-12-27 16:14:29.087 [job-0] INFO JobContainer -
任务启动时刻 : 2019-12-27 16:14:17
任务结束时刻 : 2019-12-27 16:14:29
任务总计耗时 : 11s
任务平均流量 : 126.54KB/s
记录写入速度 : 10085rec/s
读出记录总数 : 100858
读写失败总数 : 0
(The summary fields are, in order: job start time, job end time, total elapsed time, average throughput, record write speed, total records read, total read/write failures.)
100858 records were synchronized successfully, with 0 failures.
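When jobs are run from a script, the record counts can be pulled out of the `StandAloneJobContainerCommunicator` line instead of being read by eye. The sketch below is an assumption-laden illustration: `parse_summary` and `SUMMARY_RE` are hypothetical names, and the pattern is based only on the line format that appears in the log above, which other DataX versions may not share.

```python
import re

# Pattern derived from the summary line in the log above (an assumption, not a DataX contract).
SUMMARY_RE = re.compile(
    r"Total (?P<records>\d+) records, (?P<bytes>\d+) bytes"
    r".*?Error (?P<error_records>\d+) records, (?P<error_bytes>\d+) bytes"
)


def parse_summary(line):
    """Return the record and byte counts from a summary line, or None if it does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = (
    "2019-12-27 16:14:29.086 [job-0] INFO StandAloneJobContainerCommunicator - "
    "Total 100858 records, 1295828 bytes | Speed 126.54KB/s, 10085 records/s | "
    "Error 0 records, 0 bytes | All Task WaitWriterTime 2.048s | "
    "All Task WaitReaderTime 0.437s | Percentage 100.00%"
)
summary = parse_summary(line)
```

For the line above, `summary["records"]` is 100858 and `summary["error_records"]` is 0, which matches the totals in the job report.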
All Task WaitWriterTime 18.691s | All Task WaitReaderTime 0.221s | Percentage 0.00%
2019-12-27 15:00:12.241 [0-0-0-writer] WARN CommonRdbmsWriter$Task - 回滚此次写入, 采用每次写入一行方式提交. 因为:Could not retrieve transation read-only status server
2019-12-27 15:00:13.455 [0-0-0-writer] WARN CommonRdbmsWriter$Task - 回滚此次写入, 采用每次写入一行方式提交. 因为:Could not retrieve transation read-only status server
2019-12-27 15:00:14.670 [0-0-0-writer] WARN CommonRdbmsWriter$Task - 回滚此次写入, 采用每次写入一行方式提交. 因为:Could not retrieve transation read-only status server
2019-12-27 15:00:15.896 [0-0-0-writer] WARN CommonRdbmsWriter$Task - 回滚此次写入, 采用每次写入一行方式提交. 因为:Could not retrieve transation read-only status server
2019-12-27 15:00:17.072 [0-0-0-writer] WARN CommonRdbmsWriter$Task - 回滚此次写入, 采用每次写入一行方式提交. 因为:Could not retrieve transation read-only status server
2019-12-27 15:00:18.276 [0-0-0-writer] WARN CommonRdbmsWriter$Task - 回滚此次写入, 采用每次写入一行方式提交. 因为:Could not retrieve transation read-only status server
2019-12-27 15:00:19.499 [0-0-0-writer] WARN CommonRdbmsWriter$Task - 回滚此次写入, 采用每次写入一行方式提交. 因为:Could not retrieve transation read-only status server
2019-12-27 15:00:20.775 [0-0-0-writer] WARN CommonRdbmsWriter$Task - 回滚此次写入, 采用每次写入一行方式提交. 因为:Could not retrieve transation read-only status server
2019-12-27 15:00:21.949 [0-0-0-writer] WARN CommonRdbmsWriter$Task - 回滚此次写入, 采用每次写入一行方式提交. 因为:Could not retrieve transation read-only status server
(In English, each warning says: rolling back this write and committing one row at a time instead, because of the error that follows.)
[root@my571 bin]# python -m pip install mysql-connector
[root@my571 bin]# yum -y install *pip*
root@localhost|iris>SET GLOBAL transaction_isolation='REPEATABLE-READ';
root@localhost|(none)>SET GLOBAL transaction_isolation='READ-COMMITTED';
[root@my571 bin]# cat mysql2mysql22.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["id", "user_id"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://192.168.225.131:3306/iris"],
                                "querySql": ["select a.id,a.user_id,concat(a.user_id,b.name) from seiki1 a inner join aaa b on a.id=b.id"]
                            }
                        ],
                        "password": "root",
                        "username": "root"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["id", "user_id", "name"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.225.130:3306/iris",
                                "table": ["seiki1"]
                            }
                        ],
                        "password": "root",
                        "username": "root",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
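Two things to watch: ① the number of fields in the select must equal the number of entries in "column" on the writer node; ② the jdbcUrl on the writer node must not be wrapped in "[]" the way it is on the reader node.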
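2019-12-27 17:44:33.306 [job-0] ERROR JobContainer - 运行scheduler 模式[standalone]出错.
2019-12-27 17:44:33.307 [job-0] ERROR JobContainer - Exception when job run
com.alibaba.datax.common.exception.DataXException: Code:[DBUtilErrorCode-00], Description:[您的配置错误.]. - 列配置信息有错误. 因为您配置的任务中,源头读取字段数:3 与 目的表要写入的字段数:2 不相等. 请检查您的配置并作出修改.
(In English: the column configuration is wrong; the job reads 3 fields from the source but writes 2 fields to the target table, and the two counts must be equal.)
If you can guarantee that the fields in the select match the target table, you can remove the "column" setting from the writer node's configuration.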
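2019-12-30 14:10:45.251 [job-0] ERROR RetryUtil - Exception when calling callable, 即将尝试执行第1次重试.本次重试计划等待[1000]ms,实际等待[1001]ms, 异常Msg:[Code:[DBUtilErrorCode-10], Description:[连接数据库失败. 请检查您的 账号、密码、数据库名称、IP、Port或者向 DBA 寻求帮助(注意网络环境).]. - 具体错误信息为:java.sql.SQLException: No suitable driver found for ["jdbc:mysql://192.168.225.130:3306/iris"]?
(In English: connecting to the database failed; check the account, password, database name, IP, and port. The URL in the "No suitable driver found" message is still wrapped in [" "], which appears to be what happens when the writer's jdbcUrl is given as an array.)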
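Both pitfalls (a column count mismatch and a bracketed writer jdbcUrl) only show up after the job has started, so it can help to check the job file first. The following is a rough sketch, not a DataX feature: `count_select_columns` and `check_job` are hypothetical helpers, and the column counter is deliberately naive (it splits the select list on top-level commas and will miscount `select *`, subqueries in the select list, or string literals containing commas).

```python
import re


def count_select_columns(sql):
    """Naively count the expressions between SELECT and the first FROM; None if not found."""
    match = re.match(r"\s*select\s+(.*?)\s+from\s", sql, re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    depth, count = 0, 1
    for ch in match.group(1):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # Only commas outside parentheses separate select expressions.
            count += 1
    return count


def check_job(job):
    """Return a list of human-readable problems found in a job dict (empty if none)."""
    problems = []
    for item in job["job"]["content"]:
        reader = item["reader"]["parameter"]
        writer = item["writer"]["parameter"]
        for conn in writer.get("connection", []):
            if not isinstance(conn.get("jdbcUrl"), str):
                problems.append("writer jdbcUrl should be a plain string, not a list")
        writer_cols = len(writer.get("column", []))
        for conn in reader.get("connection", []):
            for sql in conn.get("querySql", []):
                selected = count_select_columns(sql)
                if selected is not None and selected != writer_cols:
                    problems.append(
                        f"querySql selects {selected} columns but writer lists {writer_cols}"
                    )
    return problems
```

A typical use would be `check_job(json.load(open("mysql2mysql22.json")))` before calling `datax.py`; an empty list means neither of the two problems was detected.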
2019-12-27 17:46:18.457 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
Copy | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
MarkSweepCompact | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
2019-12-27 17:46:18.457 [job-0] INFO JobContainer - PerfTrace not enable!
2019-12-27 17:46:18.458 [job-0] INFO StandAloneJobContainerCommunicator - Total 22 records, 260 bytes | Speed 26B/s, 2 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2019-12-27 17:46:18.459 [job-0] INFO JobContainer -
任务启动时刻 : 2019-12-27 17:46:07
任务结束时刻 : 2019-12-27 17:46:18
任务总计耗时 : 10s
任务平均流量 : 26B/s
记录写入速度 : 2rec/s
读出记录总数 : 22
读写失败总数 : 0
(The summary fields are, in order: job start time, job end time, total elapsed time, average throughput, record write speed, total records read, total read/write failures.)