DataX 3.0 在Windows下基于MySQL做数据迁移示例

在 Windows 安装 Datax:

Datax 官网:https://github.com/alibaba/DataX

环境要求:

1:JDK(1.8以上,推荐1.8,并配置好环境变量)

2:Python(网上推荐Python2.7.X,本人Python3.10亲测没问题。并配置好环境变量)

3:Maven(建议不低于maven3.5,并配置好环境变量)

Datax 安装:

1:下载地址:https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202303/datax.tar.gz

2:下载后解压至本地某个目录,进入bin目录。我本地解压的路径(D:\datax)

测试Datax安装

执行cmd命令进入Datax下的bin目录。我Datax解压到 D:\datax,所以具体进入路径是 D:\datax\bin

cd D:\datax\bin

第一步先输入CHCP 65001命令防止控制台中文乱码,第二步再进行自检脚本命令

CHCP 65001

python .\datax.py ..\job\job.json

会得到如下结果:

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2023-06-14 13:09:32.800 [main] INFO  MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2023-06-14 13:09:32.803 [main] INFO  MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2023-06-14 13:09:32.813 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-06-14 13:09:32.818 [main] INFO  Engine - the machine info  =>

        osInfo: Oracle Corporation 1.8 25.361-b09
        jvmInfo:        Windows 10 amd64 10.0
        cpu num:        8

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [PS MarkSweep, PS Scavenge]

        MEMORY_NAME                    | allocation_size                | init_size
        PS Eden Space                  | 256.00MB                       | 256.00MB
        Code Cache                     | 240.00MB                       | 2.44MB
        Compressed Class Space         | 1,024.00MB                     | 0.00MB
        PS Survivor Space              | 42.50MB                        | 42.50MB
        PS Old Gen                     | 683.00MB                       | 683.00MB
        Metaspace                      | -0.00MB                        | 0.00MB


2023-06-14 13:09:32.828 [main] INFO  Engine -
{
        "setting":{
                "speed":{
                        "channel":1
                },
                "errorLimit":{
                        "record":0,
                        "percentage":0.02
                }
        },
        "content":[
                {
                        "reader":{
                                "name":"streamreader",
                                "parameter":{
                                        "column":[
                                                {
                                                        "value":"DataX",
                                                        "type":"string"
                                                },
                                                {
                                                        "value":19890604,
                                                        "type":"long"
                                                },
                                                {
                                                        "value":"1989-06-04 00:00:00",
                                                        "type":"date"
                                                },
                                                {
                                                        "value":true,
                                                        "type":"bool"
                                                },
                                                {
                                                        "value":"test",
                                                        "type":"bytes"
                                                }
                                        ],
                                        "sliceRecordCount":100000
                                }
                        },
                        "writer":{
                                "name":"streamwriter",
                                "parameter":{
                                        "print":false,
                                        "encoding":"UTF-8"
                                }
                        }
                }
        ]
}

2023-06-14 13:09:32.846 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false
2023-06-14 13:09:32.846 [main] INFO  JobContainer - DataX jobContainer starts job.
2023-06-14 13:09:32.847 [main] INFO  JobContainer - Set jobId = 0
2023-06-14 13:09:32.863 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2023-06-14 13:09:32.864 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2023-06-14 13:09:32.864 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2023-06-14 13:09:32.864 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2023-06-14 13:09:32.865 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2023-06-14 13:09:32.865 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2023-06-14 13:09:32.865 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2023-06-14 13:09:32.889 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2023-06-14 13:09:32.892 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2023-06-14 13:09:32.893 [job-0] INFO  JobContainer - Running by standalone Mode.
2023-06-14 13:09:32.900 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-06-14 13:09:32.904 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-06-14 13:09:32.904 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2023-06-14 13:09:32.913 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-06-14 13:09:33.240 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[328]ms
2023-06-14 13:09:33.241 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-06-14 13:09:42.913 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.018s |  All Task WaitReaderTime 0.026s | Percentage 100.00%
2023-06-14 13:09:42.914 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-06-14 13:09:42.914 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2023-06-14 13:09:42.914 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2023-06-14 13:09:42.915 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-06-14 13:09:42.915 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: D:\datax\hook
2023-06-14 13:09:42.917 [job-0] INFO  JobContainer -
         [total cpu info] =>
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu
                -1.00%                         | -1.00%                         | -1.00%


         [total gc info] =>
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
                 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s
                 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s

2023-06-14 13:09:42.917 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-06-14 13:09:42.917 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.018s |  All Task WaitReaderTime 0.026s | Percentage 100.00%
2023-06-14 13:09:42.918 [job-0] INFO  JobContainer -
任务启动时刻                    : 2023-06-14 13:09:32
任务结束时刻                    : 2023-06-14 13:09:42
任务总计耗时                    :                 10s
任务平均流量                    :          253.91KB/s
记录写入速度                    :          10000rec/s
读出记录总数                    :              100000
读写失败总数                    :                   0

到此,安装完毕。

数据迁移:

MySQL_A 数据库 对 MySQL_B数据库 做数据迁移示例:

MySQL_A 和 MySQL_B 需要提前有相同的数据库表、字段、和字段类型。(如果是其它数据库之间的数据对接,请大家自行百度各数据库之间数据类型映射关系。)

题外话:这里简单的列举下MySQL和达梦数据库的数据类型映射关系,大家仅供参考

MySQL 数据类型 达梦数据库数据类型
INT INT
VARCHAR VARCHAR
TEXT CLOB
DATETIME TIMESTAMP
DECIMAL DECIMAL
TINYINT SMALLINT

回归正题,MySQL_A 和 MySQL_B 都需要有如下 sys_role 表:

CREATE TABLE `sys_role` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id-主键 自增长',
  `name` varchar(255) NOT NULL COMMENT '角色名称',
  `create_user_id` int(11) NOT NULL DEFAULT '0' COMMENT '创建人id',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `update_user_id` int(11) DEFAULT NULL COMMENT '修改人id',
  `update_time` datetime DEFAULT NULL COMMENT '修改时间',
  `remark` varchar(255) DEFAULT NULL COMMENT '备注',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4;

假设 MySQL_A 有如下数据,需要向B做数据迁移

 开始用Datax做数据迁移:

1:进入到datax安装目录的job文件夹下,我的是在D:\datax\job

2:文件夹内有一个job示例文件,我们参考示例文件新建一个job,此处示例为新建 sys_role.json,文件内容如下:

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",                     // 数据源读取插件名,使用MySQL作为数据源
          "parameter": {
            "username": "root",                       // MySQL用户名
            "password": "root",                       // MySQL密码
            "column": ["*"],                           // 需要读取的字段列表,这里表示读取所有字段
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://localhost:3306/Mysql_A"],   // MySQL连接URL,指定要读取的数据库为Mysql_A
                "table": ["sys_role"]                   // 要读取的表名为sys_role
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",                      // 数据目标写入插件名,使用MySQL作为数据目标
          "parameter": {
            "username": "root",                        // MySQL用户名
            "password": "root",                        // MySQL密码
            "column": ["*"],                            // 需要写入的字段列表,这里表示写入所有字段
            "preSql": ["DELETE FROM sys_role"],         // 在写入数据前执行的SQL语句,此处表示先删除sys_role表中的数据
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://localhost:3306/Mysql_B",  // MySQL连接URL,指定要写入的数据库为Mysql_B
                "table": ["sys_role"],                   // 要写入的表名为sys_role
                "postSql": [],                            // 在写入数据后执行的SQL语句,此处为空
                "column": ["id", "name", "create_user_id", "create_time","update_user_id", "update_time", "remark"]  // 指定要写入的字段列表
              }
            ]
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 3                                  // 并发通道数,表示同时读取和写入的任务数
      },
      "errorLimit": {
        "record": 0,                                  // 允许的最大错误记录数,0表示不允许出现错误记录
        "percentage": 0.02                            // 允许的最大错误记录百分比,0.02表示最大允许2%的错误记录
      }
    }
  }
}

3:执行如下命令,进行数据迁移

python ..\bin\datax.py .\sys_role.json

4:成功执行后的结果示例:

PS D:\datax\bin> python .\datax.py ..\job\job.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2023-06-14 13:09:32.800 [main] INFO  MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2023-06-14 13:09:32.803 [main] INFO  MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2023-06-14 13:09:32.813 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-06-14 13:09:32.818 [main] INFO  Engine - the machine info  =>

        osInfo: Oracle Corporation 1.8 25.361-b09
        jvmInfo:        Windows 10 amd64 10.0
        cpu num:        8

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [PS MarkSweep, PS Scavenge]

        MEMORY_NAME                    | allocation_size                | init_size
        PS Eden Space                  | 256.00MB                       | 256.00MB
        Code Cache                     | 240.00MB                       | 2.44MB
        Compressed Class Space         | 1,024.00MB                     | 0.00MB
        PS Survivor Space              | 42.50MB                        | 42.50MB
        PS Old Gen                     | 683.00MB                       | 683.00MB
        Metaspace                      | -0.00MB                        | 0.00MB


2023-06-14 13:09:32.828 [main] INFO  Engine -
{
        "setting":{
                "speed":{
                        "channel":1
                },
                "errorLimit":{
                        "record":0,
                        "percentage":0.02
                }
        },
        "content":[
                {
                        "reader":{
                                "name":"streamreader",
                                "parameter":{
                                        "column":[
                                                {
                                                        "value":"DataX",
                                                        "type":"string"
                                                },
                                                {
                                                        "value":19890604,
                                                        "type":"long"
                                                },
                                                {
                                                        "value":"1989-06-04 00:00:00",
                                                        "type":"date"
                                                },
                                                {
                                                        "value":true,
                                                        "type":"bool"
                                                },
                                                {
                                                        "value":"test",
                                                        "type":"bytes"
                                                }
                                        ],
                                        "sliceRecordCount":100000
                                }
                        },
                        "writer":{
                                "name":"streamwriter",
                                "parameter":{
                                        "print":false,
                                        "encoding":"UTF-8"
                                }
                        }
                }
        ]
}

2023-06-14 13:09:32.846 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false
2023-06-14 13:09:32.846 [main] INFO  JobContainer - DataX jobContainer starts job.
2023-06-14 13:09:32.847 [main] INFO  JobContainer - Set jobId = 0
2023-06-14 13:09:32.863 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2023-06-14 13:09:32.864 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2023-06-14 13:09:32.864 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2023-06-14 13:09:32.864 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2023-06-14 13:09:32.865 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2023-06-14 13:09:32.865 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2023-06-14 13:09:32.865 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2023-06-14 13:09:32.889 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2023-06-14 13:09:32.892 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2023-06-14 13:09:32.893 [job-0] INFO  JobContainer - Running by standalone Mode.
2023-06-14 13:09:32.900 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-06-14 13:09:32.904 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-06-14 13:09:32.904 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2023-06-14 13:09:32.913 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-06-14 13:09:33.240 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[328]ms
2023-06-14 13:09:33.241 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-06-14 13:09:42.913 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.018s |  All Task WaitReaderTime 0.026s | Percentage 100.00%
2023-06-14 13:09:42.914 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2023-06-14 13:09:42.914 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2023-06-14 13:09:42.914 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2023-06-14 13:09:42.915 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2023-06-14 13:09:42.915 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: D:\datax\hook
2023-06-14 13:09:42.917 [job-0] INFO  JobContainer -
         [total cpu info] =>
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu
                -1.00%                         | -1.00%                         | -1.00%


         [total gc info] =>
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
                 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s
                 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s

2023-06-14 13:09:42.917 [job-0] INFO  JobContainer - PerfTrace not enable!
2023-06-14 13:09:42.917 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.018s |  All Task WaitReaderTime 0.026s | Percentage 100.00%
2023-06-14 13:09:42.918 [job-0] INFO  JobContainer -
任务启动时刻                    : 2023-06-14 13:09:32
任务结束时刻                    : 2023-06-14 13:09:42
任务总计耗时                    :                 10s
任务平均流量                    :          253.91KB/s
记录写入速度                    :          10000rec/s
读出记录总数                    :              100000
读写失败总数                    :                   0

此时,MySQL_B服务,也有了相同的数据,迁移成功。

如果中间发生错误,会对整个事务进行回滚。所以,如果有多个表需要同步,建议拆成多个xxx.json 文件去配置同步。

你可能感兴趣的:(Java,Linux,MySQL,linux,服务器,java,etl)