Running DataX in Docker for data synchronization -- DataX / Oracle-Oracle/MySQL edition

I built the DataX image earlier, so only the last step remains: running DataX.

 

Using DataX is simple:

python datax.py config_file.json

That one command is all it takes to run DataX. This post breaks it down piece by piece.

 

The command starts with python. datax.py ships with DataX and lives in DataX's bin directory; the last argument is the job configuration file to run.
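The same command can be launched from a script, for example when a scheduler drives several jobs. A minimal Python sketch, assuming DataX is unpacked at /data/datax inside the container (inferred from the hook path /data/datax/hook printed in the run log; adjust it to your image) and that the interpreter is available as `python`. `build_datax_command` and `run_datax` are hypothetical helper names, not part of DataX.

```python
import os
import subprocess

# Assumption: DATAX_HOME inside the container; /data/datax is inferred from the
# hook directory printed in the run log. Change it to match your image.
DEFAULT_DATAX_HOME = "/data/datax"


def build_datax_command(job_file, datax_home=DEFAULT_DATAX_HOME, python="python"):
    """argv equivalent to: python {DATAX_HOME}/bin/datax.py <job_file>"""
    datax_py = os.path.join(datax_home, "bin", "datax.py")
    return [python, datax_py, job_file]


def run_datax(job_file, datax_home=DEFAULT_DATAX_HOME):
    """Run one job; return datax.py's exit code and its combined stdout/stderr."""
    result = subprocess.run(
        build_datax_command(job_file, datax_home),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode the output as text
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    print(build_datax_command("mysql2mysql1.json"))
```

Check the exit code together with the log text, since the log is where DataX reports warnings and the final record counts.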

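The configuration file is roughly structured as:

Reader database + read options and conditions
|
|
|
Writer database + write options and conditions
|
|
|
Plus general settings, such as how much data is moved at a time (speed limits and similar)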
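The three parts map onto the keys of the template that datax.py generates: `job.content[].reader`, `job.content[].writer`, and `job.setting`. A small Python sketch that builds this empty skeleton; the `parameter` objects are deliberately left empty, the channel value mirrors the string "1" used later in this post, and `job_skeleton` is a hypothetical helper.

```python
import json


def job_skeleton(reader_name, writer_name, channel="1"):
    """Build the three-part DataX job layout: reader, writer, common settings."""
    return {
        "job": {
            "content": [
                {
                    # Part 1: source database plus read options and conditions
                    "reader": {"name": reader_name, "parameter": {}},
                    # Part 2: target database plus write options and conditions
                    "writer": {"name": writer_name, "parameter": {}},
                }
            ],
            # Part 3: job-wide settings such as speed / concurrency
            "setting": {"speed": {"channel": channel}},
        }
    }


if __name__ == "__main__":
    print(json.dumps(job_skeleton("mysqlreader", "mysqlwriter"), indent=4))
```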

The configuration file is written in JSON. DataX ships with templates; from the bin directory you can print one with:

 python datax.py -r oraclereader -w oraclewriter

 


2019.1.9 update

This morning I started testing DataX synchronization. After a few days of study, I understand how to use DataX considerably better.

DataX ships with templates, and a single command prints the JSON template for any supported reader/writer pair:

 python datax.py -r mysqlreader -w mysqlwriter > mysql2mysql1.json

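-r : the reader template: the source database name (oracle, mysql, and so on) followed by "reader", e.g. mysqlreader
-w : the writer template: named the same way but ending in "writer", e.g. mysqlwriter

> : writes the template into a file; the file name after it is your own choice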
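The naming rule and the redirect can be sketched in Python as well. This assumes the script runs with the DataX bin directory as `bin_dir` (as in the shell session below) and that `python` is on the PATH; `plugin_names`, `template_command`, and `write_template` are hypothetical helpers, and the resulting names must match plugins that DataX actually ships.

```python
import subprocess


def plugin_names(source_db, target_db):
    """Database name + "reader" / "writer", e.g. mysql -> mysqlreader / mysqlwriter."""
    return source_db.lower() + "reader", target_db.lower() + "writer"


def template_command(source_db, target_db):
    """argv for: python datax.py -r <source>reader -w <target>writer"""
    reader, writer = plugin_names(source_db, target_db)
    return ["python", "datax.py", "-r", reader, "-w", writer]


def write_template(source_db, target_db, out_file, bin_dir="."):
    """Equivalent of `python datax.py -r ... -w ... > out_file`, run from bin_dir."""
    with open(out_file, "w") as fh:
        # stdout goes into the file, just like the shell's `>`
        result = subprocess.run(template_command(source_db, target_db),
                                stdout=fh, cwd=bin_dir)
    return result.returncode


if __name__ == "__main__":
    print(template_command("mysql", "mysql"))
```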

This time I tested MySQL to MySQL.

# In the DataX script directory, check which JSON files exist
[root@8ec3fabb3594 bin]# ls
datax.py  dxprof.py  perftrace.py

# Generate the template and write it to a JSON file
[root@8ec3fabb3594 bin]# python datax.py -r mysqlreader -w mysqlwriter > mysql2mysql1.json

# Check that the file was created and view the template content
[root@8ec3fabb3594 bin]# ls
datax.py  dxprof.py  mysql2mysql1.json  perftrace.py
[root@8ec3fabb3594 bin]# cat mysql2mysql1.json 

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


Please refer to the mysqlreader document:
     https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md 

Please refer to the mysqlwriter document:
     https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md 
 
Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": [], 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "username": "", 
                        "where": ""
                    }
                }, 
                "writer": {
                    "name": "mysqlwriter", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": "", 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "preSql": [], 
                        "session": [], 
                        "username": "", 
                        "writeMode": ""
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
[root@8ec3fabb3594 bin]# ls
datax.py  dxprof.py  mysql2mysql1.json  perftrace.py
[root@8ec3fabb3594 bin]# vi mysql2mysql1.json 
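As the cat output shows, the redirected file holds more than JSON: the DataX banner and the "Please refer to ..." / "Please save ..." notes come first, and a job file has to be valid JSON. A minimal Python sketch that keeps only the JSON part, assuming (as in the output above) that the JSON object starts at the first line consisting of a lone `{`; `extract_job_json` and `clean_template_file` are hypothetical names.

```python
import json


def extract_job_json(text):
    """Parse the job config out of datax.py template output, skipping the banner."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # Assumption: the JSON begins at the first line that is exactly "{"
        if line.strip() == "{":
            return json.loads("\n".join(lines[i:]))
    raise ValueError("no JSON object found in template output")


def clean_template_file(path):
    """Rewrite the file in place so it contains only the indented JSON template."""
    with open(path) as fh:
        job = extract_job_json(fh.read())
    with open(path, "w") as fh:
        json.dump(job, fh, indent=4)
    return job
```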

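Then fill in the template. Delete the banner and notes above the JSON first, since they are not JSON. The # lines in the example below are explanations for this post only; standard JSON does not allow comments, so leave them out of the real file. Example: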

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
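                        # Better to list the columns to sync explicitly; "*" triggers a warning and couples the job tightly to the table schema. I'm taking a shortcut here for the test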
                        "column": ["*"], 
                        "connection": [
                            {
                                # jdbcUrl format: standard JDBC prefix + IP:port + database to sync
                                "jdbcUrl": ["jdbc:mysql://45.59.13.2:3306/datax"], 
                                "table": ["test"]
                            }
                        ], 
                        "password": "My@test", 
                        "username": "root", 
                        "where": ""
                        # Not every parameter needs a value; see the official docs on GitHub for the required ones
                    }
                }, 
                "writer": {
                    "name": "mysqlwriter", 
                    "parameter": {
                        "column": ["*"], 
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://45.59.13.2:3307/datax", 
                                "table": ["test"]
                            }
                        ], 
                        "password": "My@test", 
                        "preSql": [], 
                        "session": [], 
                        "username": "root", 
                        # Best to specify the write mode explicitly
                        "writeMode": "insert"
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
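The filled-in file can also be generated from code, which makes it easy to follow the advice in the comments: list the columns explicitly and set writeMode. A Python sketch of the same MySQL-to-MySQL job with explicit columns (id and test are the columns DataX reports for the test table in the run log); hosts, ports, and credentials are placeholders, and `mysql_jdbc_url` / `mysql_to_mysql_job` are hypothetical helpers. As in the template, the reader's jdbcUrl is a list while the writer's is a single string.

```python
import json


def mysql_jdbc_url(host, port, database):
    """Standard MySQL JDBC prefix + host:port + database to synchronize."""
    return "jdbc:mysql://{}:{}/{}".format(host, port, database)


def mysql_to_mysql_job(src, dst, table, columns, username, password,
                       write_mode="insert", channel="1"):
    """Fill the mysqlreader -> mysqlwriter template; src/dst are (host, port, db)."""
    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": list(columns),  # explicit columns instead of "*"
                        # reader: jdbcUrl is a list
                        "connection": [{"jdbcUrl": [mysql_jdbc_url(*src)],
                                        "table": [table]}],
                        "username": username,
                        "password": password,
                        "where": "",
                    },
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": list(columns),
                        # writer: jdbcUrl is a single string
                        "connection": [{"jdbcUrl": mysql_jdbc_url(*dst),
                                        "table": [table]}],
                        "username": username,
                        "password": password,
                        "preSql": [],
                        "session": [],
                        "writeMode": write_mode,
                    },
                },
            }],
            "setting": {"speed": {"channel": channel}},
        }
    }


if __name__ == "__main__":
    # Placeholder connection details; replace them with your own
    job = mysql_to_mysql_job(("127.0.0.1", 3306, "datax"), ("127.0.0.1", 3307, "datax"),
                             "test", ["id", "test"], "root", "change-me")
    print(json.dumps(job, indent=4))
```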

Then just run DataX:

# Sync command
[root@8ec3fabb3594 bin]# python datax.py mysql2mysql1.json

# Run log
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2019-01-09 02:06:26.025 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2019-01-09 02:06:26.035 [main] INFO  Engine - the machine info  => 

        osInfo: Oracle Corporation 1.8 25.191-b12
        jvmInfo:        Linux amd64 3.10.0-862.2.3.el7.x86_64
        cpu num:        4

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [PS MarkSweep, PS Scavenge]

        MEMORY_NAME                    | allocation_size                | init_size                      
        PS Eden Space                  | 256.00MB                       | 256.00MB                       
        Code Cache                     | 240.00MB                       | 2.44MB                         
        Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
        PS Survivor Space              | 42.50MB                        | 42.50MB                        
        PS Old Gen                     | 683.00MB                       | 683.00MB                       
        Metaspace                      | -0.00MB                        | 0.00MB                         


2019-01-09 02:06:26.061 [main] INFO  Engine - 
{
        "content":[
                {
                        "reader":{
                                "name":"mysqlreader",
                                "parameter":{
                                        "column":[
                                                "*"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":[
                                                                "jdbc:mysql://45.59.13.1:3306/datax"
                                                        ],
                                                        "table":[
                                                                "test"
                                                        ]
                                                }
                                        ],
                                        "password":"*******",
                                        "username":"root",
                                        "where":""
                                }
                        },
                        "writer":{
                                "name":"mysqlwriter",
                                "parameter":{
                                        "column":[
                                                "*"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":"jdbc:mysql://45.59.13.1:3307/datax",
                                                        "table":[
                                                                "test"
                                                        ]
                                                }
                                        ],
                                        "password":"*******",
                                        "preSql":[],
                                        "session":[],
                                        "username":"root",
                                        "writeMode":"insert"
                                }
                        }
                }
        ],
        "setting":{
                "speed":{
                        "channel":"1"
                }
        }
}

2019-01-09 02:06:26.085 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2019-01-09 02:06:26.087 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2019-01-09 02:06:26.088 [main] INFO  JobContainer - DataX jobContainer starts job.
2019-01-09 02:06:26.090 [main] INFO  JobContainer - Set jobId = 0
2019-01-09 02:06:26.595 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://45.59.113.211:3306/datax?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
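2019-01-09 02:06:26.597 [job-0] WARN  OriginalConfPretreatmentUtil - Your column configuration carries some risk: you did not specify which columns to read from the source table, so if the table's column count or types change, the job may produce incorrect results or even fail. Please check and revise your configuration.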
2019-01-09 02:06:26.911 [job-0] INFO  OriginalConfPretreatmentUtil - table:[test] all columns:[
id,test
].
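2019-01-09 02:06:26.911 [job-0] WARN  OriginalConfPretreatmentUtil - Your column configuration carries risk: the columns configured for the target table are *, so if the table's column count or types change, the job may produce incorrect results or even fail. Please check and revise your configuration.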
2019-01-09 02:06:26.913 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
insert INTO %s (id,test) VALUES(?,?)
], which jdbcUrl like:[jdbc:mysql://45.59.113.211:3307/datax?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2019-01-09 02:06:26.914 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2019-01-09 02:06:26.914 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2019-01-09 02:06:26.915 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2019-01-09 02:06:26.915 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2019-01-09 02:06:26.916 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2019-01-09 02:06:26.922 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2019-01-09 02:06:26.923 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2019-01-09 02:06:26.945 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2019-01-09 02:06:26.950 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2019-01-09 02:06:26.955 [job-0] INFO  JobContainer - Running by standalone Mode.
2019-01-09 02:06:26.965 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2019-01-09 02:06:26.973 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2019-01-09 02:06:26.974 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2019-01-09 02:06:26.992 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2019-01-09 02:06:26.997 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from test 
] jdbcUrl:[jdbc:mysql://45.59.113.211:3306/datax?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2019-01-09 02:06:27.034 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from test 
] jdbcUrl:[jdbc:mysql://45.59.113.211:3306/datax?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2019-01-09 02:06:27.294 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[308]ms
2019-01-09 02:06:27.296 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2019-01-09 02:06:36.983 [job-0] INFO  StandAloneJobContainerCommunicator - Total 2 records, 6 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2019-01-09 02:06:36.983 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2019-01-09 02:06:36.985 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2019-01-09 02:06:36.985 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do post work.
2019-01-09 02:06:36.986 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2019-01-09 02:06:36.987 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /data/datax/hook
2019-01-09 02:06:36.989 [job-0] INFO  JobContainer - 
         [total cpu info] => 
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
                -1.00%                         | -1.00%                         | -1.00%
                        

         [total gc info] => 
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
                 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
                 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2019-01-09 02:06:36.990 [job-0] INFO  JobContainer - PerfTrace not enable!
2019-01-09 02:06:36.991 [job-0] INFO  StandAloneJobContainerCommunicator - Total 2 records, 6 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2019-01-09 02:06:36.992 [job-0] INFO  JobContainer - 
Job start time                : 2019-01-09 02:06:26
Job end time                  : 2019-01-09 02:06:36
Total elapsed time            :                 10s
Average throughput            :                0B/s
Record write speed            :              0rec/s
Total records read            :                   2
Total read/write failures     :                   0



# As the log shows, loose settings trigger warnings; here, the "*" column lists on both the reader and the writer
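When jobs run unattended, those warnings and the final result can be pulled out of the log automatically. A Python sketch, assuming the log line layout seen above (`date time [thread] LEVEL logger - message`) and the `DataX jobId [N] completed successfully` line as the success marker; `summarize_datax_log` is a hypothetical helper, and multi-line messages are only captured up to their first line.

```python
import re

# Matches e.g. "2019-01-09 02:06:26.085 [main] WARN  Engine - prioriy set to 0, ..."
LOG_LINE = re.compile(
    r"^\S+ \S+ \[(?P<thread>[^\]]+)\] (?P<level>[A-Z]+)\s+(?P<logger>\S+) - (?P<msg>.*)$"
)
SUCCESS = re.compile(r"DataX jobId \[\d+\] completed successfully")


def summarize_datax_log(text):
    """Return ([(logger, message), ...] for WARN lines, whether the job succeeded)."""
    warnings = []
    for line in text.splitlines():
        m = LOG_LINE.match(line)
        if m and m.group("level") == "WARN":
            warnings.append((m.group("logger"), m.group("msg")))
    return warnings, SUCCESS.search(text) is not None
```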

Likewise, you can generate an Oracle-Oracle or Oracle-MySQL template, fill in the JSON, and run it.
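Generating several templates is the same command in a loop. A Python sketch that writes one template per source/target pair, assuming it runs with the DataX bin directory as `bin_dir` and `python` on the PATH; `template_jobs` and `generate_all` are hypothetical helpers, and the file names follow the `<source>2<target>.json` pattern used in this post. The generated files still begin with the DataX banner, so trim everything above the JSON before filling them in.

```python
import os
import subprocess

# Pairs mentioned in the post; plugin names are "<db>reader" / "<db>writer".
PAIRS = [("oracle", "oracle"), ("oracle", "mysql"), ("mysql", "mysql")]


def template_jobs(pairs):
    """Yield (argv, output file name) per pair, e.g. oracle2mysql.json."""
    for src, dst in pairs:
        argv = ["python", "datax.py", "-r", src + "reader", "-w", dst + "writer"]
        yield argv, "{}2{}.json".format(src, dst)


def generate_all(pairs, bin_dir="."):
    """Run datax.py once per pair from bin_dir, redirecting stdout into each file."""
    for argv, out_file in template_jobs(pairs):
        with open(os.path.join(bin_dir, out_file), "w") as fh:
            subprocess.run(argv, stdout=fh, cwd=bin_dir)


if __name__ == "__main__":
    for argv, out_file in template_jobs(PAIRS):
        print(" ".join(argv), ">", out_file)
```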
