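A Brief Introduction to DataX 3.0, Alibaba Cloud's Open-Source Offline Synchronization Tool

Introduction:

DataX is an offline synchronization tool for heterogeneous data sources. It aims to provide stable, efficient data synchronization between a wide range of heterogeneous sources, including relational databases (MySQL, Oracle, and others), HDFS, Hive, ODPS, HBase, and FTP.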

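  • Method 1: Download the DataX package directly (DataX download page).

    After downloading, extract the package to a local directory and change into its bin directory to run a synchronization job:

    $ cd {YOUR_DATAX_HOME}/bin
    $ python datax.py {YOUR_JOB.json}

    Self-check script:    python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json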

  • Method 2: Download the DataX source code and build it yourself (DataX source).

    (1) Download the DataX source code:

    $ git clone git@github.com:alibaba/DataX.git

    (2) Package it with Maven:

    $ cd {DataX_source_code_home}
    $ mvn -U clean package assembly:assembly -Dmaven.test.skip=true

    After a successful build, the DataX package is located at {DataX_source_code_home}/target/datax/datax/, with the same directory structure as in Method 1.
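Both methods end with the same layout: a bin/datax.py launcher and a sample job/job.json. As a minimal sketch, the commands above could be wrapped in Python so that a script or scheduler can start jobs. The helper names datax_command, run_job, and run_self_check are hypothetical and not part of DataX, and the sketch assumes that the `python` on your PATH is a version your DataX release supports.

```python
import subprocess
from pathlib import Path


def datax_command(datax_home, job_file, python_exe="python"):
    """Build the argv list for: python {DATAX_HOME}/bin/datax.py {JOB.json}."""
    launcher = Path(datax_home) / "bin" / "datax.py"
    return [str(python_exe), str(launcher), str(job_file)]


def run_job(datax_home, job_file):
    """Run one DataX job and return its exit code (hypothetical helper)."""
    launcher = Path(datax_home) / "bin" / "datax.py"
    if not launcher.is_file():
        # Fail early if the directory does not look like a DataX install.
        raise FileNotFoundError(f"datax.py not found under {launcher.parent}")
    return subprocess.run(datax_command(datax_home, job_file)).returncode


def run_self_check(datax_home):
    """Run the bundled sample job, mirroring the self-check command."""
    return run_job(datax_home, Path(datax_home) / "job" / "job.json")
```

Calling run_self_check("/path/to/datax") then corresponds to the self-check command from Method 1; a non-zero return value would suggest the installation or the job needs attention.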

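Two example job configurations follow. Both read from Oracle through a querySql statement; according to the OracleReader documentation, when querySql is set the reader ignores the column and where settings.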

oracle->file:

{  
    "job": {  
        "content": [  
            {  
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "loge",
                        "password": "123456",
                        "where": "",
                        "connection": [
                            {
                                "querySql": [
                                    "select * from table1"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@192.168.1.10:1521:orcl"
                                ]
                            }
                        ]
                    }
                },  
                "writer": {  
                    "name": "txtfilewriter",
                    "parameter": {
                        "path": "/space/datax/tmp/",
                        "fileName": "oracledata",
                        "writeMode": "truncate",
                        "dateFormat": "yyyy-MM-dd HH:mm:ss"
                    }
                }
            }
        ],  
        "setting": {  
            "speed": {  
                "channel": 1  
            }  
        }  
    }  
} 
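When many tables need the same kind of export, writing each job file by hand becomes repetitive. The following is a rough sketch of generating the oracle->file job above from Python. The function names oracle_to_file_job and write_job are hypothetical; the JSON keys are copied from the example and nothing beyond them is assumed about the plugins.

```python
import json


def oracle_to_file_job(jdbc_url, username, password, query_sql,
                       out_dir, file_name, channel=1):
    """Return a job dict with the same keys as the oracle->file example."""
    reader = {
        "name": "oraclereader",
        "parameter": {
            "username": username,
            "password": password,
            "where": "",
            "connection": [
                {"querySql": [query_sql], "jdbcUrl": [jdbc_url]}
            ],
        },
    }
    writer = {
        "name": "txtfilewriter",
        "parameter": {
            "path": out_dir,
            "fileName": file_name,
            "writeMode": "truncate",
            "dateFormat": "yyyy-MM-dd HH:mm:ss",
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(job, path):
    """Serialize the job dict to a JSON file that datax.py can be pointed at."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(job, fh, indent=4)
```

For instance, write_job(oracle_to_file_job(url, user, pwd, "select * from table1", "/space/datax/tmp/", "oracledata"), "oracle2file.json") would produce a file equivalent to the example, where url, user, and pwd are your own values.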

oracle->mysql:

{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "loge",
                        "password": "123456",
                        "column": ["*"],
                        "where": "",
                        "connection": [
                            {
                                "querySql": [
                                    "select * from table1"
                                ],
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@192.168.1.10:1521:orcl"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "loge",
                        "password": "123456",
                        "column": [
                            "id",
                            "name"
                        ],
                        "session": [
                            "set session sql_mode='ANSI'"
                        ],
                        "preSql": [
                            "delete from table1"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.1.22:3306/t?useUnicode=true&characterEncoding=utf8",
                                "table": [
                                    "table1"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
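One detail in the two examples is easy to miss: the reader's jdbcUrl is a JSON array, while the writer's jdbcUrl is a plain string. A small pre-flight check can catch that kind of slip before a job is submitted. The sketch below is only an illustration based on the shapes visible in the examples above; check_job is a hypothetical helper, not a DataX API, and its rules are assumptions that may not hold for every plugin.

```python
def check_job(job):
    """Return a list of problems found in a DataX job dict (empty if none)."""
    problems = []
    body = job.get("job", {})

    # Assumption: when a channel count is given, it is a positive integer.
    channel = body.get("setting", {}).get("speed", {}).get("channel")
    if channel is not None and (not isinstance(channel, int) or channel < 1):
        problems.append("setting.speed.channel should be a positive integer")

    content = body.get("content") or []
    if not content:
        problems.append("content must contain at least one reader/writer pair")

    for i, item in enumerate(content):
        for role in ("reader", "writer"):
            plugin = item.get(role) or {}
            if not plugin.get("name"):
                problems.append(f"content[{i}].{role}.name is missing")
            # In the examples, readers use a list of URLs and writers one string.
            expected = list if role == "reader" else str
            for conn in plugin.get("parameter", {}).get("connection", []):
                url = conn.get("jdbcUrl")
                if url is not None and not isinstance(url, expected):
                    problems.append(
                        f"content[{i}].{role} jdbcUrl should be a {expected.__name__}"
                    )
    return problems
```

Loading a job file with json.load and passing the result to check_job returns an empty list when the job matches the shapes used in this article.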

 

 
