DataX: Ⅱ

Preface

This walkthrough uses the master branch, because the official repository has no release branch. Master will have to do for now, so expect possible issues.

References:

  1. https://github.com/alibaba/DataX
  2. https://github.com/alibaba/DataX/blob/master/introduction.md    (plugin documentation)

Building from source

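  1. Download the source code from GitHub (alibaba/DataX; DataX is the open-source version of Alibaba Cloud DataWorks Data Integration).
  2. Building with JDK 17 fails with errors, so I switched to JDK 1.8.
  3. DataX depends on Python to run, so Python 2 or Python 3 must be installed. CentOS 7 ships with Python 2.7.5.
  4. Build the executable package: mvn -U clean package assembly:assembly -Dmaven.test.skip=true
  5. On success, the build output is in the target directory under the project root. Including every Reader and Writer makes the build slower, but it is worth it.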
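
Step 2 can be checked before starting a long Maven build. The following is a minimal sketch, not part of DataX: the helper name `parse_java_major` is hypothetical, and it assumes the usual `java -version` output format.

```python
import re

# The Maven command from step 4, as an argument list for subprocess.
MVN_BUILD = ["mvn", "-U", "clean", "package",
             "assembly:assembly", "-Dmaven.test.skip=true"]


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Hypothetical helper: handles both the legacy "1.8.0_292" scheme
    (Java 8) and the newer "17.0.2" scheme (Java 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string")
    major, minor = int(match.group(1)), match.group(2)
    if major == 1 and minor is not None:
        # Legacy numbering: 1.8 means Java 8.
        return int(minor)
    return major


def is_jdk8(version_output):
    """Return True when the reported JDK is the 1.8 line used in this post."""
    return parse_java_major(version_output) == 8
```

Note that `java -version` writes to stderr, so a caller would read it with something like `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr` and only then run `MVN_BUILD` from the DataX source root.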

[Image 1]

Running commands

From DataX's bin directory:

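  1. python datax.py -r {YOUR_READER} -w {YOUR_WRITER}   prints the JSON template for that reader and writer; the template can also be found in the source repository or in the plugin's documentation
  2. python datax.py {JSON_FILE}   runs the job described by that JSON file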
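
When these commands are driven from a script, they can be built as argument lists. This is a rough sketch: the function names and the `python` and `datax_py` defaults are assumptions for illustration, not part of DataX.

```python
def template_command(reader, writer, python="python", datax_py="datax.py"):
    """Build the command that prints the JSON template for a reader/writer pair."""
    return [python, datax_py, "-r", reader, "-w", writer]


def run_command(job_file, python="python", datax_py="datax.py"):
    """Build the command that runs the job described by a JSON file."""
    return [python, datax_py, job_file]
```

Either list can then be passed to `subprocess.run(cmd, cwd=bin_dir, check=True)`, where `bin_dir` is the path to DataX's bin directory on your machine.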

Example: Stream To Stream

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 10,
            "column": [
              {
                "type": "long",
                "value": "10"
              },
              {
                "type": "string",
                "value": "hello,你好,世界-DataX"
              }
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "encoding": "UTF-8",
            "print": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 5
      }
    }
  }
}
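
If the record count or channel count needs to vary between runs, the same job can be generated instead of edited by hand. A minimal sketch, assuming a hypothetical helper `stream_job`; the field names are copied from the job above.

```python
import json


def stream_job(slice_record_count, channel):
    """Build the Stream To Stream job shown above as a Python dict."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "sliceRecordCount": slice_record_count,
                            "column": [
                                {"type": "long", "value": "10"},
                                {"type": "string", "value": "hello,你好,世界-DataX"},
                            ],
                        },
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"encoding": "UTF-8", "print": True},
                    },
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


def to_json(job):
    """Serialize the job; ensure_ascii=False keeps the non-ASCII sample text readable."""
    return json.dumps(job, ensure_ascii=False, indent=2)
```

Writing `to_json(stream_job(10, 5))` to a file with UTF-8 encoding gives a job file equivalent to the one above.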

Execution result

[Image 2]

MysqlReader To Stream 

Running python datax.py -r mysqlreader -w streamwriter prints the following template:

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


Please refer to the mysqlreader document:
     https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md 

Please refer to the streamwriter document:
     https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md 
 
Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": [], 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "username": "", 
                        "where": ""
                    }
                }, 
                "writer": {
                    "name": "streamwriter", 
                    "parameter": {
                        "encoding": "", 
                        "print": true
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

Then edit the JSON:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": ["Name","GroupName"], 
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://192.168.137.2:3306/test"], 
                                "table": ["employee"]
                            }
                        ], 
                        "password": "root", 
                        "username": "root"
                    }
                }, 
                "writer": {
                    "name": "streamwriter", 
                    "parameter": {
                        "encoding": "", 
                        "print": true
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
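
After filling in a template it is easy to leave a placeholder empty. The sketch below lists every field that is still an empty string or empty list; `find_empty_fields` is a hypothetical helper, and whether an empty value is actually a problem depends on the plugin, so treat the result as a list of things to review rather than errors.

```python
def find_empty_fields(node, path="$"):
    """Return the paths of all empty strings and empty lists in a job config."""
    empty = []
    if isinstance(node, dict):
        for key, value in node.items():
            empty.extend(find_empty_fields(value, f"{path}.{key}"))
    elif isinstance(node, list):
        if not node:
            # An untouched template placeholder such as "column": []
            empty.append(path)
        for index, item in enumerate(node):
            empty.extend(find_empty_fields(item, f"{path}[{index}]"))
    elif node == "":
        # An untouched template placeholder such as "encoding": ""
        empty.append(path)
    return empty
```

Load the job with `json.load` and pass the resulting dict to `find_empty_fields`. For the edited job above, the only path it reports is the writer's `encoding`, which was left as an empty string.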

About logging
