[DataX Incremental Sync] Syncing MySQL Data to ES (Elasticsearch)

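Introduction: an overview of DataX 3.0, Alibaba Cloud's open-source offline synchronization tool. DataX is an offline synchronization tool for heterogeneous data sources. It aims to provide stable, efficient data synchronization between heterogeneous sources, including relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, and FTP.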

If you are not familiar with DataX, read this first: https://developer.aliyun.com/article/59373

Source code: https://github.com/alibaba/DataX?spm=a2c6h.12873639.0.0.21084f64hM6IE9

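DataX already has a fairly complete plugin ecosystem: mainstream RDBMS databases, NoSQL stores, and big data compute systems are all supported. The currently supported data sources are listed below.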

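Type | Data source | Plugin docs (Reader / Writer)
RDBMS (relational databases) | MySQL | Read, Write
 | Oracle | Read, Write
 | SQLServer | Read, Write
 | PostgreSQL | Read, Write
 | DRDS | Read, Write
 | Generic RDBMS (supports all relational databases) | Read, Write
Alibaba Cloud data warehouse storage | ODPS | Read, Write
 | ADS |
 | OSS | Read, Write
 | OCS | Read, Write
NoSQL data storage | OTS | Read, Write
 | Hbase0.94 | Read, Write
 | Hbase1.1 | Read, Write
 | Phoenix4.x | Read, Write
 | Phoenix5.x | Read, Write
 | MongoDB | Read, Write
 | Hive | Read, Write
 | Cassandra | Read, Write
Unstructured data storage | TxtFile | Read, Write
 | FTP | Read, Write
 | HDFS | Read, Write
 | Elasticsearch |
Time-series databases | OpenTSDB |
 | TSDB | Read, Write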

1. The mysql2es job script

test.json

{
  "job": {
    "setting": {
      "speed": {
        "channel": 2
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "datax",
            "password": "123456",
            "where":"updated_at>='${start_time} 00:00:00' and updated_at<='${end_time} 23:59:59'",
            "column": [
              "id",
              "app_id",        
              "collection_phone",
              "transaction_number",
              "pay_amount",             
              "if(auto_tags is null,'',replace(replace(replace(auto_tags,'[',''),']',''),'\"','')) as auto_tags",
              "if(manual_tags is null,'',replace(replace(replace(manual_tags,'[',''),']',''),'\"','')) as manual_tags",
              "if(latest_days_ordered_at is null,'',replace(replace(latest_days_ordered_at,'[',''),']','')) as latest_days_ordered_at",
              "if(latest_days_paid_at is null,'',replace(replace(latest_days_paid_at,'[',''),']','')) as latest_days_paid_at",
              "if(latest_days_visited_at is null,'',replace(replace(latest_days_visited_at,'[',''),']','')) as latest_days_visited_at",
              "latest_ordered_at",            
              "visited_products",
              "ordered_products"
            ],
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/db_user?com.mysql.jdbc.faultInjection.serverCharsetIndex=45"],
                "table": [
                  "user"
                ]
              }
            ]
          }
        },
        "writer": {
          "name": "elasticsearchwriter",
          "parameter": {
            "endpoint": "http://127.0.0.1:9200",
            "accessId": "elastic",
            "accessKey": "123456",
            "index":"user",
            "type":"traces",
            "settings": {"index" :{"number_of_shards": 5, "number_of_replicas": 1}},
            "batchSize": 5000,
            "splitter": ",",
            "column": [
              {"name":"pk","type":"id"},
              {"name":"app_id","type":"keyword"},            
              {"name":"collection_phone","type":"keyword"},
              {"name":"transaction_number","type":"integer"},
              {"name":"pay_amount","type":"integer"},
              {"name":"auto_tags","type":"keyword","array":true},
              {"name":"manual_tags","type":"keyword","array":true},
              {"name":"latest_days_ordered_at","type":"long","array":true},
              {"name":"latest_days_paid_at","type":"long","array":true},
              {"name":"latest_days_visited_at","type":"long","array":true},
              {"name":"latest_ordered_at","type":"long"},           
              {"name":"visited_products","type":"nested"},
              {"name":"ordered_products","type":"nested"}
            ]
          }
        }
      }
    ]
  }
}
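
The nested replace(...) expressions in the reader columns and the writer's "splitter": "," work as a pair: the reader flattens a JSON-style array string into a comma-separated string, and the writer splits it again for columns marked "array": true. The Python sketch below mirrors that round trip. The function names and the sample value are invented for illustration, and it imitates the SQL expression rather than the plugin's actual code.

```python
def flatten_array_text(value):
    """Mimic: if(col is null, '', replace(replace(replace(col,'[',''),']',''),'"',''))."""
    if value is None:
        return ""
    return value.replace("[", "").replace("]", "").replace('"', "")


def split_for_writer(text, splitter=","):
    """Split the flattened string on the splitter, as an array column needs.

    Assumption: an empty string is mapped to an empty list here; how the
    real writer treats an empty string is outside this sketch.
    """
    return text.split(splitter) if text else []


if __name__ == "__main__":
    raw = '["vip","new_user"]'  # invented sample value for auto_tags
    flat = flatten_array_text(raw)
    print(flat)
    print(split_for_writer(flat))
```

Because the brackets and quotes are removed with plain string replacement, a tag that itself contains a comma would be split into two elements, so this approach suits simple values only.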

2. Run the DataX job

python /usr/local/datax/bin/datax.py ./test.json -p "-Dstart_time=2020-09-02 -Dend_time=2020-09-02"
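
For a daily incremental run, the two dates passed through -p can be computed instead of typed by hand. The sketch below builds the same command line as above for a given day. `build_datax_command` is a hypothetical helper, and its default paths are simply the ones used in this article.

```python
import datetime


def build_datax_command(day, job_file="./test.json",
                        datax_py="/usr/local/datax/bin/datax.py"):
    """Return the argv list that syncs rows updated on `day`."""
    stamp = day.strftime("%Y-%m-%d")
    # start_time and end_time are the placeholders used in the job's where clause.
    params = f"-Dstart_time={stamp} -Dend_time={stamp}"
    return ["python", datax_py, job_file, "-p", params]


if __name__ == "__main__":
    yesterday = datetime.date.today() - datetime.timedelta(days=1)
    print(" ".join(build_datax_command(yesterday)))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from a scheduler. Passing the arguments as a list keeps the space-separated -D values together as one argument without shell quoting.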

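2.1  Plugins [mysqlreader, elasticsearchwriter] fail to load

The run failed immediately. The log below reports error Framework-12 (a DataX plugin initialization error, which the message says is usually caused by a faulty DataX installation) and lists elasticsearchwriter and mysqlreader as the plugins that could not be loaded: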


2020-09-02 15:49:33.747 [main] WARN  ConfigParser - 插件[mysqlreader,elasticsearchwriter]加载失败,1s后重试... Exception:Code:[Framework-12], Description:[DataX插件初始化错误, 该问题通常是由于DataX安装错误引起,请联系您的运维解决 .].  - 插件加载失败,未完成指定插件加载:[elasticsearchwriter, mysqlreader]
2020-09-02 15:49:34.765 [main] ERROR Engine -

经DataX智能分析,该任务最可能的错误原因是:
com.alibaba.datax.common.exception.DataXException: Code:[Framework-12], Description:[DataX插件初始化错误, 该问题通常是由于DataX安装错误引起,请联系您的运维解决 .].  - 插件加载失败,未完成指定插件加载:[elasticsearchwriter, mysqlreader]
        at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26)
        at com.alibaba.datax.core.util.ConfigParser.parsePluginConfig(ConfigParser.java:142)
        at com.alibaba.datax.core.util.ConfigParser.parse(ConfigParser.java:63)
        at com.alibaba.datax.core.Engine.entry(Engine.java:137)
        at com.alibaba.datax.core.Engine.main(Engine.java:204)

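2.2 Check whether the mysqlreader and elasticsearchwriter plugins are installed

  Since the log says the plugins failed to load, let's go and check, and let the facts speak.

  mysqlreader is already there!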


  Oops, elasticsearchwriter really is missing. Let's quietly go and install it.

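
The same check can be scripted. The sketch below assumes plugins live under plugin/reader/<name> and plugin/writer/<name> inside the DataX home directory, which matches the copy destination used in step 3.4. `missing_plugins` is a hypothetical helper, not part of DataX.

```python
from pathlib import Path


def missing_plugins(datax_home, readers=(), writers=()):
    """Return the plugin names that have no directory under <datax_home>/plugin."""
    home = Path(datax_home)
    missing = []
    for kind, names in (("reader", readers), ("writer", writers)):
        for name in names:
            if not (home / "plugin" / kind / name).is_dir():
                missing.append(name)
    return missing


if __name__ == "__main__":
    # /usr/local/datax is the install path used in this article.
    print(missing_plugins("/usr/local/datax",
                          readers=["mysqlreader"],
                          writers=["elasticsearchwriter"]))
```

An empty list means every requested plugin directory exists. It only checks that the directory is present, not that the plugin inside it is complete.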

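3. Install the elasticsearchwriter plugin (for readers who have never installed a plugin; skip this section if you have)

  3.1  Pull the DataX source code onto the server (DataX-master)

  3.2  Edit pom.xml in the root directory, keeping only the modules you need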

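// The original lists every module; usually you install only what you need

    <modules>
        <module>common</module>
        <module>core</module>
        <module>transformer</module>

        <!-- reader -->
        <module>mysqlreader</module>
        <module>drdsreader</module>
        <module>sqlserverreader</module>
        <module>postgresqlreader</module>
        <module>oraclereader</module>
        <module>odpsreader</module>
        <module>otsreader</module>
        <module>otsstreamreader</module>
        <module>txtfilereader</module>
        <module>hdfsreader</module>
        <module>streamreader</module>
        <module>ossreader</module>
        <module>ftpreader</module>
        <module>mongodbreader</module>
        <module>rdbmsreader</module>
        <module>hbase11xreader</module>
        <module>hbase094xreader</module>
        <module>tsdbreader</module>
        <module>opentsdbreader</module>
        <module>cassandrareader</module>
        <module>gdbreader</module>

        <!-- writer -->
        <module>mysqlwriter</module>
        <module>drdswriter</module>
        <module>odpswriter</module>
        <module>txtfilewriter</module>
        <module>ftpwriter</module>
        <module>hdfswriter</module>
        <module>streamwriter</module>
        <module>otswriter</module>
        <module>oraclewriter</module>
        <module>sqlserverwriter</module>
        <module>postgresqlwriter</module>
        <module>osswriter</module>
        <module>mongodbwriter</module>
        <module>adswriter</module>
        <module>ocswriter</module>
        <module>rdbmswriter</module>
        <module>hbase11xwriter</module>
        <module>hbase094xwriter</module>
        <module>hbase11xsqlwriter</module>
        <module>hbase11xsqlreader</module>
        <module>elasticsearchwriter</module>
        <module>tsdbwriter</module>
        <module>adbpgwriter</module>
        <module>gdbwriter</module>
        <module>cassandrawriter</module>
        <module>clickhousewriter</module>

        <!-- common support module -->
        <module>plugin-rdbms-util</module>
        <module>plugin-unstructured-storage-util</module>
        <module>hbase20xsqlreader</module>
        <module>hbase20xsqlwriter</module>
    </modules>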
    

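  After the change:

// The original lists every module; usually you install only what you need

    <modules>
        <module>common</module>
        <module>core</module>
        <module>transformer</module>

        <!-- reader -->
        <module>mysqlreader</module>

        <!-- writer -->
        <module>elasticsearchwriter</module>

        <!-- common support module -->
        <module>plugin-rdbms-util</module>
        <module>plugin-unstructured-storage-util</module>
        <module>hbase20xsqlreader</module>
        <module>hbase20xsqlwriter</module>
    </modules>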
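
If you would rather not edit the file by hand, the trimming can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration on a small inline sample: `keep_modules` and `SAMPLE_POM` are made up here, and the code assumes the pom declares the standard Maven POM namespace and has a <modules> section.

```python
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"

# A shortened stand-in for the real pom.xml, for illustration only.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modules>
    <module>common</module>
    <module>core</module>
    <module>mysqlreader</module>
    <module>drdsreader</module>
    <module>elasticsearchwriter</module>
    <module>tsdbwriter</module>
  </modules>
</project>"""


def keep_modules(pom_xml, keep):
    """Return the pom text with every <module> not named in `keep` removed."""
    ET.register_namespace("", POM_NS)  # avoid ns0: prefixes in the output
    root = ET.fromstring(pom_xml)
    modules = root.find(f"{{{POM_NS}}}modules")
    # Iterate over a copy because elements are removed during the loop.
    for module in list(modules):
        if (module.text or "").strip() not in keep:
            modules.remove(module)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    wanted = {"common", "core", "mysqlreader", "elasticsearchwriter"}
    print(keep_modules(SAMPLE_POM, wanted))
```

The default parser drops XML comments, so the output loses the section comments of the original file. Keep a backup of pom.xml before overwriting it.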
    

  3.3 Build the elasticsearchwriter plugin

mvn clean install -Dmaven.test.skip=true

  3.4 Copy the generated files into /datax/plugin/, taking care to put readers under reader and writers under writer

cp -r /usr/local/DataX-master/elasticsearchwriter/target/datax/plugin/writer/elasticsearchwriter /usr/local/data/datax/datax/plugin/writer
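
The copy step follows a fixed path pattern, so it can be wrapped in a small helper that refuses to mix up reader and writer. The sketch below derives both paths from the pattern visible in the cp command above. `install_plugin` is a hypothetical name, and `dirs_exist_ok` needs Python 3.8 or later.

```python
import shutil
from pathlib import Path


def install_plugin(source_root, datax_home, name, kind):
    """Copy <source_root>/<name>/target/datax/plugin/<kind>/<name> into DataX.

    `kind` must be "reader" or "writer", matching the plugin's role.
    """
    if kind not in ("reader", "writer"):
        raise ValueError("kind must be 'reader' or 'writer'")
    built = Path(source_root) / name / "target" / "datax" / "plugin" / kind / name
    if not built.is_dir():
        raise FileNotFoundError(f"build output not found: {built}")
    target = Path(datax_home) / "plugin" / kind / name
    # Creates missing parent directories and overwrites files that already exist.
    shutil.copytree(built, target, dirs_exist_ok=True)
    return target
```

With the paths from the cp command, the call would be `install_plugin("/usr/local/DataX-master", "/usr/local/data/datax/datax", "elasticsearchwriter", "writer")`.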

4. Re-run the DataX command: success!

python /usr/local/datax/bin/datax.py ./test.json -p "-Dstart_time=2020-09-02 -Dend_time=2020-09-02"


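5. The increment is defined by time

Only rows whose updated_at falls between ${start_time} 00:00:00 and ${end_time} 23:59:59 are read, as set by the reader's where clause and the -D values passed with -p.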

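
To preview the filter a given run will apply, the placeholder substitution can be imitated with Python's string.Template, which happens to use the same ${name} syntax. This is an illustration of the resulting where clause only and not how DataX performs the replacement internally. `render_where` is a made-up helper.

```python
from string import Template

# The where clause from test.json, split across two lines for readability.
WHERE = ("updated_at>='${start_time} 00:00:00' "
         "and updated_at<='${end_time} 23:59:59'")


def render_where(start_time, end_time):
    """Fill the two date placeholders the way the -D values would."""
    return Template(WHERE).substitute(start_time=start_time, end_time=end_time)


if __name__ == "__main__":
    print(render_where("2020-09-02", "2020-09-02"))
```

Since the filter relies entirely on updated_at, a row is picked up again only if that column changes whenever the row does.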

 
