DataX KafkaWriter Plugin Documentation

While learning the DataX tool I found that the official Alibaba distribution does not ship a kafkawriter plugin, so I wrote one myself.
The plugin is largely based on the DataX plugin development guide, and was adapted from there.

Source code: https://gitee.com/mjlfto/dataX/tree/master/kafkawriter

1 Quick Introduction

KafkaWriter writes data to a specified Kafka topic.

2 Features and Limitations

KafkaWriter currently supports writing text-format or JSON-format data to a single topic.
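The difference between the two write formats can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the method names and the hard-coded STRING type are assumptions for the example, and the real plugin builds these strings from DataX Record columns.

```java
import java.util.Arrays;
import java.util.List;

public class FormatSketch {
    // text mode: all column values of one record joined with fieldDelimiter
    static String asText(List<String> columns, String delimiter) {
        return String.join(delimiter, columns);
    }

    // json mode: each column wrapped in an object carrying its raw value and
    // type (illustrated here with STRING columns only)
    static String asJson(List<String> columns) {
        StringBuilder sb = new StringBuilder("{\"data\":[");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"rawData\":\"").append(columns.get(i))
              .append("\",\"type\":\"STRING\"}");
        }
        return sb.append("],\"size\":").append(columns.size()).append('}').toString();
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("a", "b", "c");
        System.out.println(asText(row, "\t"));
        System.out.println(asJson(row));
    }
}
```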

3 Function Description

3.1 Sample Configuration

{  
   "job":{  
      "setting":{  
         "speed":{  
            "channel":1
         }
      },
      "content":[  
         {  
            "reader":{  
               "name":"oraclereader",
               "parameter":{  
                  "username":"zkcj",
                  "password":"zkcj2018",
                  "connection":[  
                     {  
                        "jdbcUrl":[  
                           "jdbc:oracle:thin:@10.1.20.169:1521:GYJG"
                        ],
                        "querySql":[  
                           "select * from VM_DRV_PREASIGN_A"
                        ]
                     }
                  ]
               }
            },
            "writer":{  
               "name":"kafkawriter",
               "parameter":{  
                  "topic":"test-topic",
                  "bootstrapServers":"10.1.20.150:9092",
                  "fieldDelimiter":"\t",
                  "batchSize":10,
                  "writeType":"json",
                  "noTopicCreate":true,
                  "topicNumPartition":1,
                  "topicReplicationFactor":1
               }
            }
         }
      ]
   }
}

3.2 Parameter Description

  • bootstrapServers

    • Description: Kafka broker addresses, in the form host1:port,host2:port, e.g. 10.1.20.111:9092,10.1.20.121:9092

    • Required: yes

    • Default: none

  • topic

    • Description: Kafka topic name; currently only a single topic per job is supported

    • Required: yes

    • Default: none

  • ack

    • Description: acknowledgement level for produced messages; the default is 0

      acks=0: the producer does not wait for any acknowledgement from Kafka.
      acks=1: the leader writes the message to its local log but does not wait for acknowledgement from the other replicas.
      acks=all: the leader waits for the full set of in-sync replicas to acknowledge. This guarantees the message is not lost as long as at least one replica remains alive; it is the strongest durability guarantee available.

    • Required: no

    • Default: 0

  • batchSize

    • Description: when multiple records are sent to the same partition, the producer batches them into fewer network requests, improving efficiency on both the client and the broker

    • Required: no

    • Default: 16384

  • retries

    • Description: when set to a value greater than 0, the client resends any message whose send failed

    • Required: no

    • Default: 0

  • fieldDelimiter

    • Description: field delimiter used when writeType is text

    • Required: no

    • Default: , (comma)

  • keySerializer

    • Description: key serializer; defaults to org.apache.kafka.common.serialization.StringSerializer

    • Required: no

    • Default: org.apache.kafka.common.serialization.StringSerializer

  • valueSerializer

    • Description: value serializer; defaults to org.apache.kafka.common.serialization.StringSerializer

    • Required: no

    • Default: org.apache.kafka.common.serialization.StringSerializer

  • noTopicCreate

    • Description: whether to create the topic if it does not exist; defaults to false

    • Required: no

    • Default: false

  • topicNumPartition

    • Description: number of partitions for the topic

    • Required: no

    • Default: 1

  • topicReplicationFactor

    • Description: replication factor for the topic

    • Required: no

    • Default: 1

  • writeType

    • Description: format of the data written to Kafka; one of text, json

      text: all field values are joined with fieldDelimiter; the result is written to Kafka as both the message key and the message value
      json: the key is built the same way as in text mode, while the value uses the DataX internal column format, as shown below.
      rawData holds the column value; if rawData is absent from an object, that value is null

        {  
           "data":[  
              {  
                 "byteSize":13,
                 "rawData":"xxxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"1",
                 "type":"STRING"
              },
              {  
                 "byteSize":12,
                 "rawData":"xxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"A",
                 "type":"STRING"
              },
              {  
                 "byteSize":18,
                 "rawData":"xxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":3,
                 "rawData":"xxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"A",
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"0",
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":8,
                 "rawData":1426740491000,
                 "subType":"DATETIME",
                 "type":"DATE"
              },
              {  
                 "byteSize":8,
                 "rawData":1426780800000,
                 "subType":"DATETIME",
                 "type":"DATE"
              },
              {  
                 "byteSize":1,
                 "rawData":"E",
                 "type":"STRING"
              },
              {  
                 "byteSize":7,
                 "rawData":"5201009",
                 "type":"STRING"
              },
              {  
                 "byteSize":6,
                 "rawData":"520101",
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":3,
                 "rawData":"xxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":12,
                 "rawData":"520181000400",
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"0",
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":0,
                 "subType":"DATETIME",
                 "type":"DATE"
              },
              {  
                 "byteSize":1,
                 "rawData":"0",
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":78,
                 "rawData":"xxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"0",
                 "type":"STRING"
              },
              {  
                 "byteSize":8,
                 "rawData":1426694400000,
                 "subType":"DATETIME",
                 "type":"DATE"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "subType":"DATETIME",
                 "type":"DATE"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":1,
                 "rawData":"0",
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":12,
                 "rawData":"520181000400",
                 "type":"STRING"
              },
              {  
                 "byteSize":1,
                 "rawData":"1",
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":1,
                 "rawData":"0",
                 "type":"DOUBLE"
              },
              {  
                 "byteSize":8,
                 "rawData":1426740491000,
                 "subType":"DATETIME",
                 "type":"DATE"
              },
              {  
                 "byteSize":2,
                 "rawData":"xxx",
                 "type":"STRING"
              },
              {  
                 "byteSize":0,
                 "type":"STRING"
              },
              {  
                 "byteSize":28,
                 "rawData":"YxIC7zeM6xG+eBdzxV4oRDxHses=",
                 "type":"STRING"
              }
           ],
           "size":40
        }
      
    • Required: no

    • Default: text
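The parameters above map onto standard Kafka producer configuration keys. The sketch below shows that mapping; it is an illustration of how such a writer could build its producer Properties, not the plugin's actual code, and the method name fromJobConfig is assumed for the example.

```java
import java.util.Properties;

public class ProducerPropsSketch {
    // Hypothetical helper: translate the job parameters documented above
    // into the configuration keys a Kafka producer expects.
    static Properties fromJobConfig(String bootstrapServers, String acks,
                                    int batchSize, int retries) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);     // bootstrapServers parameter
        props.put("acks", acks);                              // ack parameter, default "0"
        props.put("batch.size", Integer.toString(batchSize)); // batchSize, default 16384
        props.put("retries", Integer.toString(retries));      // retries, default 0
        // keySerializer / valueSerializer parameters, using the defaults above
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties p = fromJobConfig("10.1.20.150:9092", "0", 16384, 0);
        System.out.println(p.getProperty("bootstrap.servers"));
    }
}
```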

3.3 Type Conversion

The `type` values in the JSON format above are DataX internal types; please check that your source column types are covered. For reference, the mapping between DataX internal types and Hive data types (as used by DataX writers such as HdfsWriter) is:

DataX internal type   Hive data type
Long                  TINYINT, SMALLINT, INT, BIGINT
Double                FLOAT, DOUBLE
String                STRING, VARCHAR, CHAR
Boolean               BOOLEAN
Date                  DATE, TIMESTAMP
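The table above can be expressed as a simple lookup. This is a sketch for reference only, assuming the mapping in the table is complete; the method name is hypothetical.

```java
public class TypeMappingSketch {
    // Return the DataX internal type name for a Hive column type,
    // following the conversion table above.
    static String dataxTypeFor(String hiveType) {
        switch (hiveType.toUpperCase()) {
            case "TINYINT": case "SMALLINT": case "INT": case "BIGINT":
                return "Long";
            case "FLOAT": case "DOUBLE":
                return "Double";
            case "STRING": case "VARCHAR": case "CHAR":
                return "String";
            case "BOOLEAN":
                return "Boolean";
            case "DATE": case "TIMESTAMP":
                return "Date";
            default:
                throw new IllegalArgumentException("Unsupported Hive type: " + hiveType);
        }
    }

    public static void main(String[] args) {
        System.out.println(dataxTypeFor("varchar")); // String
    }
}
```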

4 Configuration Steps

5 Constraints and Limitations

6 FAQ
