DataX and Kerberos: accessing a Kerberos-enabled cluster

Part of the series: Big Data Security in Practice https://www.jianshu.com/p/76627fd8399c


By default, DataX talks to Hadoop as if it were an unsecured cluster. Once Kerberos is enabled on Hadoop, two changes are needed:

  • Add the Kerberos keytab
  • Fix the data transfer error

Adding the Kerberos keytab

  • Create the principal and export a keytab (a verification sketch follows this list):
kadmin.local -q "addprinc -randkey testuser"
kadmin.local -q "ktadd -k /etc/hadoop/conf/testuser.keytab testuser"
  • Add the following to the job description file:
"haveKerberos": true,
"kerberosKeytabFilePath": "/etc/hadoop/conf/testuser.keytab",
"kerberosPrincipal": "[email protected]",

Fixing the data transfer error

From the DataX installation directory, run the job:

pwd
/mnt/kbdsproject/dataxtest/datax

python bin/datax.py job/mysqlhdfsemp.json

The console then reports the following error:

org.apache.hadoop.ipc.RemoteException(java.io.IOException):
File /user/testyarn-nopwd/test__f9164cee_ff85_41e3_b22c_d4c3a38a2aee/emp__50a158cb_cea3_48b3_84c2_4eb8a879e804
could only be replicated to 0 nodes instead of minReplication (=1).
There are 3 datanode(s) running and 3 node(s) are excluded in this operation.

Yet hdfs dfs -put *** . works fine from a terminal, and port 8020 is reachable; the error appears only when using DataX.
Some posts suggest the datanodes are not running, but all datanodes in this cluster are up.
The namenode log shows nothing abnormal.
Tailing a datanode log (tail -f hadoop-hadoop-datanode-vm10-247-24-49.ksc.com.log) reveals:

2018-04-11 10:15:14,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Failed to read expected SASL data transfer protection handshake from client at /10.247.24.53:29277.
Perhaps the client is running an older version of Hadoop which does not support SASL data transfer protection

This pointed to a compatibility problem between DataX and the Hadoop version.

Starting with Hadoop 2.6.0, authentication between the HDFS client and the datanodes can be enforced via SASL.
If SASL is enabled on the datanode side, adding the following to the DataX job description file should resolve the error:
"hadoopConfig": { "dfs.data.transfer.protection": "integrity" }

Reference:
https://github.com/alibaba/DataX/issues/54

With that change, the final configuration file is:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["empno", "name", "job"],
                        "splitPk": "",
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://123.456.789.135:3306/db"],
                                "table": ["emp"]
                            }
                        ],
                        "password": "password",
                        "username": "root",
                        "where": "1=1"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": "hdfs://123.456.789.135:8020",
                        "fileType": "text",
                        "path": "/user/testuser/test",
                        "fileName": "emp",
                        "hadoopConfig": { "dfs.data.transfer.protection": "integrity" },
                        "column": [
                            {"name": "empno", "type": "string"},
                            {"name": "name", "type": "string"},
                            {"name": "job", "type": "string"}
                        ],
                        "writeMode": "append",
                        "fieldDelimiter": "",
                        "haveKerberos": true,
                        "kerberosKeytabFilePath": "/etc/hadoop/conf/testuser.keytab",
                        "kerberosPrincipal": "[email protected]",
                        "compress": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
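
Once the job completes, a quick verification, reusing the keytab and target path from the configuration above:

kinit -kt /etc/hadoop/conf/testuser.keytab testuser
hdfs dfs -ls /user/testuser/test    # the emp__<suffix> files written by DataX should be listed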
