Part of the article series: Big Data Security in Practice (大数据安全实战) https://www.jianshu.com/p/76627fd8399c
By default, DataX accesses Hadoop as if the cluster had no security configured. When Hadoop has Kerberos enabled, two changes are needed on the DataX side:
- add the Kerberos keytab
- fix the data transfer problem
Adding the Kerberos keytab
- Create the principal and export its keytab:
kadmin.local -q "addprinc -randkey testuser"
kadmin.local -q "ktadd -k /etc/hadoop/conf/testuser.keytab testuser"
- Add the following to the hdfswriter parameters in the job description file (a keytab sanity check follows this list):
"haveKerberos": true,
"kerberosKeytabFilePath": "/etc/hadoop/conf/testuser.keytab",
"kerberosPrincipal": "[email protected]",
Fixing the data transfer problem
pwd
/mnt/kbdsproject/dataxtest/datax
python bin/datax.py job/mysqlhdfsemp.json
Executing the job, the console reports the following error:
org.apache.hadoop.ipc.RemoteException(java.io.IOException):
File /user/testyarn-nopwd/test__f9164cee_ff85_41e3_b22c_d4c3a38a2aee/emp__50a158cb_cea3_48b3_84c2_4eb8a879e804
could only be replicated to 0 nodes instead of minReplication (=1).
There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
However, running hdfs dfs -put *** . directly in a terminal works fine, and port 8020 is reachable; the error appears only when using DataX.
Some posts suggest the datanodes are not running, but all datanodes in the cluster are healthy.
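For the record, datanode liveness can be confirmed with a report from the namenode (on a Kerberized cluster this requires a ticket for the HDFS superuser); the grep here is just to trim the output:

```bash
# "Live datanodes (3)" confirms all three datanodes are registered and healthy
hdfs dfsadmin -report | grep -i "live datanodes"
```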
Checking the logs: the namenode log shows nothing abnormal. Tailing a datanode log, however, turns up the clue:
tail -f hadoop-hadoop-datanode-vm10-247-24-49.ksc.com.log
2018-04-11 10:15:14,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Failed to read expected SASL data transfer protection handshake from client at /10.247.24.53:29277.
Perhaps the client is running an older version of Hadoop which does not support SASL data transfer protection
This points to an incompatibility between the DataX client's transfer settings and the Hadoop cluster: the SASL handshake fails against every datanode, so the client excludes all three of them, which is exactly why the block "could only be replicated to 0 nodes".
Since Hadoop 2.6.0, communication between HDFS clients and datanodes can be authenticated by enabling SASL (data transfer protection).
If SASL is enabled on the datanode side, adding the following entry to the DataX job description file should be enough:
"hadoopConfig": { "dfs.data.transfer.protection": "integrity" }
Reference:
https://github.com/alibaba/DataX/issues/54
The job configuration file that finally worked:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["empno", "name", "job"],
                        "splitPk": "",
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://123.456.789.135:3306/db"],
                                "table": ["emp"]
                            }
                        ],
                        "password": "password",
                        "username": "root",
                        "where": "1=1"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "defaultFS": "hdfs://123.456.789.135:8020",
                        "fileType": "text",
                        "path": "/user/testuser/test",
                        "fileName": "emp",
                        "hadoopConfig": { "dfs.data.transfer.protection": "integrity" },
                        "column": [
                            {"name": "empno", "type": "string"},
                            {"name": "name", "type": "string"},
                            {"name": "job", "type": "string"}
                        ],
                        "writeMode": "append",
                        "fieldDelimiter": "",
                        "haveKerberos": true,
                        "kerberosKeytabFilePath": "/etc/hadoop/conf/testuser.keytab",
                        "kerberosPrincipal": "[email protected]",
                        "compress": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
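With both the Kerberos settings and dfs.data.transfer.protection in place, the job can be rerun and the output checked as the same principal; testuser@YOUR.REALM is again a placeholder for the redacted realm:

```bash
python bin/datax.py job/mysqlhdfsemp.json
# verify the files DataX wrote under the target path
kinit -kt /etc/hadoop/conf/testuser.keytab testuser@YOUR.REALM
hdfs dfs -ls /user/testuser/test
```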