Environment: CentOS 7, hadoop-3.1.0, hive-3.0.0, jdk1.8
It is assumed that hadoop-3.1.0 and hive-3.0.0 have already been deployed successfully in pseudo-distributed mode (the default setup).
Installing and preparing Kerberos
1) yum install -y krb5-server krb5-libs krb5-workstation
krb5-server: the KDC server; krb5-workstation: the client tools
2) Set the hostname and domain name
hostnamectl set-hostname adcer
vim /etc/hosts
127.0.0.1 adcer
::1 adcer
192.168.243.130 adcer
vim /etc/sysconfig/network
HOSTNAME=adcer
3) Configure /etc/krb5.conf
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
rdns = false
default_realm = HADOOP.COM
#Credential cache format; if this line is commented out, credential cache files are created under /tmp/ by default
#default_ccache_name = KEYRING:persistent:%{uid}
[realms]
#The HADOOP.COM realm; multiple realms can be defined
HADOOP.COM = {
#Location of the KDC (the key distribution center, which issues tickets) and of the admin_server
kdc = adcer
admin_server = adcer
}
[domain_realm]
.hadoop.com = HADOOP.COM
hadoop.com = HADOOP.COM
4) Configure /var/kerberos/krb5kdc/kdc.conf; if it does not exist, create it from the template file.
[kdcdefaults]
kdc_ports = 88
kdc_tcp_ports = 88
[realms]
HADOOP.COM = {
#master_key_type = aes256-cts
acl_file = /var/kerberos/krb5kdc/kadm5.acl
dict_file = /usr/share/dict/words
admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
supported_enctypes = aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
}
5) Configure /var/kerberos/krb5kdc/kadm5.acl
*/admin@HADOOP.COM *
This means any principal matching */admin@HADOOP.COM is granted the permission *, i.e. all permissions.
6) Create the Kerberos database
kdb5_util create -r HADOOP.COM -s
The -d option can be used to specify a database name. If a database with that name already exists, first delete the files related to it under /var/kerberos/krb5kdc/. The default database name is principal.
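For example, a minimal sketch for re-creating the default database (assumption: the existing database named principal can be discarded; deleting its files is destructive):
rm -f /var/kerberos/krb5kdc/principal*   #removes principal, principal.ok, principal.kadm5, ...
kdb5_util create -r HADOOP.COM -s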
7) Start Kerberos
systemctl start krb5kdc.service
systemctl start kadmin.service
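Optionally, enable both services so they come up on boot (standard systemd usage):
systemctl enable krb5kdc.service
systemctl enable kadmin.service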
8) Test Kerberos
On the KDC host you can log in directly with kadmin.local. Type ? to list the available commands.
Add a principal:
addprinc -randkey hds/adcer
Export a keytab:
xst -norandkey -k /opt/keytabs/hds.keytab hds/adcer
Verify:
kinit -kt /opt/keytabs/hds.keytab hds/adcer
View the ticket: klist
Destroy the ticket: kdestroy
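The kadmin.local commands above can also be run non-interactively with the -q option, which is handy for scripting; for example, re-using the principal and keytab path from above:
kadmin.local -q "addprinc -randkey hds/adcer"
kadmin.local -q "xst -norandkey -k /opt/keytabs/hds.keytab hds/adcer"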
Hadoop Kerberos configuration
1) Create the Kerberos principals. Both hds/_HOST and HTTP/_HOST are needed (_HOST is the machine's fully qualified domain name).
addprinc -randkey hds/adcer
addprinc -randkey HTTP/adcer
Export the keytab file; for convenience both principals are put into the same keytab.
xst -norandkey -k /opt/keytabs/hds.keytab hds/adcer HTTP/adcer
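To confirm that both principals ended up in the keytab, list its entries with klist:
klist -kt /opt/keytabs/hds.keytab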
2) core-site.xml configuration
hadoop.security.authentication:kerberos
hadoop.security.authorization:true
hadoop.rpc.protection:authentication (quality of protection for Hadoop RPC: authentication only authenticates messages; integrity adds integrity checks; privacy also encrypts the data)
hadoop.security.auth_to_local:DEFAULT
hadoop.security.auth_to_local specifies the rules used to map Kerberos principals to local operating-system user accounts. For example, the default rule maps the principal hds/adcer@HADOOP.COM to the system user hds. Custom rules can also be defined, e.g. mapping the nn and dn principals to the local user hdfs, and the nm and rm principals to the user yarn:
hadoop.security.auth_to_local
RULE:[2:$1@$0](nn/.*@.*REALM.TLD)s/.*/hdfs/
RULE:[2:$1@$0](dn/.*@.*REALM.TLD)s/.*/hdfs/
RULE:[2:$1@$0](nm/.*@.*REALM.TLD)s/.*/yarn/
RULE:[2:$1@$0](rm/.*@.*REALM.TLD)s/.*/yarn/
DEFAULT
Since the default mapping rule is used here, a local hds system account must be created.
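The local account can be created with a plain useradd hds. The properties above are listed as name:value pairs; in core-site.xml each one is written as a <property> element. A minimal sketch for the first two (the remaining entries here, and the hdfs-site.xml/yarn-site.xml entries below, follow the same pattern):
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>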
3) hdfs-site.xml configuration
dfs.block.access.token.enable:true
dfs.namenode.keytab.file:/opt/keytabs/hds.keytab
dfs.namenode.kerberos.principal:hds/adcer@HADOOP.COM
dfs.namenode.kerberos.internal.spnego.principal:HTTP/adcer@HADOOP.COM
dfs.secondary.namenode.keytab.file:/opt/keytabs/hds.keytab
dfs.secondary.namenode.kerberos.principal:hds/adcer@HADOOP.COM
dfs.secondary.namenode.kerberos.internal.spnego.principal:HTTP/adcer@HADOOP.COM
dfs.datanode.data.dir.perm:700
dfs.datanode.address:0.0.0.0:10019
dfs.datanode.http.address:0.0.0.0:10022
dfs.datanode.kerberos.principal:hds/adcer@HADOOP.COM
dfs.datanode.keytab.file:/opt/keytabs/hds.keytab
dfs.encrypt.data.transfer:false (encryption of block data transfer)
dfs.http.policy:HTTPS_ONLY
dfs.data.transfer.protection:integrity
dfs.web.authentication.kerberos.principal:hds/adcer@HADOOP.COM
dfs.web.authentication.kerberos.keytab:/opt/keytabs/hds.keytab
dfs.journalnode.kerberos.principal:hds/adcer@HADOOP.COM
dfs.journalnode.keytab.file:/opt/keytabs/hds.keytab
dfs.journalnode.kerberos.internal.spnego.principal:HTTP/adcer@HADOOP.COM
4) DataNode
Because the DataNode data transfer protocol does not use the Hadoop RPC framework, DataNodes traditionally had to authenticate themselves by binding the privileged ports specified by dfs.datanode.address and dfs.datanode.http.address. This authentication rests on the assumption that an attacker cannot obtain root privileges on the DataNode host, and it requires starting the DataNode as root via jsvc. Starting with version 2.6.0, SASL can be used to authenticate the data transfer protocol instead, so a secure cluster no longer needs to start DataNodes as root with jsvc and bind privileged ports. To enable SASL on the data transfer protocol, set dfs.data.transfer.protection in hdfs-site.xml, use a non-privileged port (greater than 1024) for dfs.datanode.address, set dfs.http.policy to HTTPS_ONLY, and make sure the HADOOP_SECURE_DN_USER environment variable is not set.
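As a quick sanity check for the SASL prerequisites described above (the path is an assumption matching the HADOOP_HOME used elsewhere in this post), make sure hadoop-env.sh does not export the secure-DataNode user variables; any uncommented line printed by this grep should be removed when using the SASL approach:
grep -E 'HADOOP_SECURE_DN_USER|HDFS_DATANODE_SECURE_USER' /opt/modules/hadoop-3.1.0/etc/hadoop/hadoop-env.sh | grep -v '^#'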
SASL authentication: generating certificates
Generate a self-signed certificate to be used as the root CA:
openssl req -new -x509 -keyout test_ca_key -out test_ca_cert -days 9999 -subj '/C=CN/ST=sichuang/L=chengdu/O=adcer/OU=rd/CN=adcer'
Create the keystore:
keytool -keystore keystore -alias localhost -validity 9999 -genkey -keyalg RSA -keysize 2048 -dname "CN=adcer, OU=rd, O=adcer, L=chengdu, ST=sichuang, C=cn"
Import the root CA certificate into truststore.jks so that it is trusted:
keytool -keystore truststore.jks -alias CARoot -import -file test_ca_cert
Generate a certificate signing request:
keytool -certreq -alias localhost -keystore keystore -file cert
Sign the certificate request with the root CA:
openssl x509 -req -CA test_ca_cert -CAkey test_ca_key -in cert -out cert_signed -days 9999 -CAcreateserial -passin pass:{passwd}
Import the certificates into the keystore:
keytool -keystore keystore -alias CARoot -import -file test_ca_cert
keytool -keystore keystore -alias localhost -import -file cert_signed
For details on certificate generation see: https://blog.csdn.net/naioonai/article/details/81045780
Configure ssl-client.xml and ssl-server.xml:
ssl.server.truststore.location:/opt/modules/hadoop-3.1.0/truststore.jks
ssl.server.truststore.password:*
ssl.server.keystore.location:/opt/modules/hadoop-3.1.0/keystore
ssl.server.keystore.password:*
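ssl-client.xml mirrors this with the client-side property names (Hadoop ships ssl-server.xml.example and ssl-client.xml.example under etc/hadoop as templates). A minimal sketch pointing at the same truststore; the password is whatever was chosen when truststore.jks was created:
ssl.client.truststore.location:/opt/modules/hadoop-3.1.0/truststore.jks
ssl.client.truststore.password:*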
At this point, start-dfs.sh should start normally, and HDFS can be accessed after Kerberos authentication.
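For example, a quick end-to-end check using the keytab generated earlier (any principal that maps to a valid HDFS user would do):
kinit -kt /opt/keytabs/hds.keytab hds/adcer
hdfs dfs -ls /
klist
Without a valid ticket (after kdestroy), the same hdfs command should fail with a GSS/Kerberos authentication error.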
5) YARN configuration
yarn-site.xml
yarn.resourcemanager.principal:hds/adcer@HADOOP.COM
yarn.resourcemanager.keytab:/opt/keytabs/hds.keytab
yarn.nodemanager.principal:hds/adcer@HADOOP.COM
yarn.nodemanager.keytab:/opt/keytabs/hds.keytab
yarn.nodemanager.container-executor.class:org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
yarn.nodemanager.linux-container-executor.group:hadoop
yarn.nodemanager.linux-container-executor.path:/opt/modules/hadoop-3.1.0/bin/container-executor
mapred-site.xml
mapreduce.application.classpath:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*
mapreduce.jobhistory.keytab:/opt/keytabs/hds.keytab
mapreduce.jobhistory.principal:hds/adcer@HADOOP.COM
container-executor.cfg and container-executor
To avoid recompiling container-executor, simply adjust the ownership and permissions of container-executor and container-executor.cfg (a sketch of the commands follows the config listing below).
The bin/container-executor binary:
Permission: 6050 (i.e. --Sr-s---),
Owner: root,
Group: the group specified by yarn.nodemanager.linux-container-executor.group
etc/hadoop/container-executor.cfg:
All parent directories of this file must be owned by root, and the file itself must be owned by root with restrictive permissions (tighter than 755; 400 is a common choice).
Contents of the file:
yarn.nodemanager.linux-container-executor.group=hadoop #configured value of yarn.nodemanager.linux-container-executor.group
banned.users= #comma separated list of users who can not run applications
min.user.id=500 #prevent other super-users; minimum allowed user id
allowed.system.users=hds
feature.tc.enabled=false
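A minimal sketch of applying the ownership and permission requirements above (assumptions: HADOOP_HOME is /opt/modules/hadoop-3.1.0 and the group is hadoop, matching yarn.nodemanager.linux-container-executor.group; the parent directories of container-executor.cfg must also be owned by root):
cd /opt/modules/hadoop-3.1.0
chown root:hadoop bin/container-executor
chmod 6050 bin/container-executor        #results in --Sr-s---
chown root:hadoop etc/hadoop/container-executor.cfg
chmod 400 etc/hadoop/container-executor.cfg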
At this point, start-yarn.sh should start normally.
Hive Kerberos configuration
1) hive-site.xml
hive.server2.authentication:KERBEROS
hive.metastore.kerberos.keytab.file:/opt/keytabs/hds.keytab
hive.metastore.kerberos.principal:hive/adcer@HADOOP.COM
hive.metastore.sasl.enabled:true
hive.server2.authentication.kerberos.keytab:/opt/keytabs/hds.keytab
hive.server2.authentication.kerberos.principal:hive/adcer@HADOOP.COM
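The configuration above references a hive/adcer principal that was not created in the earlier steps; a sketch in kadmin.local, re-using the same keytab file for convenience:
addprinc -randkey hive/adcer
xst -norandkey -k /opt/keytabs/hds.keytab hive/adcer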
2) core-site.xml proxy-user configuration
hadoop.proxyuser.hive.hosts:adcer (IP or hostname)
hadoop.proxyuser.hive.groups:hadoop (the group that the current HDFS Kerberos-authenticated user belongs to)
or
hadoop.proxyuser.hive.users:hds (the current HDFS Kerberos-authenticated user)
The wildcard * can be used to allow everything, e.g.:
hadoop.proxyuser.hive.hosts:*
hadoop.proxyuser.hive.groups:*
The hive in hadoop.proxyuser.hive refers to the Kerberos-authenticated user that Hive (HiveServer2) runs as.
Start HiveServer2 and connect with beeline:
!connect jdbc:hive2://localhost:10000/default;principal=hive/adcer@HADOOP.COM
The principal in the connection string must match hive.server2.authentication.kerberos.principal.
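A possible full connection sequence (assumption: the client first authenticates with the hive keytab generated above; any valid Kerberos ticket accepted by HiveServer2 would work):
kinit -kt /opt/keytabs/hds.keytab hive/adcer
beeline -u "jdbc:hive2://localhost:10000/default;principal=hive/adcer@HADOOP.COM"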
References:
https://blog.csdn.net/dxl342/article/details/55510659
https://www.linuxidc.com/Linux/2016-09/134948.htm
http://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-common/SecureMode.html
https://blog.csdn.net/a118170653/article/details/43448133
https://cwiki.apache.org/confluence/display/HIVE