For installing Kerberos and configuring Kerberos authentication for HDFS, see HDFS配置kerberos认证.
For installing Kerberos and configuring Kerberos authentication for YARN, see YARN配置kerberos认证.
For installing Kerberos and configuring Kerberos authentication for Hive, see Hive配置kerberos认证.
Please finish configuring Kerberos authentication for HDFS, YARN, and Hive first, and only then configure Impala to integrate with Kerberos!
The Hadoop cluster was installed following 使用yum安装CDH Hadoop集群. It has three nodes; the IP address, hostname, and components deployed on each node are as follows:
192.168.56.121 cdh1 NameNode、Hive、ResourceManager、HBase、impala-state-store、impala-catalog、Kerberos Server
192.168.56.122 cdh2 DataNode、SecondaryNameNode、NodeManager、HBase、impala-server
192.168.56.123 cdh3 DataNode、HBase、NodeManager、impala-server
Note: use lowercase hostnames; otherwise you will run into errors when integrating Kerberos.
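A quick way to check is to print the FQDN on each node; if the output contains any uppercase letters, fix the hostname before proceeding:
$ hostname -f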
Run the following commands on every node:
$ yum install python-devel openssl-devel python-pip cyrus-sasl cyrus-sasl-gssapi cyrus-sasl-devel -y
$ pip-python install ssl
Run the following commands on cdh1, the KDC server node:
$ cd /var/kerberos/krb5kdc/
kadmin.local -q "addprinc -randkey impala/[email protected] "
kadmin.local -q "addprinc -randkey impala/[email protected] "
kadmin.local -q "addprinc -randkey impala/[email protected] "
kadmin.local -q "xst -k impala-unmerge.keytab impala/[email protected] "
kadmin.local -q "xst -k impala-unmerge.keytab impala/[email protected] "
kadmin.local -q "xst -k impala-unmerge.keytab impala/[email protected] "
In addition, if you use HAProxy for load balancing (see the official documentation, Using Impala through a Proxy for High Availability), you also need to generate proxy.keytab:
$ cd /var/kerberos/krb5kdc/
# proxy is the machine where HAProxy is installed
kadmin.local -q "addprinc -randkey impala/[email protected] "
kadmin.local -q "xst -k proxy.keytab impala/[email protected] "
Merge proxy.keytab and impala-unmerge.keytab into impala.keytab:
$ ktutil
ktutil: rkt proxy.keytab
ktutil: rkt impala-unmerge.keytab
ktutil: wkt impala.keytab
ktutil: quit
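Before distributing the keytab, it is worth confirming that the merge picked up all of the principals; klist lists the entries, and a test kinit against the merged file verifies that a key actually works (the principal below is one of those created above):
$ klist -kt impala.keytab
$ kinit -k -t impala.keytab impala/[email protected] && echo "keytab OK"
$ kdestroy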
Copy the merged impala.keytab to the /etc/impala/conf directory on each node:
$ scp impala.keytab cdh1:/etc/impala/conf
$ scp impala.keytab cdh2:/etc/impala/conf
$ scp impala.keytab cdh3:/etc/impala/conf
Then set the ownership and permissions on cdh1, cdh2, and cdh3:
$ ssh cdh1 "cd /etc/impala/conf/;chown impala:hadoop *.keytab ;chmod 400 *.keytab"
$ ssh cdh2 "cd /etc/impala/conf/;chown impala:hadoop *.keytab ;chmod 400 *.keytab"
$ ssh cdh3 "cd /etc/impala/conf/;chown impala:hadoop *.keytab ;chmod 400 *.keytab"
Because a keytab is effectively a permanent credential that requires no password (it becomes invalid if the principal's password is changed in the KDC), any user with read access to the file could impersonate the principals it contains when accessing Hadoop. The keytab file must therefore be readable only by its owner (mode 0400).
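A quick way to verify the ownership and mode on every node (the expected output shows impala:hadoop and -r--------):
$ for h in cdh1 cdh2 cdh3; do ssh $h "ls -l /etc/impala/conf/*.keytab"; done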
Edit /etc/default/impala on cdh1 and add the following parameters to IMPALA_CATALOG_ARGS, IMPALA_SERVER_ARGS, and IMPALA_STATE_STORE_ARGS:
-kerberos_reinit_interval=60
-principal=impala/[email protected]
-keytab_file=/etc/impala/conf/impala.keytab
If HAProxy is used (see Hive使用HAProxy配置HA for the HAProxy setup), the IMPALA_SERVER_ARGS parameters need to be changed as follows (proxy is the hostname of the HAProxy machine; here HAProxy is installed on the cdh1 node):
-kerberos_reinit_interval=60
-be_principal=impala/[email protected]
-principal=impala/[email protected]
-keytab_file=/etc/impala/conf/impala.keytab
And add the following to IMPALA_CATALOG_ARGS:
-state_store_host=${IMPALA_STATE_STORE_HOST} \
Sync the modified file to the other nodes. The final /etc/default/impala is shown below; to avoid problems with uppercase hostnames, a hostname variable is used in place of _HOST (a quick check of how it expands follows after the file):
IMPALA_CATALOG_SERVICE_HOST=cdh1
IMPALA_STATE_STORE_HOST=cdh1
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala
IMPALA_MEM_DEF=$(free -m |awk 'NR==2{print $2-5120}')
hostname=`hostname -f |tr "[:upper:]" "[:lower:]"`
IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_host=${IMPALA_STATE_STORE_HOST} \
-kerberos_reinit_interval=60 \
-principal=impala/${hostname}@JAVACHEN.COM \
-keytab_file=/etc/impala/conf/impala.keytab
"
IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \
-statestore_subscriber_timeout_seconds=15 \
-kerberos_reinit_interval=60 \
-principal=impala/${hostname}@JAVACHEN.COM \
-keytab_file=/etc/impala/conf/impala.keytab
"
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-use_statestore \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT} \
-kerberos_reinit_interval=60 \
-be_principal=impala/${hostname}@JAVACHEN.COM \
-principal=impala/[email protected] \
-keytab_file=/etc/impala/conf/impala.keytab \
-mem_limit=${IMPALA_MEM_DEF}m
"
ENABLE_CORE_DUMPS=false
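To double-check that the hostname variable really expands to the lowercase FQDN, and therefore that the principals in the *_ARGS variables match entries in impala.keytab, a quick sanity check like the following can be run on each node:
$ source /etc/default/impala
$ echo "impala/${hostname}@JAVACHEN.COM"
$ klist -kt /etc/impala/conf/impala.keytab | grep ${hostname}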
Sync the modified file to the other nodes, cdh2 and cdh3:
$ scp /etc/default/impala cdh2:/etc/default/impala
$ scp /etc/default/impala cdh3:/etc/default/impala
Update the files in the Impala configuration directory and sync them to the other nodes:
cp /etc/hadoop/conf/core-site.xml /etc/impala/conf/
cp /etc/hadoop/conf/hdfs-site.xml /etc/impala/conf/
cp /etc/hive/conf/hive-site.xml /etc/impala/conf/
scp -r /etc/impala/conf cdh2:/etc/impala
scp -r /etc/impala/conf cdh3:/etc/impala
impala-state-store is started as the impala user, so on cdh1 obtain a ticket for the impala user first and then start the service:
$ kinit -k -t /etc/impala/conf/impala.keytab impala/[email protected]
$ service impala-state-store start
Then check the log to confirm that the service started successfully:
$ tailf /var/log/impala/statestored.INFO
impala-catalog is started as the impala user, so on cdh1 obtain a ticket for the impala user first and then start the service:
$ kinit -k -t /etc/impala/conf/impala.keytab impala/[email protected]
$ service impala-catalog start
Then check the log to confirm that the service started successfully:
$ tailf /var/log/impala/catalogd.INFO
impala-server is also started as the impala user. In this layout it runs on cdh2 and cdh3, so on each of those nodes obtain a ticket with that node's own principal first and then start the service (shown for cdh2; on cdh3 use impala/[email protected]):
$ kinit -k -t /etc/impala/conf/impala.keytab impala/[email protected]
$ service impala-server start
Then check the log to confirm that the service started successfully:
$ tailf /var/log/impala/impalad.INFO
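Beyond tailing the logs, a coarse way to spot Kerberos problems is to grep the daemon logs; the exact wording of the messages varies between Impala versions, so treat these patterns only as a rough filter:
$ grep -iE "kerberos|gssapi" /var/log/impala/impalad.INFO | head
$ grep -i error /var/log/impala/impalad.ERROR 2>/dev/null | head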
After Kerberos is enabled, impala-shell needs to be run with the -k option:
$ impala-shell -k
Starting Impala Shell using Kerberos authentication
Using service name 'impala'
Connected to cdh1:21000
Server version: impalad version 1.3.1-cdh4 RELEASE (build 907481bf45b248a7bb3bb077d54831a71f484e5f)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: Impala Shell v1.3.1-cdh4 (907481b) built on Wed Apr 30 14:23:48 PDT 2014)
[cdh1:21000] >
[cdh1:21000] > show tables;
Query: show tables
+------+
| name |
+------+
| a |
| b |
| c |
| d |
+------+
Returned 4 row(s) in 0.08s
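impala-shell also accepts -i to choose which impalad (or the HAProxy front end) to connect to and -q to run a single query non-interactively; the proxy port below is only an assumption and should match whatever HAProxy is configured to listen on:
# connect to a specific impalad (cdh2 runs impala-server in this layout)
$ impala-shell -k -i cdh2:21000 -q "show tables"
# through HAProxy, assuming the proxy listens on port 21000
$ impala-shell -k -i proxy:21000 -q "show tables"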
If the following exception appears:
[cdh1:21000] > select * from test limit 10;
Query: select * from test limit 10
ERROR: AnalysisException: Failed to load metadata for table: default.test
CAUSED BY: TableLoadingException: Failed to load metadata for table: test
CAUSED BY: TTransportException: java.net.SocketTimeoutException: Read timed out
CAUSED BY: SocketTimeoutException: Read timed out
then add the following property to hive-site.xml:
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>3600</value>
</property>
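If the change is made to the copy of hive-site.xml under /etc/impala/conf (the one Impala reads), it also needs to be synced to the other nodes and the Impala daemons restarted for the new timeout to take effect; a sketch using the same hosts and service names as above:
$ scp /etc/impala/conf/hive-site.xml cdh2:/etc/impala/conf/
$ scp /etc/impala/conf/hive-site.xml cdh3:/etc/impala/conf/
$ service impala-catalog restart   # on cdh1
$ ssh cdh2 "service impala-server restart"
$ ssh cdh3 "service impala-server restart"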