This post documents how to configure Impala and Hive to integrate with Sentry on a CDH 5.2 Hadoop cluster, covering Sentry installation and configuration as well as testing the integration with Impala and Hive.
To manage cluster permissions with Sentry, Kerberos must first be configured on the cluster; for configuring Kerberos and LDAP on a Hadoop cluster, refer to the earlier articles on this blog.
Sentry will be installed on a three-node Hadoop cluster; each node's IP address, hostname, and deployed components are as follows:
192.168.56.121 cdh1 NameNode, Hive, ResourceManager, HBase, impala-state-store, impala-catalog, Kerberos Server, sentry-store
192.168.56.122 cdh2 DataNode, SecondaryNameNode, NodeManager, HBase, impala-server
192.168.56.123 cdh3 DataNode, HBase, NodeManager, impala-server
Sentry supports two storage backends: file-based storage (SimpleFileProviderBackend) and database-backed storage (SimpleDbProviderBackend). With file-based storage only the sentry package needs to be installed; with database-backed storage the sentry-store service is required as well.
Install the sentry-store service on node cdh1:
yum install sentry sentry-store -y
Edit the Sentry configuration file /etc/sentry/conf/sentry-store-site.xml; the configuration below is based on the examples in the Sentry source code:
<property>
  <name>sentry.service.admin.group</name>
  <value>impala,hive,hue</value>
</property>
<property>
  <name>sentry.service.allow.connect</name>
  <value>impala,hive,hue</value>
</property>
<property>
  <name>sentry.verify.schema.version</name>
  <value>true</value>
</property>
<property>
  <name>sentry.service.server.rpc-address</name>
  <value>cdh1</value>
</property>
<property>
  <name>sentry.service.server.rpc-port</name>
  <value>8038</value>
</property>
<property>
  <name>sentry.store.jdbc.url</name>
  <value>jdbc:postgresql://cdh1/sentry</value>
</property>
<property>
  <name>sentry.store.jdbc.driver</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>sentry.store.jdbc.user</name>
  <value>sentry</value>
</property>
<property>
  <name>sentry.store.jdbc.password</name>
  <value>redhat</value>
</property>
<property>
  <name>sentry.hive.server</name>
  <value>server1</value>
</property>
<property>
  <name>sentry.store.group.mapping</name>
  <value>org.apache.sentry.provider.common.HadoopGroupMappingService</value>
</property>
Create the database (see the Hadoop automated-installation shell scripts for reference):
yum install postgresql-server postgresql-jdbc -y
ln -s /usr/share/java/postgresql-jdbc.jar /usr/lib/hive/lib/postgresql-jdbc.jar
ln -s /usr/share/java/postgresql-jdbc.jar /usr/lib/sentry/lib/postgresql-jdbc.jar
su -c "cd ; /usr/bin/pg_ctl start -w -m fast -D /var/lib/pgsql/data" postgres
su -c "cd ; /usr/bin/psql --command \"create user sentry with password 'redhat'; \" " postgres
su -c "cd ; /usr/bin/psql --command \"CREATE DATABASE sentry owner=sentry;\" " postgres
su -c "cd ; /usr/bin/psql --command \"GRANT ALL privileges ON DATABASE sentry TO sentry;\" " postgres
su -c "cd ; /usr/bin/psql -U sentry -d sentry -f /usr/lib/sentry/scripts/sentrystore/upgrade/sentry-postgres-1.4.0-cdh5.sql" postgres
su -c "cd ; /usr/bin/pg_ctl restart -w -m fast -D /var/lib/pgsql/data" postgres
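The sentry.store.jdbc.url value configured above follows the pattern jdbc:postgresql://&lt;host&gt;/&lt;database&gt;. As a quick sanity check that the URL, user, and database line up with what was just created, the host and database can be split out of the URL in the shell (a minimal sketch using only the values from this article):

```shell
# Split a PostgreSQL JDBC URL into its host and database parts.
url="jdbc:postgresql://cdh1/sentry"
hostdb="${url#jdbc:postgresql://}"   # strip the scheme -> "cdh1/sentry"
host="${hostdb%%/*}"                 # everything before the first "/"
db="${hostdb#*/}"                    # everything after the first "/"
echo "host=$host db=$db"             # prints: host=cdh1 db=sentry
```

Once the service is up, connectivity can then be verified with something like psql -h "$host" -U sentry -d "$db".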
/var/lib/pgsql/data/pg_hba.conf should contain:
# TYPE DATABASE USER CIDR-ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all md5
# IPv4 local connections:
#host all all 0.0.0.0/0 trust
host all all 127.0.0.1/32 md5
# IPv6 local connections:
#host all all ::1/128 md5
If Kerberos is enabled on the cluster, generate a principal for the Sentry service on this node and export it to a keytab:
$ cd /etc/sentry/conf
kadmin.local -q "addprinc -randkey sentry/cdh1@JAVACHEN.COM"
kadmin.local -q "xst -k sentry.keytab sentry/cdh1@JAVACHEN.COM"
chown sentry:hadoop sentry.keytab ; chmod 400 *.keytab
Then add the following to /etc/sentry/conf/sentry-store-site.xml:
<property>
  <name>sentry.service.security.mode</name>
  <value>kerberos</value>
</property>
<property>
  <name>sentry.service.server.principal</name>
  <value>sentry/cdh1@JAVACHEN.COM</value>
</property>
<property>
  <name>sentry.service.server.keytab</name>
  <value>/etc/sentry/conf/sentry.keytab</value>
</property>
Prepare test data, following Securing Impala for Analysts:
$ cat /tmp/events.csv
10.1.2.3,US,android,createNote
10.200.88.99,FR,windows,updateNote
10.1.2.3,US,android,updateNote
10.200.88.77,FR,ios,createNote
10.1.4.5,US,windows,updateTag
$ hive -S
hive> create database sensitive;
hive> create table sensitive.events (
ip STRING, country STRING, client STRING, action STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> load data local inpath '/tmp/events.csv' overwrite into table sensitive.events;
hive> create database filtered;
hive> create view filtered.events as select country, client, action from sensitive.events;
hive> create view filtered.events_usonly as
select * from filtered.events where country = 'US';
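To make the intent of the two views concrete, the same projection and filtering can be reproduced locally on the raw CSV with awk (a local illustration only, using the sample data above; it is not part of the cluster setup):

```shell
# Rebuild the sample data locally.
cat > events.csv <<'EOF'
10.1.2.3,US,android,createNote
10.200.88.99,FR,windows,updateNote
10.1.2.3,US,android,updateNote
10.200.88.77,FR,ios,createNote
10.1.4.5,US,windows,updateTag
EOF

# filtered.events: project away the sensitive ip column.
awk -F',' '{ print $2 "," $3 "," $4 }' events.csv > filtered_events.csv

# filtered.events_usonly: keep only rows where country = 'US'.
awk -F',' '$1 == "US"' filtered_events.csv
```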
Using Sentry imposes the following requirements:
1. Fix the permissions on /user/hive/warehouse:
hdfs dfs -chmod -R 770 /user/hive/warehouse
hdfs dfs -chown -R hive:hive /user/hive/warehouse
2. Edit hive-site.xml to turn off HiveServer2 impersonation.
3. Make sure that min.user.id=0 in taskcontroller.cfg.
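For item 2, the relevant hive-site.xml setting is the one controlling HiveServer2 impersonation (a hedged fragment: in stock Hive the property is hive.server2.enable.doAs, while some CDH releases expose it as hive.server2.enable.impersonation):

```xml
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
```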
Add the following to hive-site.xml:
<property>
  <name>hive.security.authorization.task.factory</name>
  <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
  <name>hive.server2.session.hook</name>
  <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
  <name>hive.sentry.conf.url</name>
  <value>file:///etc/hive/conf/sentry-site.xml</value>
</property>
Create sentry-site.xml in the /etc/hive/conf/ directory:
<property>
  <name>sentry.service.client.server.rpc-port</name>
  <value>8038</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-address</name>
  <value>cdh1</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-connection-timeout</name>
  <value>200000</value>
</property>
<property>
  <name>sentry.service.security.mode</name>
  <value>kerberos</value>
</property>
<property>
  <name>sentry.service.server.principal</name>
  <value>sentry/cdh1@JAVACHEN.COM</value>
</property>
<property>
  <name>sentry.service.server.keytab</name>
  <value>/etc/sentry/conf/sentry.keytab</value>
</property>
<property>
  <name>sentry.hive.provider</name>
  <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>
<property>
  <name>sentry.hive.provider.backend</name>
  <value>org.apache.sentry.provider.db.SimpleDBProviderBackend</value>
</property>
<property>
  <name>sentry.hive.server</name>
  <value>server1</value>
</property>
<property>
  <name>sentry.metastore.service.users</name>
  <value>hive</value>
</property>
<property>
  <name>sentry.hive.testing.mode</name>
  <value>false</value>
</property>
In beeline, connect to hive-server2 using the hive user's ticket (note that hive is an administrator user in Sentry) and create the roles and groups by running the following statements:
create role admin_role;
GRANT ALL ON SERVER server1 TO ROLE admin_role;
GRANT ROLE admin_role TO GROUP admin;
GRANT ROLE admin_role TO GROUP hive;
create role test_role;
GRANT ALL ON DATABASE filtered TO ROLE test_role;
GRANT ALL ON DATABASE sensitive TO ROLE test_role;
GRANT ROLE test_role TO GROUP test;
The statements above create two roles: admin_role, which has administrator privileges and can read and write all databases, granted to the admin and hive groups (matching the OS-level groups); and test_role, which can only read and write the filtered and sensitive databases, granted to the test group.
Create the system user yy_test on the LDAP server, import it into LDAP with the migrationtools utilities, and then set the user's password in LDAP.
# Create the yy_test user
useradd yy_test
grep -E "yy_test" /etc/passwd >/opt/passwd.txt
/usr/share/migrationtools/migrate_passwd.pl /opt/passwd.txt /opt/passwd.ldif
ldapadd -x -D "uid=ldapadmin,ou=people,dc=lashou,dc=com" -w secret -f /opt/passwd.ldif
# Change the password with the command below; you will be prompted to enter the new password twice:
ldappasswd -x -D 'uid=ldapadmin,ou=people,dc=lashou,dc=com' -w secret "uid=yy_test,ou=people,dc=lashou,dc=com" -S
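What migrate_passwd.pl produces can be approximated as follows: each /etc/passwd line becomes a posixAccount LDIF entry. This is a simplified sketch only (the real tool emits more objectClasses and attributes, and the sample passwd line and uid/gid numbers here are made up); the base DN matches the ldapadd command above:

```shell
# Build a minimal LDIF entry from one /etc/passwd line.
line='yy_test:x:502:502::/home/yy_test:/bin/bash'
echo "$line" | awk -F: '{
  printf "dn: uid=%s,ou=people,dc=lashou,dc=com\n", $1
  printf "uid: %s\n", $1
  printf "cn: %s\n", $1
  printf "objectClass: account\nobjectClass: posixAccount\n"
  printf "uidNumber: %s\ngidNumber: %s\n", $3, $4
  printf "homeDirectory: %s\nloginShell: %s\n", $6, $7
}'
```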
On every DataNode, create the test group and add the yy_test user to it:
groupadd test ; useradd yy_test; usermod -G test,yy_test yy_test
Connect to hive-server2 through beeline and test:
# Switch to the test user for testing
$ su test
$ kinit -k -t test.keytab test/cdh1@JAVACHEN.COM
$ beeline -u "jdbc:hive2://cdh1:10000/default;principal=test/cdh1@JAVACHEN.COM"
Edit /etc/default/impala and add the following to the IMPALA_SERVER_ARGS parameter:
-server_name=server1
-sentry_config=/etc/impala/conf/sentry-site.xml
Add the following to IMPALA_CATALOG_ARGS:
-sentry_config=/etc/impala/conf/sentry-site.xml
Note: server1 must match the server name configured in Sentry (the sentry.hive.server value above).
The final IMPALA_SERVER_ARGS parameter looks like this:
hostname=`hostname -f |tr "[:upper:]" "[:lower:]"`
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-use_statestore \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-kerberos_reinit_interval=60 \
-principal=impala/${hostname}@JAVACHEN.COM \
-keytab_file=/etc/impala/conf/impala.keytab \
-enable_ldap_auth=true -ldap_uri=ldaps://cdh1 -ldap_baseDN=ou=people,dc=javachen,dc=com \
-server_name=server1 \
-sentry_config=/etc/impala/conf/sentry-site.xml \
-be_port=${IMPALA_BACKEND_PORT} -default_pool_max_requests=-1 -mem_limit=60%"
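Because a missing flag here only shows up later as authorization silently not being enforced, a quick check that the Sentry-related flags made it into the assembled argument string can help (a hedged sketch; the variable value is abbreviated to the relevant flags):

```shell
# Check that the Sentry-related flags are present in the argument string.
IMPALA_SERVER_ARGS="-log_dir=/var/log/impala -server_name=server1 -sentry_config=/etc/impala/conf/sentry-site.xml"
for flag in server_name sentry_config; do
  case "$IMPALA_SERVER_ARGS" in
    *"-$flag="*) echo "$flag: present" ;;
    *)           echo "$flag: MISSING" ;;
  esac
done
```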
Create /etc/impala/conf/sentry-site.xml with the following content:
<property>
  <name>sentry.service.client.server.rpc-port</name>
  <value>8038</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-address</name>
  <value>cdh1</value>
</property>
<property>
  <name>sentry.service.client.server.rpc-connection-timeout</name>
  <value>200000</value>
</property>
<property>
  <name>sentry.service.security.mode</name>
  <value>kerberos</value>
</property>
<property>
  <name>sentry.service.server.principal</name>
  <value>sentry/cdh1@JAVACHEN.COM</value>
</property>
<property>
  <name>sentry.service.server.keytab</name>
  <value>/etc/sentry/conf/sentry.keytab</value>
</property>
For testing, refer to the Impala tests in the file-based storage section below.
Create a sentry-site.xml file in Hive's /etc/hive/conf directory with the following content:
<property>
  <name>hive.sentry.server</name>
  <value>server1</value>
</property>
<property>
  <name>sentry.hive.provider.backend</name>
  <value>org.apache.sentry.provider.file.SimpleFileProviderBackend</value>
</property>
<property>
  <name>hive.sentry.provider</name>
  <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>
<property>
  <name>hive.sentry.provider.resource</name>
  <value>/user/hive/sentry/sentry-provider.ini</value>
</property>
Create the sentry-provider.ini file and upload it to the /user/hive/sentry/ directory in HDFS:
$ cat /tmp/sentry-provider.ini
[databases]
# Defines the location of the per DB policy file for the customers DB/schema
#db1 = hdfs://cdh1:8020/user/hive/sentry/db1.ini
[groups]
admin = any_operation
hive = any_operation
test = select_filtered
[roles]
any_operation = server=server1->db=*->table=*->action=*
select_filtered = server=server1->db=filtered->table=*->action=SELECT
select_us = server=server1->db=filtered->table=events_usonly->action=SELECT
[users]
test = test
hive = hive
$ hdfs dfs -rm -r /user/hive/sentry/sentry-provider.ini
$ hdfs dfs -put /tmp/sentry-provider.ini /user/hive/sentry/
$ hdfs dfs -chown hive:hive /user/hive/sentry/sentry-provider.ini
$ hdfs dfs -chmod 640 /user/hive/sentry/sentry-provider.ini
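A typo in the policy file only surfaces later as a runtime authorization failure, so before uploading it can be worth checking that every role referenced in [groups] is actually defined in [roles] (a hedged awk sketch run against the file created above):

```shell
# Verify every role referenced in [groups] is defined in [roles].
awk '
  /^\[/ { sec = $0; next }            # track the current ini section
  !/=/  { next }                      # skip comments and blank lines
  {
    key = $0; sub(/=.*/,  "", key); gsub(/[[:space:]]/, "", key)
    val = $0; sub(/[^=]*=/, "", val)
  }
  sec == "[groups]" {
    n = split(val, rs, /,/)
    for (i = 1; i <= n; i++) { gsub(/[[:space:]]/, "", rs[i]); referenced[rs[i]] = 1 }
  }
  sec == "[roles]" { defined[key] = 1 }
  END {
    bad = 0
    for (r in referenced)
      if (!(r in defined)) { print "undefined role: " r; bad = 1 }
    exit bad
  }
' /tmp/sentry-provider.ini && echo "policy file OK"
```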
For the syntax of the sentry-provider.ini file, see the official documentation. Here the hive group is given full privileges, the hive user is assigned to the hive group, and the other two groups have only partial privileges.
Then add the following to hive-site.xml:
<property>
  <name>hive.security.authorization.task.factory</name>
  <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
  <name>hive.server2.session.hook</name>
  <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
  <name>hive.sentry.conf.url</name>
  <value>file:///etc/hive/conf/sentry-site.xml</value>
</property>
Sync the configuration files to the other nodes and restart the hive-server2 service.
Because hive-server2 in my cluster has Kerberos authentication enabled, connect to it as the hive user:
$ kinit -k -t /etc/hive/conf/hive.keytab hive/cdh1@JAVACHEN.COM
$ beeline -u "jdbc:hive2://cdh1:10000/default;principal=hive/cdh1@JAVACHEN.COM"
scan complete in 10ms
Connecting to jdbc:hive2://cdh1:10000/default;principal=hive/cdh1@JAVACHEN.COM
Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.2.0 by Apache Hive
0: jdbc:hive2://cdh1:10000/default> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
| filtered |
| sensitive |
+----------------+--+
10 rows selected (0.145 seconds)
0: jdbc:hive2://cdh1:10000/default> use filtered;
No rows affected (0.132 seconds)
0: jdbc:hive2://cdh1:10000/default> show tables;
+----------------+--+
| tab_name |
+----------------+--+
| events |
| events_usonly |
+----------------+--+
2 rows selected (0.158 seconds)
0: jdbc:hive2://cdh1:10000/default> use sensitive;
No rows affected (0.115 seconds)
0: jdbc:hive2://cdh1:10000/default> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| events |
+-----------+--+
1 row selected (0.148 seconds)
Edit /etc/default/impala and add the following to the IMPALA_SERVER_ARGS parameter:
-server_name=server1
-authorization_policy_file=/user/hive/sentry/sentry-provider.ini
-authorization_policy_provider_class=org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider
Note: server1 must be consistent with the server name used in the sentry-provider.ini file.
The final IMPALA_SERVER_ARGS parameter looks like this:
hostname=`hostname -f |tr "[:upper:]" "[:lower:]"`
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-use_statestore \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT} \
-server_name=server1 \
-authorization_policy_file=/user/hive/sentry/sentry-provider.ini \
-authorization_policy_provider_class=org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider \
-enable_ldap_auth=true -ldap_uri=ldaps://cdh1 -ldap_baseDN=ou=people,dc=javachen,dc=com \
-kerberos_reinit_interval=60 \
-principal=impala/${hostname}@JAVACHEN.COM \
-keytab_file=/etc/impala/conf/impala.keytab \
"
Restart the impala-server service and then test. Because impala-server here is integrated with both Kerberos and LDAP, the tests below authenticate through LDAP.
First, test with the LDAP user test:
$ impala-shell -l -u test
Starting Impala Shell using LDAP-based authentication
LDAP password for test:
Connected to cdh1:21000
Server version: impalad version 2.0.0-cdh5 RELEASE (build ecf30af0b4d6e56ea80297df2189367ada6b7da7)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: Impala Shell v2.0.0-cdh5 (ecf30af) built on Sat Oct 11 13:56:06 PDT 2014)
[cdh1:21000] > show databases;
Query: show databases
+---------+
| name |
+---------+
| default |
+---------+
Fetched 1 row(s) in 0.11s
[cdh1:21000] > show tables;
Query: show tables
ERROR: AuthorizationException: User 'test' does not have privileges to access: default.*
[cdh1:21000] >
As shown above, the test user has no privileges to view the databases and tables, because the sentry-provider.ini file does not assign any privileges to the test user.
Next, test with the hive user. Use the commands below to create the hive user and group in LDAP and set the hive user's password.
$ grep hive /etc/passwd >/opt/passwd.txt
$ /usr/share/migrationtools/migrate_passwd.pl /opt/passwd.txt /opt/passwd.ldif
$ ldapadd -x -D "uid=ldapadmin,ou=people,dc=javachen,dc=com" -w secret -f /opt/passwd.ldif
$ grep hive /etc/group >/opt/group.txt
$ /usr/share/migrationtools/migrate_group.pl /opt/group.txt /opt/group.ldif
$ ldapadd -x -D "uid=ldapadmin,ou=people,dc=javachen,dc=com" -w secret -f /opt/group.ldif
# Change the hive user's password in LDAP
$ ldappasswd -x -D 'uid=ldapadmin,ou=people,dc=javachen,dc=com' -w secret "uid=hive,ou=people,dc=javachen,dc=com" -S
Then test as the hive user:
$ impala-shell -l -u hive
Starting Impala Shell using LDAP-based authentication
LDAP password for hive:
Connected to cdh1:21000
Server version: impalad version 2.0.0-cdh5 RELEASE (build ecf30af0b4d6e56ea80297df2189367ada6b7da7)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: Impala Shell v2.0.0-cdh5 (ecf30af) built on Sat Oct 11 13:56:06 PDT 2014)
[cdh1:21000] > show databases;
Query: show databases
+------------------+
| name |
+------------------+
| _impala_builtins |
| default |
| filtered |
| sensitive |
+------------------+
Fetched 11 row(s) in 0.11s
[cdh1:21000] > use sensitive;
Query: use sensitive
[cdh1:21000] > show tables;
Query: show tables
+--------+
| name |
+--------+
| events |
+--------+
Fetched 1 row(s) in 0.11s
[cdh1:21000] > select * from events;
Query: select * from events
+--------------+---------+---------+------------+
| ip | country | client | action |
+--------------+---------+---------+------------+
| 10.1.2.3 | US | android | createNote |
| 10.200.88.99 | FR | windows | updateNote |
| 10.1.2.3 | US | android | updateNote |
| 10.200.88.77 | FR | ios | createNote |
| 10.1.4.5 | US | windows | updateTag |
+--------------+---------+---------+------------+
Fetched 5 row(s) in 0.76s
You can test with other users in the same way.
You can also connect to impala-server with beeline for testing:
$ beeline -u "jdbc:hive2://cdh1:21050/default;" -n test -p test
scan complete in 2ms
Connecting to jdbc:hive2://cdh1:21050/default;
Connected to: Impala (version 2.0.0-cdh5)
Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.2.0 by Apache Hive
0: jdbc:hive2://cdh1:21050/default>