Impala和Hive集成Sentry

本文主要记录 CDH 5.2 Hadoop 集群中配置 Impala 和 Hive 集成 Sentry 的过程,包括 Sentry 的安装、配置以及和 Impala、Hive 集成后的测试。

使用 Sentry 来管理集群的权限,需要先在集群上配置好 Kerberos。

关于 Hadoop 集群上配置 kerberos 以及 ldap 的过程请参考本博客以下文章:

  • HDFS配置Kerberos认证
  • YARN配置Kerberos认证
  • Hive配置Kerberos认证
  • Impala配置Kerberos认证
  • Hadoop配置LDAP集成Kerberos

Sentry 会安装在三个节点的 hadoop 集群上,每个节点的ip、主机名和部署的组件分配如下:

192.168.56.121        cdh1     NameNode、Hive、ResourceManager、HBase、impala-state-store、impala-catalog、Kerberos Server、sentry-store
192.168.56.122        cdh2     DataNode、SSNameNode、NodeManager、HBase、impala-server
192.168.56.123        cdh3     DataNode、HBase、NodeManager、impala-server

Sentry 的使用有两种方式,一是基于文件的存储方式(SimpleFileProviderBackend),一是基于数据库的存储方式(SimpleDbProviderBackend),如果使用基于文件的存储则只需要安装sentry,否则还需要安装 sentry-store

1. 基于数据库的存储方式

1.1 安装服务

在 cdh1 节点上安装 sentry-store 服务:

yum install sentry sentry-store -y

修改 Sentry 的配置文件 /etc/sentry/conf/sentry-store-site.xml,下面的配置参考了 Sentry源码中的配置例子:



    
        sentry.service.admin.group
        impala,hive,hue
    
    
        sentry.service.allow.connect
        impala,hive,hue
    
    
        sentry.verify.schema.version
        true
    
    
    sentry.service.server.rpc-address
    cdh1
    
    
    sentry.service.server.rpc-port
    8038
    
    
        sentry.store.jdbc.url
        jdbc:postgresql://cdh1/sentry
    
    
        sentry.store.jdbc.driver
        org.postgresql.Driver
    
    
        sentry.store.jdbc.user
        sentry
    
    
        sentry.store.jdbc.password
        redhat
    
    
        sentry.hive.server
        server1
    
    
        sentry.store.group.mapping
        org.apache.sentry.provider.common.HadoopGroupMappingService
    

创建数据库,请参考 Hadoop自动化安装shell脚本:

yum install postgresql-server postgresql-jdbc -y

ln -s /usr/share/java/postgresql-jdbc.jar /usr/lib/hive/lib/postgresql-jdbc.jar
ln -s /usr/share/java/postgresql-jdbc.jar /usr/lib/sentry/lib/postgresql-jdbc.jar

su -c "cd ; /usr/bin/pg_ctl start -w -m fast -D /var/lib/pgsql/data" postgres
su -c "cd ; /usr/bin/psql --command \"create user sentry with password 'redhat'; \" " postgres
su -c "cd ; /usr/bin/psql --command \"CREATE DATABASE sentry owner=sentry;\" " postgres
su -c "cd ; /usr/bin/psql --command \"GRANT ALL privileges ON DATABASE sentry TO sentry;\" " postgres
su -c "cd ; /usr/bin/psql -U sentry -d sentry -f /usr/lib/sentry/scripts/sentrystore/upgrade/sentry-postgres-1.4.0-cdh5.sql" postgres
su -c "cd ; /usr/bin/pg_ctl restart -w -m fast -D /var/lib/pgsql/data" postgres

/var/lib/pgsql/data/pg_hba.conf 内容如下:

# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD

# "local" is for Unix domain socket connections only
local   all         all                               md5
# IPv4 local connections:
#host    all         all         0.0.0.0/0             trust
host    all         all         127.0.0.1/32          md5

# IPv6 local connections:
#host    all         all         ::1/128               nd5

如果集群开启了 Kerberos 验证,则需要在该节点上生成 Sentry 服务的 principal 并导出为 ticket:

$ cd /etc/sentry/conf

kadmin.local -q "addprinc -randkey sentry/[email protected] "
kadmin.local -q "xst -k sentry.keytab sentry/[email protected] "

chown sentry:hadoop sentry.keytab ; chmod 400 *.keytab

然后,在/etc/sentry/conf/sentry-store-site.xml 中添加如下内容:


    sentry.service.security.mode
    kerberos


   sentry.service.server.principal
    sentry/[email protected]


    sentry.service.server.keytab
    /etc/sentry/conf/sentry.keytab

1.2. 准备测试数据

参考 Securing Impala for analysts,准备测试数据:

$ cat /tmp/events.csv
10.1.2.3,US,android,createNote
10.200.88.99,FR,windows,updateNote
10.1.2.3,US,android,updateNote
10.200.88.77,FR,ios,createNote
10.1.4.5,US,windows,updateTag

$ hive -S
hive> create database sensitive;
hive> create table sensitive.events (
    ip STRING, country STRING, client STRING, action STRING
  ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

hive> load data local inpath '/tmp/events.csv' overwrite into table sensitive.events;
hive> create database filtered;
hive> create view filtered.events as select country, client, action from sensitive.events;
hive> create view filtered.events_usonly as
  select * from filtered.events where country = 'US';

1.3 Hive-server2 集成 sentry

要求

在使用 Sentry 时,有如下要求:

1、需要修改 /user/hive/warehouse 权限:

hdfs dfs -chmod -R 770 /user/hive/warehouse
hdfs dfs -chown -R hive:hive /user/hive/warehouse

2、修改 hive-site.xml 文件,关掉 HiveServer2 impersonation

3、taskcontroller.cfg 文件中确保 min.user.id=0

修改配置文件

修改 hive-site.xml,添加如下:


    hive.security.authorization.task.factory
        org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl


    hive.server2.session.hook
    org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook


    hive.sentry.conf.url
    file:///etc/hive/conf/sentry-site.xml

在 /etc/hive/conf/ 目录创建 sentry-site.xml:




   sentry.service.client.server.rpc-port
   8038


   sentry.service.client.server.rpc-address
   cdh1


   sentry.service.client.server.rpc-connection-timeout
   200000


    sentry.service.security.mode
    kerberos


   sentry.service.server.principal
    sentry/[email protected]


    sentry.service.server.keytab
    /etc/sentry/conf/sentry.keytab


    sentry.hive.provider
    org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider
  

    sentry.hive.provider.backend
    org.apache.sentry.provider.db.SimpleDBProviderBackend
  

    sentry.hive.server
    server1
  
  
    sentry.metastore.service.users
    hive
  

    sentry.hive.testing.mode
    false
  

sentry-store 中创建角色和组

在 beeline 中通过 hive(注意,在 sentry 中 hive 为管理员用户)的 ticket 连接 hive-server2,创建 role、group 等等,执行下面语句:

create role admin_role;
GRANT ALL ON SERVER server1 TO ROLE admin_role;
GRANT ROLE admin_role TO GROUP admin;
GRANT ROLE admin_role TO GROUP hive;

create role test_role;
GRANT ALL ON DATABASE filtered TO ROLE test_role;
GRANT ALL ON DATABASE sensitive TO ROLE test_role;
GRANT ROLE test_role TO GROUP test;

上面创建了两个角色,一个是 admin_role,具有管理员权限,可以读写所有数据库,并授权给 admin 和 hive 组(对应操作系统上的组);一个是 test_role,只能读写 filtered 和 sensitive 数据库,并授权给 test 组

在 ldap 创建测试用户

在 ldap 服务器上创建系统用户 yy_test,并使用 migrationtools 工具将该用户导入 ldap,最后设置 ldap 中该用户密码。

# 创建 yy_test用户
useradd yy_test

grep -E "yy_test" /etc/passwd  >/opt/passwd.txt
/usr/share/migrationtools/migrate_passwd.pl /opt/passwd.txt /opt/passwd.ldif
ldapadd -x -D "uid=ldapadmin,ou=people,dc=lashou,dc=com" -w secret -f /opt/passwd.ldif

#使用下面语句修改密码,填入上面生成的密码,输入两次:

ldappasswd -x -D 'uid=ldapadmin,ou=people,dc=lashou,dc=com' -w secret "uid=yy_test,ou=people,dc=lashou,dc=com" -S

在每台 datanode 机器上创建 test 分组,并将 yy_test 用户加入到 test 分组:

groupadd test ; useradd yy_test; usermod -G test,yy_test yy_test

测试

通过 beeline 连接 hive-server2,进行测试:

# 切换到 test 用户进行测试
$ su test

$ kinit -k -t test.keytab test/[email protected]

$ beeline -u "jdbc:hive2://cdh1:10000/default;principal=test/[email protected]"

1.4 Impala 集成 Sentry

修改配置

修改 /etc/default/impala 文件中的 IMPALA_SERVER_ARGS 参数,添加:

-server_name=server1
-sentry_config=/etc/impala/conf/sentry-site.xml

在 IMPALA_CATALOG_ARGS 中添加:

-sentry_config=/etc/impala/conf/sentry-site.xml

注意:server1 必须和 sentry-provider.ini 文件中的保持一致。

IMPALA_SERVER_ARGS 参数最后如下:

hostname=`hostname -f |tr "[:upper:]" "[:lower:]"`

IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -kerberos_reinit_interval=60 \
    -principal=impala/${hostname}@JAVACHEN.COM \
    -keytab_file=/etc/impala/conf/impala.keytab \
    -enable_ldap_auth=true -ldap_uri=ldaps://cdh1 -ldap_baseDN=ou=people,dc=javachen,dc=com \
    -server_name=server1 \
    -sentry_config=/etc/impala/conf/sentry-site.xml \
    -be_port=${IMPALA_BACKEND_PORT} -default_pool_max_requests=-1 -mem_limit=60%"

创建 /etc/impala/conf/sentry-site.xml 内容如下:




   sentry.service.client.server.rpc-port
   8038


   sentry.service.client.server.rpc-address
   cdh1


   sentry.service.client.server.rpc-connection-timeout
   200000


    sentry.service.security.mode
    kerberos


   sentry.service.server.principal
    sentry/[email protected]


    sentry.service.server.keytab
    /etc/sentry/conf/sentry.keytab


测试

请参考下午基于文件存储方式中 impala 的测试。

2. 基于文件存储方式

2.1 hive 集成 sentry

修改配置文件

在 hive 的 /etc/hive/conf 目录下创建 sentry-site.xml 文件,内容如下:



  
    hive.sentry.server
    server1
  
  
    sentry.hive.provider.backend
    org.apache.sentry.provider.file.SimpleFileProviderBackend
  
  
    hive.sentry.provider
    org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider
  
  
    hive.sentry.provider.resource
    /user/hive/sentry/sentry-provider.ini
  

创建 sentry-provider.ini 文件并将其上传到 hdfs 的 /user/hive/sentry/ 目录:

$ cat /tmp/sentry-provider.ini
[databases]
# Defines the location of the per DB policy file for the customers DB/schema
#db1 = hdfs://cdh1:8020/user/hive/sentry/db1.ini

[groups]
admin = any_operation
hive = any_operation
test = select_filtered

[roles]
any_operation = server=server1->db=*->table=*->action=*
select_filtered = server=server1->db=filtered->table=*->action=SELECT
select_us = server=server1->db=filtered->table=events_usonly->action=SELECT

[users]
test = test
hive= hive

$ hdfs dfs -rm -r /user/hive/sentry/sentry-provider.ini
$ hdfs dfs -put /tmp/sentry-provider.ini /user/hive/sentry/
$ hdfs dfs -chown hive:hive /user/hive/sentry/sentry-provider.ini
$ hdfs dfs -chmod 640 /user/hive/sentry/sentry-provider.ini

关于 sentry-provider.ini 文件的语法说明,请参考官方文档。这里我指定了 Hive 组有全部权限,并指定 Hive 用户属于 Hive 分组,而其他两个分组只有部分权限。

然后在 hive-site.xml 中添加如下配置:


    hive.security.authorization.task.factory
        org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl


    hive.server2.session.hook
    org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook


    hive.sentry.conf.url
    file:///etc/hive/conf/sentry-site.xml

将配置文件同步到其他节点,并重启 hive-server2 服务。

测试

这里,我集群中 hive-server2 开启了 kerberos 认证,故通过 hive 用户来连接 hive-server2。

$ kinit -k -t /etc/hive/conf/hive.keytab hive/[email protected]

$ beeline -u "jdbc:hive2://cdh1:10000/default;principal=hive/[email protected]"
    scan complete in 10ms
    Connecting to jdbc:hive2://cdh1:10000/default;principal=hive/[email protected]
    Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
    Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Beeline version 0.13.1-cdh5.2.0 by Apache Hive
    5 rows selected (0.339 seconds)

    0: jdbc:hive2://cdh1:10000/default> show databases;
    +----------------+--+
    | database_name  |
    +----------------+--+
    | default        |
    | filtered       |
    | sensitive      |
    +----------------+--+
    10 rows selected (0.145 seconds)

    0: jdbc:hive2://cdh1:10000/default> use filtered
    No rows affected (0.132 seconds)

    0: jdbc:hive2://cdh1:10000/default> show tables;
    +----------------+--+
    |    tab_name    |
    +----------------+--+
    | events         |
    | events_usonly  |
    +----------------+--+
    2 rows selected (0.158 seconds)
    0: jdbc:hive2://cdh1:10000/default> use sensitive;
    No rows affected (0.115 seconds)

    0: jdbc:hive2://cdh1:10000/default> show tables;
    +-----------+--+
    | tab_name  |
    +-----------+--+
    | events    |
    +-----------+--+
    1 row selected (0.148 seconds)

2.3 impala 集成 sentry

修改配置文件

修改 /etc/default/impala 文件中的 IMPALA_SERVER_ARGS 参数,添加:

-server_name=server1
-authorization_policy_file=/user/hive/sentry/sentry-provider.ini
-authorization_policy_provider_class=org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider

注意:server1 必须和 sentry-provider.ini 文件中的保持一致。

IMPALA_SERVER_ARGS 参数最后如下:

hostname=`hostname -f |tr "[:upper:]" "[:lower:]"`

IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT} \
    -server_name=server1 \
    -authorization_policy_file=/user/hive/sentry/sentry-provider.ini \
    -authorization_policy_provider_class=org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider \
    -enable_ldap_auth=true -ldap_uri=ldaps://cdh1 -ldap_baseDN=ou=people,dc=javachen,dc=com \
    -kerberos_reinit_interval=60 \
    -principal=impala/${hostname}@JAVACHEN.COM \
    -keytab_file=/etc/impala/conf/impala.keytab \
"

测试

重启 impala-server 服务,然后进行测试。因为我这里 impala-server 集成了 kerberos 和 ldap,现在通过 ldap 来进行测试。

先通过 ldap 的 test 用户来测试:

impala-shell -l -u test
    Starting Impala Shell using LDAP-based authentication
    LDAP password for test:
    Connected to cdh1:21000
    Server version: impalad version 2.0.0-cdh5 RELEASE (build ecf30af0b4d6e56ea80297df2189367ada6b7da7)
    Welcome to the Impala shell. Press TAB twice to see a list of available commands.

    Copyright (c) 2012 Cloudera, Inc. All rights reserved.

    (Shell build version: Impala Shell v2.0.0-cdh5 (ecf30af) built on Sat Oct 11 13:56:06 PDT 2014)

[cdh1:21000] > show databases;
    Query: show databases
    +---------+
    | name    |
    +---------+
    | default |
    +---------+
    Fetched 1 row(s) in 0.11s

[cdh1:21000] > show tables;

    Query: show tables
    ERROR: AuthorizationException: User 'test' does not have privileges to access: default.*

[cdh1:21000] >

可以看到 test 用户没有权限查看和数据库,这是因为 sentry-provider.ini 文件中并没有给 test 用户分配任何权限。

下面使用 hive 用户来测试。使用下面命令在 ldap 中创建 hive 用户和组并给 hive 用户设置密码。

$ grep hive /etc/passwd  >/opt/passwd.txt
$ /usr/share/migrationtools/migrate_passwd.pl /opt/passwd.txt /opt/passwd.ldif

$ ldapadd -x -D "uid=ldapadmin,ou=people,dc=javachen,dc=com" -w secret -f /opt/passwd.ldif

$ grep hive /etc/group  >/opt/group.txt
$ /usr/share/migrationtools/migrate_group.pl /opt/group.txt /opt/group.ldif

$ ldapadd -x -D "uid=ldapadmin,ou=people,dc=javachen,dc=com" -w secret -f /opt/group.ldif

# 修改 ldap 中 hive 用户密码
$ ldappasswd -x -D 'uid=ldapadmin,ou=people,dc=javachen,dc=com' -w secret "uid=hive,ou=people,dc=javachen,dc=com" -S

然后,使用 hive 用户测试:

$ impala-shell -l -u hive
    Starting Impala Shell using LDAP-based authentication
    LDAP password for hive:
    Connected to cdh1:21000
    Server version: impalad version 2.0.0-cdh5 RELEASE (build ecf30af0b4d6e56ea80297df2189367ada6b7da7)
    Welcome to the Impala shell. Press TAB twice to see a list of available commands.

    Copyright (c) 2012 Cloudera, Inc. All rights reserved.

    (Shell build version: Impala Shell v2.0.0-cdh5 (ecf30af) built on Sat Oct 11 13:56:06 PDT 2014)

[cdh1:21000] > show databases;
    Query: show databases
    +------------------+
    | name             |
    +------------------+
    | _impala_builtins |
    | default          |
    | filtered         |
    | sensitive        |
    +------------------+
    Fetched 11 row(s) in 0.11s

[cdh1:21000] > use sensitive;
    Query: use sensitive

[cdh1:21000] > show tables;
    Query: show tables
    +--------+
    | name   |
    +--------+
    | events |
    +--------+
    Fetched 1 row(s) in 0.11s

[cdh1:21000] > select * from events;
    Query: select * from events
    +--------------+---------+---------+------------+
    | ip           | country | client  | action     |
    +--------------+---------+---------+------------+
    | 10.1.2.3     | US      | android | createNote |
    | 10.200.88.99 | FR      | windows | updateNote |
    | 10.1.2.3     | US      | android | updateNote |
    | 10.200.88.77 | FR      | ios     | createNote |
    | 10.1.4.5     | US      | windows | updateTag  |
    +--------------+---------+---------+------------+
    Fetched 5 row(s) in 0.76s

同样,你还可以使用其他用户来测试。

也可以使用 beeline 来连接 impala-server 来进行测试:

$ beeline -u "jdbc:hive2://cdh1:21050/default;" -n test -p test
  scan complete in 2ms
  Connecting to jdbc:hive2://cdh1:21050/default;
  Connected to: Impala (version 2.0.0-cdh5)
  Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
  Transaction isolation: TRANSACTION_REPEATABLE_READ
  Beeline version 0.13.1-cdh5.2.0 by Apache Hive
  0: jdbc:hive2://cdh1:21050/default>

3. 参考文章

  • Securing Impala for analysts
  • Setting Up Hive Authorization with Sentry
  • Sentry源码中的配置例子

你可能感兴趣的:(Hadoop,CDH,Sentry)