Hive user permissions and table permissions: an implementation approach

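The Hive permission system

Hive's built-in permission system is built on Linux users. The problem is that a user can fake an account to access data, which leaves the permission system effectively useless. For that reason companies usually build their data warehouse on a Kerberos + Sentry + LDAP architecture. This requires a data team with fairly strong technical skills [Kerberos is quite a pain to work with], and many companies have adopted big data without yet having that expertise. So I have been thinking about how to get the same functionality without using these plugins.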

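User access control

In most cases Hive is accessed through hiveserver2, so here we try to modify the hiveserver2 request entry point to control user requests. We agree up front that every user must supply a username and password to access Hive; anyone who does not is treated as an illegal user and refused. To implement this we need to enable hiveserver2 authentication by modifying hive-site.xml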


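<property>
  <name>hive.server2.custom.authentication.class</name>
  <value>org.apache.hive.service.auth.CustomPasswdAuthenticator</value>
</property>

<property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
</property>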

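Create a Java Maven project with the following dependency (the PasswdAuthenticationProvider interface lives in Hive's service module, so hive-service may be needed on the compile classpath as well)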

        
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>3.1.0</version>
        </dependency>
        

The login authentication class, org.apache.hive.service.auth.CustomPasswdAuthenticator

package org.apache.hive.service.auth;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import javax.security.sasl.AuthenticationException;
import java.util.HashMap;
import java.util.Map;

public class CustomPasswdAuthenticator implements org.apache.hive.service.auth.PasswdAuthenticationProvider {

    private static final Logger LOG = LoggerFactory.getLogger(CustomPasswdAuthenticator.class);

    // Hard-coded demo accounts: user name -> password
    private static final Map<String, String> users = new HashMap<>();
    static {
        users.put("xb", "123456");
        users.put("hadoop", "123456");
        users.put("hive", "123456");
    }

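    // Called by hiveserver2 for every login; throwing rejects the connection.
    // The method name is capitalized because the Hive interface declares it that way.
    @Override
    public void Authenticate(String userName, String passwd)
            throws AuthenticationException {
        if (userName == null || userName.isEmpty()) {
            throw new AuthenticationException("user can not be null");
        }
        if (users.get(userName) == null) {
            throw new AuthenticationException("user " + userName + " does not exist");
        }
        // Compare from the stored value so that a null passwd is rejected
        // instead of causing a NullPointerException
        if (!users.get(userName).equals(passwd)) {
            throw new AuthenticationException("wrong password for user " + userName);
        }
        // Demo only: logging a password in plain text is unsafe outside a test setup
        LOG.info("====================================user: " + userName + " try login. passwd is " + passwd);
    }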
}

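Every user login goes through the Authenticate method, so this is where users can be validated. For convenience I keep all users in a Map here; for production use, this part can be moved into MySQL.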
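As a rough sketch of what moving the user list out of the Map could look like, the lookup can sit behind a small interface so that the credential source is swappable. Everything named here is hypothetical and not from the original setup: the UserStore interface, the JdbcUserStore class, the hive_users(username, password) table and the StoreBackedAuthenticator class. To stay compilable without Hive on the classpath, the class below does not declare "implements PasswdAuthenticationProvider"; a real version would add that. As far as I know Hive creates the authenticator reflectively, so a real class would probably also need a no-argument constructor that builds its own store.

```java
import javax.security.sasl.AuthenticationException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical lookup abstraction so the credential source can be swapped. */
interface UserStore {
    /** Returns the stored password for the user, or null if the user is unknown. */
    String findPassword(String userName) throws SQLException;
}

/** Assumed schema: table hive_users(username, password); the names are illustrative. */
class JdbcUserStore implements UserStore {
    private final String url;
    private final String dbUser;
    private final String dbPasswd;

    JdbcUserStore(String url, String dbUser, String dbPasswd) {
        this.url = url;
        this.dbUser = dbUser;
        this.dbPasswd = dbPasswd;
    }

    @Override
    public String findPassword(String userName) throws SQLException {
        String sql = "SELECT password FROM hive_users WHERE username = ?";
        try (Connection con = DriverManager.getConnection(url, dbUser, dbPasswd);
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, userName); // bound parameter, no string concatenation
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}

/** Same method name and signature as CustomPasswdAuthenticator, backed by a UserStore. */
class StoreBackedAuthenticator {
    private final UserStore store;

    StoreBackedAuthenticator(UserStore store) {
        this.store = store;
    }

    public void Authenticate(String userName, String passwd) throws AuthenticationException {
        if (userName == null || userName.isEmpty() || passwd == null) {
            throw new AuthenticationException("user name and password are required");
        }
        String expected;
        try {
            expected = store.findPassword(userName);
        } catch (SQLException e) {
            throw new AuthenticationException("user store unavailable", e);
        }
        // Plain-text comparison mirrors the Map version; storing hashes would be safer.
        // One message for both failure cases, so callers cannot probe for valid user names.
        if (expected == null || !expected.equals(passwd)) {
            throw new AuthenticationException("invalid user name or password");
        }
    }
}
```

Because UserStore has a single method, a lambda such as `u -> "hadoop".equals(u) ? "123456" : null` can stand in for MySQL in unit tests, while `new JdbcUserStore(url, user, passwd)` would be used against a real database.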

Build and package, place the jar under $HIVE_HOME/lib/, and restart hiveserver2

mvn clean package
hiveserver2 --hiveconf hive.root.logger=INFO,console

A main method for testing over JDBC

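import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcTest {
    public static void main(String[] args) throws Exception {
        // The credentials must match an account known to CustomPasswdAuthenticator
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://master:10000/default", "hadoop", "123456");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery("select * from wh.test")) {
            // Print the first column of every row
            while (res.next()) {
                System.out.println(res.getString(1));
            }
        }
    }
}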

The hiveserver2 log output is as follows

2018-09-22T04:03:14,815  INFO [HiveServer2-Handler-Pool: Thread-56] auth.CustomPasswdAuthenticator: ====================================user: hadoop try login. passwd is 123456

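At this point hiveserver2 has user access control. What remains is assigning users...

Table-level control

What we have so far is only user-level control, which is far from enough in a production environment: an authenticated user can access all data, and that is clearly unreasonable. In most cases we need permission control at the table level. So I looked into the Hive code and traced the SQL call chain, and this can be done as well. For details see org.apache.hive.service.cli.session.HiveSessionImpl.java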

  private OperationHandle executeStatementInternal(String statement,
      Map<String, String> confOverlay, boolean runAsync, long queryTimeout) throws HiveSQLException {
    acquire(true, true);
    // Added for this experiment: print the user name, the SQL statement and the user's password
    LOG.info(username+"============================="+statement+"============================="+password);
    ExecuteStatementOperation operation = null;
    OperationHandle opHandle = null;
    try {
      operation = getOperationManager().newExecuteStatementOperation(getSession(), statement,
          confOverlay, runAsync, queryTimeout);
      opHandle = operation.getHandle();
      addOpHandle(opHandle);
      operation.run();
      return opHandle;
    } catch (HiveSQLException e) {
      // Refering to SQLOperation.java, there is no chance that a HiveSQLException throws and the
      // async background operation submits to thread pool successfully at the same time. So, Cleanup
      // opHandle directly when got HiveSQLException
      if (opHandle != null) {
        removeOpHandle(opHandle);
        getOperationManager().closeOperation(opHandle);
      }
      throw e;
    } finally {
      if (operation == null || operation.getBackgroundHandle() == null) {
        release(true, true); // Not async, or wasn't submitted for some reason (failure, etc.)
      } else {
        releaseBeforeOpLock(true); // Release, but keep the lock (if present).
      }
    }
  }

Build and package

mvn clean package -DskipTests -pl service

Copy hive-service-3.1.0.jar to $HIVE_HOME/lib, restart hiveserver2, and access Hive with beeline

beeline -u 'jdbc:hive2://localhost:10000' -n xiaobin -p 123456
0: jdbc:hive2://localhost:10000> select * from wh.test;

+----------------------------------------------------+
|                     test.line                      |
+----------------------------------------------------+
| The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. |
|                                                    |
| The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures. |
+----------------------------------------------------+
3 rows selected (3.952 seconds)

The hiveserver2 log output is as follows

2018-09-22T02:28:11,681  INFO [4bdac79a-9f84-4975-bcde-f7a7d84d7586 HiveServer2-Handler-Pool: Thread-61] 
session.HiveSessionImpl: hadoop=============================select * from wh.test =============================null

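We can see that the user name and the SQL have been printed. Oddly, the password is null. I traced the code and could not find the cause, but it does not stop us from running permission checks for the user. Next I will describe the approach; implement it yourself if you need it.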

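In the executeStatementInternal method, the first step is to parse the SQL statement and extract every table the submitted SQL refers to. There are plenty of ways to parse SQL and extract table names, and a quick search turns up many [I have used presto-parser for this before; it is a package inside Presto that also works on its own]. The next step is to check those tables against a table-permission scheme of your own design. Database-level permissions are possible here too, but you need to agree up front that tables in submitted SQL are written as db.tablename, like wh.test in the earlier statement. The rest is obvious: if the user has permission, do nothing; if not, throw an exception.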
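The approach above can be sketched roughly as follows. This is only an illustration under several assumptions: TablePermissionChecker is a hypothetical helper, not part of Hive; table names are pulled out with a naive regular expression instead of a real SQL parser such as presto-parser; grants live in an in-memory map where "db.*" stands for a whole database; and a plain SecurityException takes the place of whatever exception the session code would throw.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Hive. */
class TablePermissionChecker {

    // Naive pattern: a db.table name that follows FROM, JOIN, INTO or TABLE.
    // It ignores comments, string literals, quoted identifiers and unqualified
    // table names, so a real SQL parser is the safer choice.
    private static final Pattern TABLE_REF = Pattern.compile(
            "\\b(?:from|join|into|table)\\s+([a-z_][a-z0-9_]*\\.[a-z_][a-z0-9_]*)",
            Pattern.CASE_INSENSITIVE);

    // user name -> permitted tables; "db.*" grants a whole database (assumed format)
    private final Map<String, Set<String>> grants;

    TablePermissionChecker(Map<String, Set<String>> grants) {
        this.grants = grants;
    }

    /** Collects the db.table names referenced by the statement, lower-cased. */
    static Set<String> extractTables(String sql) {
        Set<String> tables = new LinkedHashSet<>();
        Matcher m = TABLE_REF.matcher(sql);
        while (m.find()) {
            tables.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return tables;
    }

    /** Returns quietly when every table is permitted; throws otherwise. */
    void check(String userName, String sql) {
        Set<String> allowed = grants.getOrDefault(userName, Collections.<String>emptySet());
        for (String table : extractTables(sql)) {
            String db = table.substring(0, table.indexOf('.'));
            if (!allowed.contains(table) && !allowed.contains(db + ".*")) {
                // Inside HiveSessionImpl this could be a HiveSQLException instead
                throw new SecurityException(
                        "user " + userName + " has no permission on " + table);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> grants = new HashMap<>();
        grants.put("hadoop", Collections.singleton("wh.test"));
        TablePermissionChecker checker = new TablePermissionChecker(grants);

        checker.check("hadoop", "select * from wh.test"); // permitted: returns quietly
        try {
            checker.check("xb", "select * from wh.test"); // no grant for this user
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In executeStatementInternal, a call along the lines of checker.check(username, statement) would presumably go inside the existing try block, before the operation is created, so that the lock release in the finally clause still runs when the check throws. Because the pattern only recognizes db.tablename, statements with unqualified table names pass through unchecked; rejecting such statements outright would be one way to enforce the naming convention.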
