Problems encountered integrating SparkSQL with Sentry

SparkSQL provides no security or authorization mechanism of its own; on our cluster, authorization is handled by two systems, Sentry and Ranger. When a CREATE TABLE statement is executed through SparkSQL, Sentry rejects it with 'org.apache.hadoop.hive.metastore.api.MetaException: User xxx does not have privileges for CREATETABLE', yet the same statement succeeds when run through Hive's beeline.
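A minimal reproduction sketch, assuming Hive support is enabled and Sentry is enforcing; the database and table names here are made up:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("sentry-createtable-repro")
  .enableHiveSupport()
  .getOrCreate()

// Fails with: MetaException: User xxx does not have privileges for CREATETABLE.
// The identical DDL succeeds when submitted through beeline.
spark.sql("CREATE TABLE test_db.t1 (id INT, name STRING)")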

Analyzing the Sentry logs led to the code where Sentry checks privileges, HiveAuthzBinding.authorize:

public void authorize(HiveOperation hiveOp, HiveAuthzPrivileges stmtAuthPrivileges,
      Subject subject, List<List<DBModelAuthorizable>> inputHierarchyList,
      List<List<DBModelAuthorizable>> outputHierarchyList)
          throws AuthorizationException {
    if (!open) {
      throw new IllegalStateException("Binding has been closed");
    }
    boolean isDebug = LOG.isDebugEnabled();
    if(isDebug) {
      LOG.debug("Going to authorize statement " + hiveOp.name() +
          " for subject " + subject.getName());
    }

    // Check read entities
    Map<AuthorizableType, EnumSet<DBModelAction>> requiredInputPrivileges =
        stmtAuthPrivileges.getInputPrivileges();
    if(isDebug) {
      LOG.debug("requiredInputPrivileges = " + requiredInputPrivileges);
      LOG.debug("inputHierarchyList = " + inputHierarchyList);
    }
    Map<AuthorizableType, EnumSet<DBModelAction>> requiredOutputPrivileges =
        stmtAuthPrivileges.getOutputPrivileges();
    if(isDebug) {
      LOG.debug("requiredOutputPrivileges = " + requiredOutputPrivileges);
      LOG.debug("outputHierarchyList = " + outputHierarchyList);
    }
    LOG.info("user: {}, required input hierarchy: {}, required output hierarchy: {}",
            new Object[]{subject.getName(), inputHierarchyList, outputHierarchyList});

    boolean found = false;
    // Step 1: every input hierarchy whose type matches a required
    // AuthorizableType must pass the provider's hasAccess check.
    for (Map.Entry<AuthorizableType, EnumSet<DBModelAction>> entry : requiredInputPrivileges.entrySet()) {
      AuthorizableType key = entry.getKey();
      for (List<DBModelAuthorizable> inputHierarchy : inputHierarchyList) {
        if (getAuthzType(inputHierarchy).equals(key)) {
          found = true;
          if (!authProvider.hasAccess(subject, inputHierarchy, entry.getValue(), activeRoleSet)) {
            throw new AuthorizationException("User " + subject.getName() +
                " does not have privileges for " + hiveOp.name());
          }
        }
      }
      // A required type may be absent only for URI, QUERY, and CTAS.
      if (!found && !key.equals(AuthorizableType.URI) && !(hiveOp.equals(HiveOperation.QUERY))
          && !(hiveOp.equals(HiveOperation.CREATETABLE_AS_SELECT))) {
        throw new AuthorizationException("Required privilege( " + key.name() + ") not available in input privileges");
      }
      found = false;
    }

    // Step 2: the same check for the output hierarchies.
    for (Map.Entry<AuthorizableType, EnumSet<DBModelAction>> entry : requiredOutputPrivileges.entrySet()) {
      AuthorizableType key = entry.getKey();
      for (List<DBModelAuthorizable> outputHierarchy : outputHierarchyList) {
        if (getAuthzType(outputHierarchy).equals(key)) {
          found = true;
          if (!authProvider.hasAccess(subject, outputHierarchy, entry.getValue(), activeRoleSet)) {
            throw new AuthorizationException("User " + subject.getName() +
                " does not have privileges for " + hiveOp.name());
          }
        }
      }
      if(!found && !(key.equals(AuthorizableType.URI)) && !(hiveOp.equals(HiveOperation.QUERY))) {
        throw new AuthorizationException("Required privilege( " + key.name() + ") not available in output privileges");
      }
      found = false;
    }

  }

The method's parameters are:

hiveOp: the operation type of the current SQL statement

stmtAuthPrivileges: the set of privileges this operation requires

subject: the current user

inputHierarchyList and outputHierarchyList: the input and output object hierarchies, i.e. the resources this SQL statement reads and writes

Authorization is done in two steps:

1. Check that the user holds the privileges this operation requires on the input object list.

2. Check that the user holds the privileges this operation requires on the output object list.

stmtAuthPrivileges holds two maps, one for input objects and one for output objects. Each map is keyed by an AuthorizableType enum value, which is one of Server, Db, Table, Column, View, or URI. For each required AuthorizableType, at least one hierarchy in the corresponding inputHierarchyList or outputHierarchyList must have a matching authzType (URI requirements, and the QUERY and CREATETABLE_AS_SELECT operations, are exempt from this presence check), and every matching hierarchy is passed to the provider's hasAccess method to decide whether the user holds the required privileges on that object list.
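A minimal sketch of that matching rule, using simplified hypothetical types rather than Sentry's real classes:

sealed trait AuthzType
case object Db  extends AuthzType
case object URI extends AuthzType

def authorize(required: Map[AuthzType, Set[String]],
              hierarchyTypes: List[AuthzType],
              hasAccess: (AuthzType, Set[String]) => Boolean): Unit =
  for ((key, actions) <- required) {
    val matching = hierarchyTypes.filter(_ == key)
    // Every hierarchy of the required type must pass hasAccess...
    for (h <- matching if !hasAccess(h, actions))
      throw new SecurityException(s"missing $actions on $h")
    // ...but only non-URI requirements must actually be present.
    if (matching.isEmpty && key != URI)
      throw new SecurityException(s"no $key object in the statement")
  }

This asymmetry is what matters for the bug below: a URI requirement may be absent without error, but once a URI hierarchy is present, hasAccess is still evaluated against it.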

The actual privilege check is done in ResourceAuthorizationProvider's doHasAccess method:

private boolean doHasAccess(Subject subject,
      List<? extends Authorizable> authorizables, Set<? extends Action> actions,
      ActiveRoleSet roleSet) {
    List<String> requestPrivileges = buildPermissions(authorizables, actions);
    try {
      Set<String> groups = getGroups(subject);
      Set<String> users = Sets.newHashSet(subject.getName());
      Set<String> hierarchy = new HashSet<String>();
      for (Authorizable authorizable : authorizables) {
        hierarchy.add(KV_JOINER.join(authorizable.getTypeName(), authorizable.getName()));
      }
      LOGGER.info("get privileges args, groups: {}, users: {}, role set: {}, authorizables: {}",
              new Object[]{groups, users, roleSet, authorizables.toArray(new Authorizable[0])});
      Iterable<Privilege> privileges = getPrivileges(groups, users, roleSet,
              authorizables.toArray(new Authorizable[0]));

      lastFailedPrivileges.get().clear();

      // Return true as soon as one privilege the user holds implies
      // one of the requested privileges.
      for (String requestPrivilege : requestPrivileges) {
        Privilege priv = privilegeFactory.createPrivilege(requestPrivilege);
        for (Privilege permission : privileges) {
          boolean result = permission.implies(priv, model);
          LOGGER.info("user: {}, group: {}, ProviderPrivilege {}, RequestPrivilege {}, RoleSet, {}, Result {}",
                  new Object[]{users, groups, permission, requestPrivilege, roleSet, result});
          if (result) {
            return true;
          }
        }
      }
    } catch (Exception ex) {
      LOGGER.error("sentry auth privilege error: {}", Throwables.getStackTraceAsString(ex));
    }
    lastFailedPrivileges.get().addAll(requestPrivileges);
    return false;
  }

Sentry loads the privileges the user holds, via the user's groups and roles, from its database and compares them with the requested privileges; authorization passes only when every required privilege over the input and output hierarchy lists is satisfied.
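A hedged illustration of that comparison: the key=value segments joined with "->" mirror Sentry's privilege string format, but the server and database names below are invented:

// A wildcard-action grant on the database implies a CREATE request on it.
val requested = "server=server1->db=test_db->action=create"
val granted   = Seq("server=server1->db=test_db->action=*")
// For this pair permission.implies(priv, model) evaluates to true,
// so doHasAccess returns true.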

Comparing the Sentry logs from a SparkSQL run against those from a beeline run shows that the inputHierarchyList passed by SparkSQL contains the location of the table about to be created. At that point the table does not exist yet, and the user only holds privileges on the database, not on that URI, so the privilege check fails.
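Conceptually, the difference between the two runs looks like this (values invented; beeline's list is shown empty because no URI hierarchy appears in its logs):

// beeline: no URI hierarchy accompanies a plain CREATE TABLE.
val beelineInputHierarchies: List[List[String]] = List()

// SparkSQL: a URI hierarchy for the table's intended location is added.
// Because it is present, authorize() calls hasAccess on it and fails:
// the user holds CREATE on the database, but nothing on this URI.
val sparkInputHierarchies = List(
  List("Server=server1",
       "URI=hdfs://nameservice1/user/hive/warehouse/test_db.db/t1"))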

Turning to the SparkSQL source, table creation happens in HiveClientImpl's createTable method:

override def createTable(table: CatalogTable, ignoreIfExists: Boolean): Unit = withHiveState {
    verifyColumnDataType(table.dataSchema)
    client.createTable(toHiveTable(table, Some(userName)), ignoreIfExists)
  }

A conversion happens here: the CatalogTable is turned into a Hive Table, and the CatalogTable's storage descriptor carries the location information, which the conversion copies over.

The fix modifies this logic: when executing createTable, strip the location from the table so that no URI ends up in Sentry's input hierarchy.

override def createTable(table: CatalogTable, ignoreIfExists: Boolean): Unit = withHiveState {
    verifyColumnDataType(table.dataSchema)
    val hiveTable = toHiveTable(table, Some(userName))
    // When the flag is on, drop the location so that Sentry sees no URI in
    // the input hierarchy; the metastore then derives the default location.
    if (sparkConf.getBoolean("spark.sql.enable.sentry", defaultValue = false)) {
      hiveTable.getTTable.getSd.setLocation(null)
    }
    client.createTable(hiveTable, ignoreIfExists)
  }

When running SparkSQL with spark.sql.enable.sentry set to true, the Sentry privilege check now passes.
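For example (spark.sql.enable.sentry is a key introduced by this patch, not a stock Spark option):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.enable.sentry", "true")  // exists only in the patched build
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE TABLE test_db.t1 (id INT, name STRING)")  // now passes Sentry

One trade-off worth noting: with the flag on, the location is dropped unconditionally, so a CREATE TABLE with an explicit LOCATION clause would also fall back to the metastore's default path; the flag should only be enabled for workloads where that is acceptable.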
