Accessing a Kerberos-secured cluster with the HDFS API

Part of the article series: Big Data Security in Practice https://www.jianshu.com/p/76627fd8399c

import com.kingdee.bigdata.hina.conf.ConfigModelSynchro;
import com.kingdee.bigdata.hina.constant.Constants;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;
import java.io.IOException;
import java.net.URI;

public class HdfsUtil {
    protected static final Log log = LogFactory.getLog(HdfsUtil.class);
    FileSystem fs = null;
    public HdfsUtil(){}
    public HdfsUtil(String hdfspath) {
        String path = "hdfs://cluster2018:8020" + hdfspath;
        //log.debug("hdfs path:" + path);
        //System.setProperty("hadoop.home.dir", model.getHadoopHome());
        Configuration conf = new Configuration();
        //conf.set("fs.defaultFS", "hdfs://10.247.24.53:8020");
        // The class name must not contain spaces, otherwise it cannot be loaded.
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        if (ConstantPool.KerberEnabled) {
            // Log in from the keytab before the FileSystem is created.
            setKerberos(conf);
        }
        try {
            fs = FileSystem.get(URI.create(path), conf);
        } catch (Exception e) {
            log.error("get the file failed", e);
        }
    }

    public FileSystem getFs() {
        return fs;
    }

    private void setKerberos(Configuration conf) {
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem
                .class.getName());

        if (System.getProperty("os.name").toLowerCase().startsWith("win")) {
            System.setProperty("java.security.krb5.conf", Constants.Krb5_Conf);
        } else {
            /* Optional on Linux, where /etc/krb5.conf is looked up automatically. */
            System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
        }

        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        try {
            // Principal and keytab path come from the project's Constants class.
            UserGroupInformation.loginUserFromKeytab(Constants.HDFS_User, Constants.Hdfs_Key_Tab);
        } catch (Exception e) {
            // Pass the exception to the logger so the stack trace is kept.
            log.error("Kerberos authentication failed: " + e.getMessage(), e);
        }
    }
}

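The proxy user configuration is described in the reference documents. When the cluster runs in secure mode, the superuser must have Kerberos credentials.
The example below uses the credentials of the hive user to proxy the hello user.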

// Requires: import java.security.PrivilegedExceptionAction;
public static void main(String[] args) throws InterruptedException {

        // Create a proxy UGI for 'hello'. The login user (here hive) acts as the superuser.
        UserGroupInformation ugi = null;
        // The FileSystem held by hu is created here, before doAs is entered.
        final HdfsUtil hu = new HdfsUtil("/tmp");
        try {
            ugi = UserGroupInformation.createProxyUser("hello",
                    UserGroupInformation.getLoginUser());
            ugi.doAs(new PrivilegedExceptionAction<String>() {
                @Override
                public String run() throws Exception {
                    //Submit a job
//                  JobClient jc = new JobClient(conf);
//                  jc.submitJob(conf);

                    //OR access hdfs
                    FileSystem fs = hu.getFs();
                    String someFilePath = "hdfs://cluster2018:8020/tmp/";
                    FileStatus[] listStatus = fs.listStatus(new Path(someFilePath));
                    for (FileStatus status : listStatus) {
                        System.out.println(status.toString());
                    }
                    fs.mkdirs(new Path("hdfs://cluster2018:8020/tmp/proxy123"));
                    fs.close();
                    return "";
                }
            });
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

However, listing the directory after it has been created shows:
drwxr-xr-x - hive supergroup 0 2018-04-23 19:35 /tmp/proxy123

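So when hive's credentials are used, the operation is actually carried out with hive's permissions. A likely cause is that the FileSystem returned by hu.getFs() was created in the HdfsUtil constructor, before doAs was entered, so it is bound to the login user hive rather than to the proxy user hello.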
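
The ownership result above can be pictured with a small stand-alone model. This is not Hadoop code: ProxyUserModel, Client and their methods are hypothetical names, and the model only assumes that a client handle records the current user at the moment it is constructed, while doAs switches the current user for the duration of the action.

```java
import java.util.concurrent.Callable;

// Toy model only: none of these names are Hadoop classes.
final class ProxyUserModel {
    // Stands in for the "current user"; the login user in this model is hive.
    private static final ThreadLocal<String> CURRENT_USER =
            ThreadLocal.withInitial(() -> "hive");

    // Stands in for doAs: run the action as another user, then restore.
    static <T> T doAs(String user, Callable<T> action) throws Exception {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.call();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    // Stands in for a file system handle: the user is captured at construction.
    static final class Client {
        private final String boundUser = CURRENT_USER.get();

        // Returns the owner the new directory would get in this model.
        String mkdirs(String path) {
            return boundUser;
        }
    }

    public static void main(String[] args) throws Exception {
        // Handle created before doAs: stays bound to hive.
        Client early = new Client();
        System.out.println(doAs("hello", () -> early.mkdirs("/tmp/proxy123"))); // hive

        // Handle created inside doAs: bound to hello.
        System.out.println(doAs("hello", () -> new Client().mkdirs("/tmp/proxy123"))); // hello
    }
}
```

If the real FileSystem behaves like this model, the corresponding change in the example would be to call FileSystem.get(URI.create(path), conf) inside run() instead of reusing hu.getFs(), with the cluster configured to allow hive to impersonate hello.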

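Delegation Tokens
In a distributed system such as HDFS or MapReduce there are many interactions between clients and servers, and every one of them must be authenticated. A single HDFS read, for example, involves several calls to the NameNode and the DataNodes. Running the three-step Kerberos exchange for every call would add a heavy load. Hadoop therefore uses delegation tokens: once such a token has been issued, no further interaction with the Kerberos server is needed. Delegation tokens are created and used by Hadoop on the user's behalf, so the user does not have to sign in again.
A delegation token is generated by the NameNode and can be thought of as a secret shared between the client and the server. On the first RPC call between client and server no delegation token exists yet, so that call must be authenticated through Kerberos; the client then obtains a delegation token from the NameNode.
When the client wants to operate on HDFS blocks, it uses a special token called a block access token. The NameNode hands this token to the client in its response to the client's metadata request, and the client uses it to authenticate itself to the DataNode. This works because the NameNode and the DataNodes share the secret behind the token, so a block can only be accessed by a client that holds a valid token. To enable this feature, set dfs.block.access.token.enable=true.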
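
The shared-secret idea behind these tokens can be sketched in a few lines of plain Java. This is a simplified model, not Hadoop's classes or wire format: BlockTokenSketch, the identifier string layout and the choice of HmacSHA256 are assumptions made for illustration. The issuer signs an identifier with a key that the verifier also knows, so the verifier can check the token without contacting Kerberos.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Simplified model of a shared-secret token; all names here are hypothetical.
final class BlockTokenSketch {
    private static final String ALGORITHM = "HmacSHA256"; // assumed for this sketch

    // Issuer side (the NameNode role): derive the token password from the identifier.
    static byte[] sign(byte[] sharedKey, String identifier) throws Exception {
        Mac mac = Mac.getInstance(ALGORITHM);
        mac.init(new SecretKeySpec(sharedKey, ALGORITHM));
        return mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
    }

    // Verifier side (the DataNode role): recompute the password and compare.
    static boolean verify(byte[] sharedKey, String identifier, byte[] password)
            throws Exception {
        return MessageDigest.isEqual(sign(sharedKey, identifier), password);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "demo-shared-key".getBytes(StandardCharsets.UTF_8);
        String identifier = "user=hello;blockId=1001;mode=READ";
        byte[] password = sign(key, identifier);

        // The untouched token is accepted.
        System.out.println(verify(key, identifier, password)); // true
        // A token presented for a different block is rejected.
        System.out.println(verify(key, "user=hello;blockId=1002;mode=READ", password)); // false
    }
}
```

Because verification only needs the shared key, a client without a token signed by that key cannot produce a password the verifier will accept, which matches the access restriction described above.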

References:

How to resolve the Kerberos error: "Server has invalid Kerberos principal: hdfs/host2@****.COM"
Apache Hadoop 2.7.1 – Proxy user - Superusers Acting On Behalf Of Other Users
ProxyUser in Hadoop - OttoWu's personal page
