Spark + Hadoop: accessing an HDFS cluster secured with Kerberos authentication and authorization

1. Install the Kerberos client packages locally

# Debian/Ubuntu (krb5-user and the libpam packages are apt package names):
apt-get install krb5-user libpam-krb5 libpam-ccreds auth-client-config

# RHEL/CentOS:
yum install krb5-workstation

2. Copy /etc/krb5.conf from the Kerberos cluster to the local /etc/ directory (overwriting the krb5.conf that the package installation created).
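For reference, the core sections of that file for realm GAI.COM look roughly like the sketch below; the KDC hostname here is purely an assumption for illustration — always keep the values from the cluster's own file:

[libdefaults]
    default_realm = GAI.COM

[realms]
# kdc.gai.com is an assumed hostname, shown only as an example
    GAI.COM = {
        kdc = kdc.gai.com
        admin_server = kdc.gai.com
    }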


3. Copy the relevant keytab from the Kerberos cluster to the local machine — the keytab of the principal you will use to access the remote cluster (see below). The program references this path in step 5; you can inspect the principals inside a keytab with klist -kt <keytab path>.


4. Enter the principal's password to authenticate; this creates a local TGT for accessing the remote cluster.

For example, with principal dp/admin in realm GAI.COM:

kinit dp/[email protected]

Run klist to confirm the ticket was granted.
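Note that kinit goes through the ticket cache, while the program in step 5 logs in straight from the keytab. A minimal standalone sketch to sanity-check the keytab login before involving Spark — it needs only hadoop-common, and the principal and paths are the ones used throughout this post:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object LoginCheck {
  def main(args: Array[String]): Unit = {
    System.setProperty("java.security.krb5.conf", "/etc/krb5.conf")
    val conf = new Configuration()
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)
    UserGroupInformation.loginUserFromKeytab("dp/[email protected]", "/home/jerry/keytab/dp-admin.keytab")
    // On success this prints the principal, tagged (auth:KERBEROS)
    println(UserGroupInformation.getLoginUser)
  }
}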

 

 

5. Create the project:

package Test

import java.net.{URI, URLClassLoader}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.sql.SparkSession

object TestKerberos {


  def main(args: Array[String]): Unit = {



    // Uncomment to print the runtime classpath (same snippet as step 7b):
//        val classLoader = Thread.currentThread.getContextClassLoader
//        val urlclassLoader = classLoader.asInstanceOf[URLClassLoader]
//        val urls = urlclassLoader.getURLs
//        for (url <- urls) {
//          println("classpath    " + url)
//        }

    // spark.yarn.keytab / spark.yarn.principal only take effect when running
    // on YARN; they are harmless under master("local") and kept for reference
    val spark = SparkSession.builder()
      .config("spark.executor.memory", "4g")
      .config("spark.yarn.keytab", "/home/jerry/keytab/dp-admin.keytab")
      .config("spark.yarn.principal", "dp/[email protected]")
      .config("spark.security.credentials.hive.enabled", "false")
      .config("spark.security.credentials.hbase.enabled", "false")
      .config("spark.driver.memory", "3g")
      .config("spark.default.parallelism", "8")
      .master("local").getOrCreate()

    // Kerberos settings for the remote cluster
    System.setProperty("java.security.krb5.conf", "/etc/krb5.conf")
    spark.sparkContext.hadoopConfiguration.set("hadoop.security.authentication", "kerberos")
    spark.sparkContext.hadoopConfiguration.set("dfs.namenode.kerberos.principal.pattern", "*/*@GAI.COM")
    // core-site.xml and hdfs-site.xml copied from the remote cluster (see step 7)
    spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/data/core-site.xml")
    spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/data/hdfs-site.xml")

    // Alternative: load every cluster config file explicitly and/or set
    // fs.defaultFS by hand:
//    spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/core-site.xml")
//    spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/hdfs-site.xml")
//    spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/hadoop-policy.xml")
//    spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/kms-acls.xml")
//    spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/mapred-site.xml")
//    spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/yarn-site.xml")
//    spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/ssl-client.xml")
//    spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/ssl-server.xml")
//    spark.sparkContext.hadoopConfiguration.set("fs.defaultFS", "hdfs://10.111.32.184:8020")




    // Log in from the keytab; UserGroupInformation must first see the
    // Kerberos-enabled configuration
    UserGroupInformation.setConfiguration(spark.sparkContext.hadoopConfiguration)
    // The short form "dp/admin" also resolves when GAI.COM is the default
    // realm in krb5.conf
    UserGroupInformation.loginUserFromKeytab("dp/[email protected]", "/home/jerry/keytab/dp-admin.keytab")

    // Read a file from the secured cluster
    val rdd = spark.sparkContext.textFile("hdfs://10.111.32.184:8020/user/dp/file/data/liyiwen/17/97/6ad813e9-415d-4bdb-a4e4-6a84196c36f9/add_feature_column")
    rdd.foreach(x => println(x))


    // Write results back to the secured cluster
    rdd.saveAsTextFile("hdfs://10.111.32.184:8020/user/dp/file/data/jerry2")

    // DataFrame variant of the same round trip:
//      val df = spark.read.option("header","true").option("inferSchema","true").csv("hdfs://10.111.32.184:8020/user/dp/file/data/liyiwen/17/97/6ad813e9-415d-4bdb-a4e4-6a84196c36f9/add_feature_column")
//      df.show()
//      df.write.option("header","true").mode("overwrite").csv("hdfs://10.111.32.184:8020/user/dp/demo")

    spark.stop()
  }
}

 

 

6. Build the project first, so that the build output directories (the classpath) exist.

[Figure 1: project build]

7. Use the snippet in 7b to print the project's classpath, then copy the remote cluster's Hadoop config files (all *.xml files under /hadoop/etc/hadoop) into a suitable classpath directory of the current project (one of the directories produced by the build).

a. Files to copy (a copy helper sketch follows the list):

capacity-scheduler.xml
core-site.xml
hadoop-policy.xml
hdfs-site.xml
kms-acls.xml
kms-site.xml
mapred-site.xml
ssl-client.xml
ssl-server.xml
yarn-site.xml
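Since the copies have to be redone after every rebuild (see the note after 7b), a small helper can automate it. Both directories below are assumptions — the source is wherever you keep the cluster XMLs, the target is the classpath directory found in 7b (Scala 2.12+ for the Java-stream lambdas):

import java.nio.file.{Files, Paths, StandardCopyOption}

object CopyClusterConf {
  def main(args: Array[String]): Unit = {
    val src = Paths.get("modules/LogProcess/src/data")                  // assumed source dir
    val dst = Paths.get("modules/LogProcess/build/classes/scala/main")  // assumed classpath dir
    val stream = Files.list(src)
    try {
      // Copy every *.xml config file into the build output, overwriting stale copies
      stream.filter(_.toString.endsWith(".xml")).forEach { f =>
        Files.copy(f, dst.resolve(f.getFileName), StandardCopyOption.REPLACE_EXISTING)
      }
    } finally stream.close()
  }
}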

 

b. Code to print the classpath:

    // Note: on Java 9+ the application class loader is no longer a
    // URLClassLoader, so this cast only works on Java 8
    val classLoader = Thread.currentThread.getContextClassLoader
    val urlclassLoader = classLoader.asInstanceOf[URLClassLoader]
    val urls = urlclassLoader.getURLs
    for (url <- urls) {
      println("classpath    " + url)
    }
Pick a suitable classpath directory from that output and copy the files into it (keep them away from the JDK and the gradle/maven dependency caches). After every rebuild the files have to be put back, since the build regenerates the output directory; otherwise the job fails with: Exception in thread "main" java.io.IOException: Can't get Master Kerberos principal for use as renewer
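A quick way to confirm the copies actually landed on the classpath (core-site.xml here stands for any file from 7a):

    // getResource returns null when the file is not visible on the classpath
    val res = Thread.currentThread.getContextClassLoader.getResource("core-site.xml")
    println(if (res == null) "core-site.xml NOT on classpath" else "found: " + res)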

8. Run the test program from step 5.

 
