1. Install the Kerberos client components locally:
(Debian/Ubuntu) apt-get install krb5-user libpam-krb5 libpam-ccreds auth-client-config
(CentOS/RHEL)   yum install krb5-workstation
2. Copy /etc/krb5.conf from the Kerberos cluster to the local /etc/ (overwriting the krb5.conf created by the client installation).
3. Copy a keytab from the Kerberos cluster to the local machine (see below: the keytab of the principal you will use to access the remote Kerberos cluster), and reference its path in the program; see step 5.
4. Enter the password to initialize the principal you will use; this generates a local TGT for accessing the remote cluster.
For example, with principal dp/admin and realm GAI.COM:
kinit dp/[email protected]
5. Create the project:
package Test
import java.net.{URI, URLClassLoader}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.sql.SparkSession
object TestKerberos {
def main(args: Array[String]): Unit = {
// val classLoader = Thread.currentThread.getContextClassLoader
//
// val urlclassLoader = classLoader.asInstanceOf[URLClassLoader]
// val urls = urlclassLoader.getURLs
// for (url <- urls) {
// println("classpath "+url)
// }
val spark = SparkSession.builder()
.config("spark.executor.memory", "4g")
.config("spark.yarn.keytab","/home/jerry/keytab/dp-admin.keytab")
.config("spark.yarn.principal","dp/[email protected]")
.config("spark.security.credentials.hive.enabled","false")
.config("spark.security.credentials.hbase.enabled","false")
.config("spark.driver.memory", "3g")
.config("spark.default.parallelism", "8")
.master("local").getOrCreate()
// Kerberos cluster configuration files
System.setProperty("java.security.krb5.conf", "/etc/krb5.conf")
spark.sparkContext.hadoopConfiguration.set("hadoop.security.authentication","kerberos")
spark.sparkContext.hadoopConfiguration.set("dfs.namenode.kerberos.principal.pattern", "*/*@GAI.COM")
spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/data/core-site.xml")
spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/data/hdfs-site.xml")
// spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/core-site.xml")
// spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/hdfs-site.xml")
// spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/hadoop-policy.xml")
// spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/kms-acls.xml")
// spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/mapred-site.xml")
// spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/yarn-site.xml")
// spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/ssl-client.xml")
// spark.sparkContext.hadoopConfiguration.addResource("modules/LogProcess/src/res/ssl-server.xml")
// spark.sparkContext.hadoopConfiguration.set("fs.defaultFS","hdfs://10.111.32.184:8020")
// Log in as the Kerberos user
UserGroupInformation.setConfiguration(spark.sparkContext.hadoopConfiguration)
UserGroupInformation.loginUserFromKeytab("dp/admin", "/home/jerry/keytab/dp-admin.keytab")
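// Optional sanity check: confirm the keytab login succeeded before touching HDFS
println("Logged in as: " + UserGroupInformation.getLoginUser)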
//
val rdd = spark.sparkContext.textFile("hdfs://10.111.32.184:8020/user/dp/file/data/liyiwen/17/97/6ad813e9-415d-4bdb-a4e4-6a84196c36f9/add_feature_column")
rdd.foreach(x=>println(x))
rdd.saveAsTextFile("hdfs://10.111.32.184:8020/user/dp/file/data/jerry2")
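// A minimal sketch of the same access through the plain Hadoop FileSystem API (this is what
// the FileSystem/Path/URI imports above are for); handy for verifying connectivity first.
// The directory below is only an example, adjust it to one that exists on your cluster:
// val fs = FileSystem.get(new URI("hdfs://10.111.32.184:8020"), spark.sparkContext.hadoopConfiguration)
// fs.listStatus(new Path("/user/dp/file/data")).foreach(status => println(status.getPath))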
// val df = spark.read.option("header","true").option("inferSchema","true").csv("hdfs://10.111.32.184:8020/user/dp/file/data/liyiwen/17/97/6ad813e9-415d-4bdb-a4e4-6a84196c36f9/add_feature_column")
// df.show()
// df.write.option("header","true").mode("overwrite").csv("hdfs://10.111.32.184:8020/user/dp/demo")
}
}
6. Build the project first so that it produces a classpath (compiled output directories).
7. Use the code below to inspect the project's classpath, then copy the configuration files of the remote HDFS cluster you want to access (all *.xml files under /hadoop/etc/hadoop) into a suitable classpath directory of the current project (a directory produced by compilation).
a. Files to copy:
capacity-scheduler.xml
core-site.xml
hadoop-policy.xml
hdfs-site.xml
kms-acls.xml
kms-site.xml
mapred-site.xml
ssl-client.xml
ssl-server.xml
yarn-site.xml
b. Code to inspect the classpath:
val classLoader = Thread.currentThread.getContextClassLoader
val urlclassLoader = classLoader.asInstanceOf[URLClassLoader]
val urls = urlclassLoader.getURLs
for (url <- urls) {
println("classpath "+url)
}
Find a suitable classpath location and put the configuration files there for the current project (keep them separate from the JDK and the Gradle/Maven dependency caches). After every rebuild they must be copied in again, otherwise the job fails with: Exception in thread "main" java.io.IOException: Can't get Master Kerberos principal for use as renewer
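To check whether the copied files actually ended up on the classpath, a small sketch (core-site.xml here stands for any of the files listed in a):
val resource = Thread.currentThread.getContextClassLoader.getResource("core-site.xml")
println(if (resource != null) "core-site.xml found at " + resource else "core-site.xml is NOT on the classpath")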
8. Run the test from step 5.