Summary of Problems Installing Kerberos with Ambari

Additional troubleshooting resources:

https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/errors.html (covers most Kerberos problems: tickets, server-side errors)

https://www.bbsmax.com/A/rV574Yk9dP/

1. Problem:
2019-03-22 20:53:01 WARN  ScriptBasedMapping:254 - Exception running /etc/hadoop/conf/topology_script.py 10.111.32.186 
19/03/22 20:53:01 INFO LineBufferedStream: stdout: java.io.IOException: Cannot run program "/etc/hadoop/conf/topology_script.py" (in directory "/home/jerry"): error=2, No such file or directory
19/03/22 20:53:01 INFO LineBufferedStream: stdout: Caused by: java.io.IOException: error=2, No such file or directory


Cause:
(1) When copying the cluster's core-site.xml and other *.xml resource files and running in an IDE (IDEA, PyCharm), the following property in core-site.xml was not commented out:

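The snippet was missing from the original post; the following is a reconstruction under the assumption that the standard topology property is the culprit. In core-site.xml, net.topology.script.file.name is what makes the Hadoop client invoke the rack-topology script, so the local copy should have it commented out or removed:

<!-- Comment this out in the locally copied core-site.xml;
     the script only exists on cluster nodes, hence the IOException above. -->
<!--
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology_script.py</value>
</property>
-->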

(2) After packaging the program and running it in yarn-client mode, if local copies of the cluster's resource files are readable, the client still reads this configuration, tries to execute the script, and reports the same error.
In my case, jobs were submitted through Livy in yarn-client mode; while the jar ran on the remote cluster, the error appeared locally (without affecting the cluster run). The fix was to switch the run mode to yarn-cluster so the job runs entirely on the remote cluster and does not depend on local resource files.
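When submitting through Livy, the deploy mode is configured on the Livy server side; a sketch for livy.conf (assuming a stock Livy install; restart the Livy server afterwards):

livy.spark.master = yarn
livy.spark.deploy-mode = cluster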


2. Create the proxy-user principals and keytabs for local access to the Kerberos cluster (the current hostname is the part after the /).


Run the following in kadmin.local:

addprinc HTTP/[email protected]
modprinc -maxrenewlife "1 week" +allow_renewable HTTP/[email protected]
xst -k http.livy.test.com.keytab HTTP/[email protected]

addprinc dp/[email protected]
modprinc -maxrenewlife "1 week" +allow_renewable dp/[email protected]
xst -k dp.livy.test.com.keytab dp/[email protected]

Initialize the ticket cache on the HDFS machine node (note the keytab file exported above is named dp.livy.test.com.keytab):
kinit -kt dp.livy.test.com.keytab dp/[email protected]

Test HDFS; the TGT exception is no longer thrown:

hdfs  dfs -ls /
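To double-check that the ticket cache was populated, klist should now show a TGT for the dp principal (a quick verification; output illustrative):

klist
# Expected: a krbtgt/[email protected] entry with valid start/expiry times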


3. Copy the jersey-*.jar files under $HADOOP_HOME share/lib onto the classpath (otherwise YARN classes throw ClassNotFoundException).
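For example (a sketch; the exact lib subdirectory varies by Hadoop distribution, and the target is whichever classpath directory your project uses):

cp $HADOOP_HOME/share/hadoop/yarn/lib/jersey-*.jar /path/to/project/classpath/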
4. On the KDC server, create the principal HTTP/_HOSTNAME for the local host (the host that submits Livy jobs).

5. Add proxy-user settings allowing local access (from 10.111.23.70) to the cluster:
hadoop.proxyuser.dp.groups=*
hadoop.proxyuser.dp.hosts=10.111.23.70
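These are standard Hadoop proxy-user properties; in a raw core-site.xml (or Ambari's custom core-site section) they take this form, with dp matching the principal created in item 2:

<property>
  <name>hadoop.proxyuser.dp.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.dp.hosts</name>
  <value>10.111.23.70</value>
</property>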


2019-03-21 12:18:20 ERROR ApplicationMaster:91 - Uncaught exception: 
org.apache.spark.SparkException: Failed to connect to driver!
    at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:657)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:517)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:347)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:800)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:799)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:824)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:854)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
2019-03-21 12:18:20 INFO  ApplicationMaster:54 - Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
2019-03-21 12:18:20 INFO  ApplicationMaster:54 - Deleting staging directory hdfs://namenode.gai.test.com:8020/user/dp/.sparkStaging/application_1553139449000_0004
2019-03-21 12:18:20 INFO  ShutdownHookManager:54 - Shutdown hook called


6. Livy: ERROR RSCClient: Failed to connect to context (the ApplicationMaster log above shows the corresponding "Failed to connect to driver!" exception).


Solution: configure all IP-to-hostname mappings on the Livy host, the Kerberos (KDC) host, and every Hadoop cluster node.
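In practice this means an identical /etc/hosts on every machine; a sketch using hostnames from this article (pairings illustrative):

10.111.23.70     livy.test.com            # local submit host (item 5)
10.111.32.186    namenode.gai.test.com    # cluster node; IP-to-host pairing illustrative
# ...one line per KDC, Livy, and Hadoop node, identical on all hosts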


7. After Kerberos is configured, starting services in Ambari fails on one node because the principal has no matching keytab:
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/kinit -c /var/lib/ambari-agent/tmp/kerberos_service_check_cc_fbb115245f1af7a48db9ac7847a98bb9036a0cfceeae4045ac38351c -kt /etc/security/keytabs/smokeuser.headless.keytab [email protected]' returned 1. kinit: Password incorrect while getting initial credentials

Reference for the solution approach: https://www.cnblogs.com/zhzhang/p/5692249.html

https://blog.csdn.net/tianbaochao/article/details/78592989 (points out that the encryption types affect how the keytab is encrypted and decrypted, which suggested the approach)
Cause: when kinit runs, there is no matching keytab, or the keytab is stale, the password is wrong, or the principal and keytab do not correspond.
Solution: delete the principal from the Kerberos database, then recreate the principal and its keytab:

delprinc  [email protected]
addprinc  [email protected]

Specify the encryption types explicitly:
xst -e  aes128-cts-hmac-sha1-96,arcfour-hmac,des-cbc-md5,des3-cbc-sha1,aes256-cts-hmac-sha1-96 -k smokeuser.headless.keytab -q [email protected]
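Then re-run the failing kinit by hand as a quick check that the regenerated keytab works (path and principal taken from the Ambari error above):

kinit -kt /etc/security/keytabs/smokeuser.headless.keytab [email protected]
klist    # should now show a valid TGT instead of "Password incorrect"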

8. sudo fails with the following output:

We trust you have received the usual lecture from the local System Administrator.
It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

sudo: no tty present and no askpass program specified

Cause: users in the ambari group cannot run the installation scripts passwordlessly via sudo while Kerberos is being installed.

Solution: use visudo to edit /etc/sudoers so that the ambari user and group can run scripts without a password:
visudo


%ambari  ALL=(ALL)       NOPASSWD: ALL    # passwordless sudo for the ambari group
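A quick non-interactive check that the rule took effect (run as a user in the ambari group; -n makes sudo fail immediately instead of prompting):

sudo -n true && echo "passwordless sudo OK"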

9. Accessing the remote Hadoop cluster via Livy + Kerberos fails because no keytab of a suitable encryption type can be found:

GSS-API Exception - Cannot find key of appropriate type to decrypt AP REP - AES128

Cause <1>: the keytab was exported multiple times and the principal recreated repeatedly, so the set and number of encryption types in this keytab no longer match those of the other, working keytabs.

Solution: delete the principal and its keytab, recreate it with the encryption types specified explicitly (determined by inspecting a working keytab), then restart the KDC server and the Ambari-managed Hadoop services.

Inspect a working keytab's encryption types:

klist -ke other.keytab

kadmin.local:

delprinc principal
addprinc principal

# Adjust the principal and realm names below to your environment (approach from https://blog.csdn.net/tianbaochao/article/details/78592989)
xst -e aes128-cts-hmac-sha1-96,arcfour-hmac,des-cbc-md5,des3-cbc-sha1,aes256-cts-hmac-sha1-96 -k principal.keytab -q principal@EXAMPLE.COM

Cause <2>: the JCE unlimited-strength extension is not installed on all machines (the local machine, the Livy host, and the remote cluster), so encryption/decryption cannot be performed.

Solution:

Download the JCE policy files and unzip them into $JAVA_HOME/jre/lib/security.

Download from the Oracle JCE site, then:

unzip -o -j -q jce_policy-8.zip -d $JAVA_HOME/jre/lib/security    # -j flattens paths (avoids symlink issues)
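To verify the policy is active on a given JVM, a small Scala check (a sketch, not from the original article; javax.crypto.Cipher.getMaxAllowedKeyLength returns Integer.MAX_VALUE once the unlimited-strength policy is in effect):

import javax.crypto.Cipher

object JceCheck {
  def main(args: Array[String]): Unit = {
    // 2147483647 (Integer.MAX_VALUE) means unlimited strength; 128 means the default policy
    println("Max allowed AES key length: " + Cipher.getMaxAllowedKeyLength("AES"))
  }
}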

10. Accessing a remote Kerberos cluster from a local Kerberos client API fails to resolve the principal:

19/03/15 16:43:40 INFO SparkContext: Created broadcast 0 from textFile at TestKerberos.scala:42
Exception in thread "main" java.io.IOException: Can't get Master Kerberos principal for use as renewer
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:918)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:916)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.foreach(RDD.scala:916)
    at Test.TestKerberos$.main(TestKerberos.scala:44)
    at Test.TestKerberos.main(TestKerberos.scala)
19/03/15 16:43:41 INFO SparkContext: Invoking stop() from shutdown hook
19/03/15 16:43:41 INFO SparkUI: Stopped Spark web UI at http://10.111.23.70:4040
19/03/15 16:43:41 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/03/15 16:43:41 INFO MemoryStore: MemoryStore cleared

Cause: the remote cluster's configuration files (all *.xml files under /hadoop/etc/hadoop) were not copied into a suitable classpath directory of the current project (the directory produced by compiling the code).

Solution:

Reference for the approach: https://www.bbsmax.com/A/rV574Yk9dP/

Copy the remote cluster's configuration files (all *.xml under /hadoop/etc/hadoop) into the project's classpath.

Use the following code to list the current classpath entries and pick a suitable directory:

import java.net.URLClassLoader

val classLoader = Thread.currentThread.getContextClassLoader
// Cast works on JDK 8, where the context class loader is a URLClassLoader
val urlClassLoader = classLoader.asInstanceOf[URLClassLoader]
for (url <- urlClassLoader.getURLs) {
  println("classpath " + url)
}
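Alternatively, instead of relying on the classpath, the configuration can be loaded and the keytab login performed explicitly in code. A sketch (paths are illustrative; the principal and keytab reuse the dp entries created in item 2):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.security.UserGroupInformation

val conf = new Configuration()
// Local copies of the remote cluster's configuration files
conf.addResource(new Path("/path/to/core-site.xml"))
conf.addResource(new Path("/path/to/hdfs-site.xml"))
conf.addResource(new Path("/path/to/yarn-site.xml"))

// Log in from the keytab so delegation tokens (the "renewer" in the error) can be obtained
UserGroupInformation.setConfiguration(conf)
UserGroupInformation.loginUserFromKeytab(
  "dp/[email protected]",
  "/path/to/dp.livy.test.com.keytab")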

11. Errors occur when using a generated keytab because of the encryption types and their number specified at creation time.

 

Cause: the encryption types and their number in the manually generated keytab differ from the keytab generated automatically during the Ambari install; they must be kept consistent. Regenerate with the matching types:

 

xst -e  aes128-cts-hmac-sha1-96,arcfour-hmac,des-cbc-md5,des3-cbc-sha1,aes256-cts-hmac-sha1-96 -k smokeuser.headless.keytab -q [email protected]


 
