Ambari Issue Highlights

Updated from time to time.

A collection of assorted oddball problems.


Hive Metastore fails to start after an Ambari install

File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 293, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server ; /usr/hdp/current/hive-metastore/bin/schematool -initSchema -dbType mysql -userName hive -passWord [PROTECTED]' returned 1.
WARNING: Use "yarn jar" to launch YARN applications.
Metastore connection URL:     jdbc:mysql://c6405.ambari.apache.org/hive?createDatabaseIfNotExist=true
Metastore Connection Driver :     com.mysql.jdbc.Driver
Metastore connection User:     hive
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
*** schemaTool failed ***

Solution:
The MySQL password configured in Hive does not match the password set for the hive user in MySQL. Change either the MySQL password or the Hive configuration so that the two match.
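If you change the MySQL side, a minimal sketch (the host name comes from the log above; the password is a placeholder, and the SET PASSWORD form applies to MySQL 5.x):

mysql -u root -p -e "SET PASSWORD FOR 'hive'@'%' = PASSWORD('HivePassword123');"
# Verify that the password configured in Ambari now works from the metastore host:
mysql -u hive -p'HivePassword123' -h c6405.ambari.apache.org -e 'SELECT 1;'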

Spark 2.0 on YARN

1. Jersey NoClassDefFoundError

bin/spark-sql --driver-memory 10g --verbose --master yarn --packages com.databricks:spark-csv_2.10:1.3.0 --executor-memory 4g --num-executors 20 --executor-cores 2
16/05/09 13:15:21 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/09 13:15:21 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4041
16/05/09 13:15:21 INFO util.Utils: Successfully started service 'SparkUI' on port 4041.
16/05/09 13:15:21 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://bigaperf116.svl.ibm.com:4041
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
This is a known issue with Spark 2.0 on YARN; see:
http://apache-spark-developers-list.1001551.n3.nabble.com/spark-2-0-issue-with-yarn-td17440.html

A temporary workaround:
Set yarn.timeline-service.enabled to false to turn off the Application Timeline Service (ATS).
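The property can also be overridden per job without editing yarn-site.xml; a hedged sketch (spark.hadoop.* settings are passed through to the Hadoop configuration):

bin/spark-sql --master yarn --conf spark.hadoop.yarn.timeline-service.enabled=false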

2. bad substitution

diagnostics: Application application_1441066518301_0013 failed 2 times due to AM Container for appattempt_1441066518301_0013_000002 exited with  exitCode: 1
For more detailed output, check the application tracking page: http://localhost:8088/cluster/app/application_1441066518301_0013 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e03_1441066518301_0013_02_000001
Exit code: 1
Exception message: /mnt/yarn/nm/local/usercache/stack/appcache/
application_1441066518301_0013/container_e03_1441066518301_0013_02_000001/
launch_container.sh: line 24: $PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:
/usr/hdp/current/hadoop-client/*::$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:
/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:
/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /mnt/yarn/nm/local/usercache/stack/appcache/application_1441066518301_0013/container_e03_1441066518301_0013_02_000001/launch_container.sh: line 24: $PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

Solution:
This problem is usually caused by manually installed components, where the ${hdp.version} variable never gets substituted.
Edit the MapReduce2 configuration property mapreduce.application.classpath and replace ${hdp.version} with the version portion of the absolute /usr/hdp path, e.g. 2.4.0.0-169.
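A hedged sketch of the substitution (the version string is whatever hdp-select reports on your cluster):

hdp-select versions   # e.g. prints 2.4.0.0-169
# before: /usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar
# after:  /usr/hdp/2.4.0.0-169/hadoop/lib/hadoop-lzo-0.6.0.2.4.0.0-169.jar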

Service startup fails on ulimit -c unlimited

resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'' returned 1. -bash: line 0: ulimit: core file size: cannot modify limit: Operation not permitted
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-wy1.jcloud.local.out

Solution:
On CentOS 7.1, when the HDFS NameNode or DataNode is started by a non-root user, the launch command runs ulimit -c unlimited after switching to the hdfs account with su. The hdfs account has no permission to run that command, so the NameNode or DataNode fails to start. One way to handle this is to change the Ambari code so that HDFS startup does not run ulimit -c unlimited.
Edit this file:

/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py

and in this line:

cmd = format("{ulimit_cmd} {hadoop_daemon} --config {hadoop_conf_dir} {action} {name}")

delete {ulimit_cmd}, then restart ambari-agent.
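A hedged one-liner equivalent of the edit above, run on each agent host:

sed -i 's/{ulimit_cmd} {hadoop_daemon}/{hadoop_daemon}/' \
    /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py
ambari-agent restart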

Host registration fails

ERROR 2016-08-01 13:33:38,932 main.py:309 - Fatal exception occurred:
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 306, in
main(heartbeat_stop_callback)
File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 242, in main
stop_agent()
File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 189, in stop_agent
sys.exit(1)
SystemExit: 1

Solution:
Ambari defaults to ASCII encoding. If your operating system runs a Chinese locale, add the following at the top of /usr/lib/python2.6/site-packages/ambari_agent/main.py:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

then click Retry Failed and registration should succeed.

How to delete an existing service from Ambari

After defining a custom service SAMPLE, there is no way to delete it from the web UI on port 8080.
Solution:

  1. Stop the service:
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo": {"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://localhost:8080/api/v1/clusters/hadoop/services/SAMPLE

Because the SAMPLE service does not actually do anything, it may start itself again after a short while, so be quick with the next step.

  2. Delete the service (run this immediately afterwards):
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://localhost:8080/api/v1/clusters/hadoop/services/SAMPLE

If the service has not been stopped first, you will get:

{
"status" : 500,
"message" : "org.apache.ambari.server.controller.spi.SystemException: An internal system exception occurred: Cannot remove hadoop/SAMPLE. MYMASTER is in anon-removable state."
}

That's fine; just run the DELETE again.

  3. Verify
    Reload the web UI on port 8080; the SAMPLE service is gone.
  4. A few more examples:
    Remove a host component from a host:
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X DELETE 'localhost:8080/api/v1/clusters/blueCluster/hosts/elk2.jcloud.local/host_components/FLUME_HANDLER'
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X DELETE 'localhost:8080/api/v1/clusters/cluster/hosts/ochadoop10/host_components/NAMENODE'
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X DELETE 'localhost:8080/api/v1/clusters/hbcm_ocdp/hosts/hbom-if-58/host_components/YARN_CLIENT'

Install a component:

curl -u admin:admin -i -H "X-Requested-By:ambari" -X POST 'localhost:8080/api/v1/clusters/hbcm_ocdp/hosts/hbbdc-dn-09/host_components/PHOENIX_QUERY_SERVER'
curl -u admin:admin -i -H "X-Requested-By:ambari" -X PUT 'localhost:8080/api/v1/clusters/hbcm_ocdp/hosts/hbbdc-dn-09/host_components/PHOENIX_QUERY_SERVER' -d '{"HostRoles": {"state": "INSTALLED"}}'

How to reset the Ambari admin password

To be able to log in as the Ambari admin user again, reset the admin password as follows (a combined one-liner follows the steps):

  1. Stop Ambari server
  2. Log on to ambari server host shell
  3. Run 'psql -U ambari ambari'
  4. Enter password **** (this is the password Ambari uses to connect to its database; the default is bigdata, and it is stored, in plain text no less, in /etc/ambari-server/conf/password.dat)
  5. In psql:
    update ambari.users set
    user_password='538916f8943ec225d97a9a86a2c6ec0818c1cd400e09e03b660fdaaec4af29ddbb6f2b1033b81b00'
    where user_name='admin';
  6. Quit psql: ctrl+D
  7. Run 'ambari-server restart'
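A hedged non-interactive equivalent of steps 3-6 (it assumes the default local PostgreSQL setup, reads the database password from password.dat, and uses the hash from step 5):

PGPASSWORD=$(cat /etc/ambari-server/conf/password.dat) psql -U ambari ambari -c "update ambari.users set user_password='538916f8943ec225d97a9a86a2c6ec0818c1cd400e09e03b660fdaaec4af29ddbb6f2b1033b81b00' where user_name='admin';"
ambari-server restart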

User [dr.who] is not authorized to view the logs for application

After enabling access control on the Hadoop cluster, the job log UI becomes inaccessible: User [dr.who] is not authorized to view the logs for application.
Reason:
The Resource Manager UI's default user dr.who does not have the required permissions.
Solution:
If the cluster is managed by Ambari, go to HDFS > Configs > Custom core-site > Add Property and add:
hadoop.http.staticuser.user=yarn
To change the configuration from the command line instead:
Get the current values:

/var/lib/ambari-server/resources/scripts/configs.sh get localhost hdp_cluster  hive-site|grep hive.server2.authenticatio
"hive.server2.authentication" : "NONE",
"hive.server2.authentication.spnego.keytab" : "HTTP/[email protected]",
"hive.server2.authentication.spnego.principal" : "/etc/security/keytabs/spnego.service.keytab",

Set a value:

/var/lib/ambari-server/resources/scripts/configs.sh set localhost hdp_cluster  hive-site hive.server2.authentication LDAP
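The same script can apply the dr.who fix above; a hedged sketch (cluster name hdp_cluster as in the examples above):

/var/lib/ambari-server/resources/scripts/configs.sh set localhost hdp_cluster core-site hadoop.http.staticuser.user yarn
# Restart HDFS and YARN from Ambari afterwards for the change to take effect.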

ambari-sudo.sh /usr/bin/hdp-select error

ambari-sudo.sh /usr/bin/hdp-select set all `ambari-python-wrap /usr/bin/hdp-select versions | grep ^2.4.0.0-169 | tail -1`'] {'only_if': 'ls -d /usr/hdp/2.4.0.0-169*

Solution:

  1. Run "hdp-select versions" from the command line as root. Does it return your current 2.4 version number? If not, inspect /usr/hdp and make sure only "current" and the directories named after your versions (2.4, plus older ones if you did an upgrade) are present. If any other file is there, delete it and retry: first "hdp-select versions", then ATS. A verification sketch follows this list.
  2. Or edit /usr/bin/hdp-select and add the stray entry (here "hadoop") to the exclusion list in printVersions():
 -    if f not in [".", "..", "current", "share", "lost+found"]:
 +    if f not in [".", "..", "current", "share", "lost+found", "hadoop"]:
  3. Symlink conflict: delete the redundant symlinks and retry.
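A hedged verification for item 1:

hdp-select versions   # should print only real version strings, e.g. 2.4.0.0-169
ls /usr/hdp           # expect only 'current' plus version directories; remove anything else and retry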

Hive Metastore or HiveServer2 fails to come up

SYMPTOM
HiveServer2 fails to come up, and an error similar to the following is reported in the hiveserver2.log file:

2015-11-18 20:47:19,965 WARN  [main]: server.HiveServer2 (HiveServer2.java:startHiveServer2(442)) - Error starting HiveServer2 on attempt 4, will retry in 60 seconds
org.apache.hive.service.ServiceException: Failed to Start HiveServer2
   at org.apache.hive.service.CompositeService.start(CompositeService.java:80)        
   at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:366)        
   at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:412)        
   at org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:78)        
   at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:654)        
   at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:527)        
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)        
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)        
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)        
   at java.lang.reflect.Method.invoke(Method.java:497)        
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)        
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hive.service.ServiceException: Unable to connect to MetaStore!        
   at org.apache.hive.service.cli.CLIService.start(CLIService.java:154)        
   at org.apache.hive.service.CompositeService.start(CompositeService.java:70)
   ... 11 more
Caused by: MetaException(message:Got exception: org.apache.hadoop.hive.metastore.api.MetaException javax.jdo.JDOException: Exception thrown when executing query        
   at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)        
   at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:230)        
   at org.apache.hadoop.hive.metastore.ObjectStore.getDatabases(ObjectStore.java:701)        
   at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)        
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)        
   at java.lang.reflect.Method.invoke(Method.java:497)        
   at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)        
   at com.sun.proxy.$Proxy7.getDatabases(Unknown Source)        
   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_databases(HiveMetaStore.java:1158)        
   at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)        
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

HiveMetaStore fails to come up

2017-02-27 14:45:05,361 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:main(5908)) - Starting hive metastore on port 9083
2017-02-27 14:45:05,472 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(590)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2017-02-27 14:45:05,497 INFO  [main]: metastore.ObjectStore (ObjectStore.java:initialize(294)) - ObjectStore, initialize called
2017-02-27 14:45:06,193 ERROR [main]: DataNucleus.Datastore (Log4JLogger.java:error(115)) - Error : An error occurred trying to instantiate an instance of the adapter "org.datanucleus.store.rdbms.adapter.SQLAnywhereAdapter" for this JDBC driver : Class "org.datanucleus.store.rdbms.adapter.SQLAnywhereAdapter" was not found in the CLASSPATH. Please check your specification and your CLASSPATH.
Class "org.datanucleus.store.rdbms.adapter.SQLAnywhereAdapter" was not found in the CLASSPATH. Please check your specification and your CLASSPATH.
org.datanucleus.exceptions.ClassNotResolvedException: Class "org.datanucleus.store.rdbms.adapter.SQLAnywhereAdapter" was not found in the CLASSPATH. Please check your specification and your CLASSPATH.
   at org.datanucleus.ClassLoaderResolverImpl.classForName(ClassLoaderResolverImpl.java:216)
   at org.datanucleus.ClassLoaderResolverImpl.classForName(ClassLoaderResolverImpl.java:368)
   at org.datanucleus.ClassLoaderResolverImpl.classForName(ClassLoaderResolverImpl.java:391)
   at org.datanucleus.store.rdbms.adapter.DatastoreAdapterFactory.getAdapterClass(DatastoreAdapterFactory.java:226)
   at org.datanucleus.store.rdbms.adapter.DatastoreAdapterFactory.getNewDatastoreAdapter(DatastoreAdapterFactory.java:144)
   at org.datanucleus.store.rdbms.adapter.DatastoreAdapterFactory.getDatastoreAdapter(DatastoreAdapterFactory.java:92)
   at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:309)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConst

ROOT CAUSE
AMBARI-12947, BUG-44352
In Ambari releases after 2.1 and before 2.1.2, it is mandatory to initialize datanucleus.rdbms.datastoreAdapterClassName in the Hive configs, even though the parameter is required only when the SQL Anywhere database is used. Ambari offers no option to delete the parameter.
RESOLUTION
Upgrade to Ambari 2.1.2.
WORKAROUND
Remove the Hive configuration parameter 'datanucleus.rdbms.datastoreAdapterClassName' from hive-site using configs.sh, for example (a combined sketch follows the steps):

  1. Dump the hive-site parameters to a file
    /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin get Ambari_Hostname Ambari_ClusterName hive-site > /tmp/hive-site.txt
    This dumps all Ambari Hive configuration parameters to /tmp/hive-site.txt.
  2. Edit the /tmp/hive-site.txt file created above and remove 'datanucleus.rdbms.datastoreAdapterClassName'. Also remove the
    lines before the 'properties' tag.
  3. Set the hive-site parameters using /tmp/hive-site.txt
    /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin set Ambari_Hostname Ambari_ClusterName hive-site /tmp/hive-site.txt
  4. Start Hive Services
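A hedged end-to-end sketch of the workaround (Ambari_Hostname and Ambari_ClusterName are placeholders, as above; the sed line assumes the property sits on its own line in the dump):

/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin get Ambari_Hostname Ambari_ClusterName hive-site > /tmp/hive-site.txt
sed -i '/datanucleus.rdbms.datastoreAdapterClassName/d' /tmp/hive-site.txt   # also trim the lines before the 'properties' tag
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin set Ambari_Hostname Ambari_ClusterName hive-site /tmp/hive-site.txt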
Source: Hortonworks Support article 000003468 (created 2015-11-25); see also https://issues.apache.org/jira/browse/AMBARI-13114
