During installation, a network interruption caused the problems below.
Problem 1: installation stalls at "acquiring installation lock"
/tmp/scm_prepare_node.tYlmPfrT
using SSH_CLIENT to get the SCM hostname: 172.16.77.20 33950 22
opening logging file descriptor
Starting installation script... Acquiring installation lock... BEGIN flock 4
This stage took about half an hour; after one uninstall-and-retry that waited almost an hour, it finally got past the lock.
Problem 2: cannot select hosts
After the installation failed, the hosts could not be selected on retry.
Figure 1
Solution: clean up the files left behind by the failed installation —
uninstall Cloudera Manager 5.1.x and the related software (see the official uninstall documentation).
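Part of that cleanup is removing the staging directory a failed run leaves in /tmp (the path pattern comes from the Problem 1 log above). A minimal sketch, run here against a throwaway directory standing in for the real one:

```shell
# Simulate clearing the staging directory a failed install leaves behind
# (real pattern, from the Problem 1 log: /tmp/scm_prepare_node.*)
staging=$(mktemp -d /tmp/scm_prepare_node.XXXXXX)
rm -rf /tmp/scm_prepare_node.*
[ ! -d "$staging" ] && echo "cleaned"
```

On a real host you would also uninstall the cloudera-manager packages themselves, per the official documentation.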
Problem 3: reverse DNS (PTR) resolves to localhost
Description:
Reverse DNS is misconfigured, so the Cloudera Manager Server hostname cannot be resolved correctly.
Log:
Detecting Cloudera Manager Server...
Detecting Cloudera Manager Server...
BEGIN host -t PTR 192.168.1.198
198.1.168.192.in-addr.arpa domain name pointer localhost.
END (0)
using localhost as scm server hostname
BEGIN which python
/usr/bin/python
END (0)
BEGIN python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect((sys.argv[1], int(sys.argv[2]))); s.close();' localhost 7182
Traceback (most recent call last):
File "
File "
socket.error: [Errno 111] Connection refused
END (1)
could not contact scm server at localhost:7182, giving up
waiting for rollback request
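The failing step is the socket probe shown in the log. The sketch below re-runs the same probe logic against a local stand-in listener (instead of the real SCM server on port 7182), just to show what a successful check looks like; python3 is used here in place of the python2 of the era:

```shell
python3 - <<'EOF'
import socket
# stand-in for the SCM server: listen on an ephemeral local port
srv = socket.socket(socket.AF_INET)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
# the probe the installer performs (same calls as in the log above)
s = socket.socket(socket.AF_INET)
s.settimeout(5.0)
s.connect(("127.0.0.1", port))
s.close()
srv.close()
print("reachable")
EOF
```

When the PTR record is wrong, the probe is aimed at localhost:7182 instead, which is why it fails with Connection refused.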
Solution:
On the machine that cannot connect, delete the /usr/bin/host file:
rm -f /usr/bin/host
Note:
Cloudera's rationale here is unclear: the installer already has the Cloudera Manager Server's IP, yet it still resolves the IP back to a hostname and connects to that.
Because reverse DNS was not configured, resolving the Cloudera Manager Server's IP returned localhost, which caused the subsequent connection failures.
The workaround is simply to delete /usr/bin/host, so that Cloudera Manager connects with the IP directly and the error disappears.
Reference:
Problem 4: NTP
Description:
Bad Health -- Clock Offset
The host's NTP service did not respond to a request for the clock offset.
Solution:
Configure the NTP service.
Reference steps:
Configuring an NTP server on CentOS:
http://www.hailiangchen.com/centos-ntp/
Commonly used NTP servers in China (hostnames and IPs):
http://www.douban.com/note/171309770/
Edit the configuration file:
[root@work03 ~]# vim /etc/ntp.conf
server s1a.time.edu.cn prefer
server s1b.time.edu.cn
server s1c.time.edu.cn
restrict 172.16.1.0 mask 255.255.255.0 nomodify <=== allow queries from the LAN
Start ntpd:
# service ntpd restart <=== start the NTP service
Synchronize the clients (work02, work03):
ntpdate work01
Note: the NTP service takes about five minutes after startup before it will serve time; if a client synchronizes before then, it fails with "no server suitable for synchronization found".
Synchronize on a schedule:
Set up a crontab entry on work02 and work03:
crontab -e
00 12 * * * /usr/sbin/ntpdate 192.168.56.121 >> /root/ntpdate.log 2>&1
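A quick sanity check of the entry above (note that a user crontab installed with `crontab -e` takes five time fields and then the command, with no user column):

```shell
set -f    # keep the '*' fields from glob-expanding
entry='00 12 * * * /usr/sbin/ntpdate 192.168.56.121 >> /root/ntpdate.log 2>&1'
set -- $entry
echo "minute=$1 hour=$2"    # the job fires daily at 12:00
```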
Problem 4.1: Clock Offset
Description:
· Ensure that the host's hostname is configured properly.
· Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).
· Ensure that ports 9000 and 9001 are free on the host being added.
· Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).
Locating the problem:
Run 'ntpdc -c loopinfo' on the affected hosts (work02, work03):
[root@work03 work]# ntpdc -c loopinfo
ntpdc: read: Connection refused
Solution:
Enable the NTP service:
Make ntpd start at boot on all three machines:
chkconfig ntpd on
Problem 5: heartbeat
Error message:
Installation failed. Failed to receive heartbeat from agent.
Solution: disable the firewall.
Problem 6: Unknown Health
Unknown Health
After a reboot: Request to the Host Monitor failed.
service --status-all | grep clo
Checking the scm-agent state on the machine: cloudera-scm-agent dead but pid file exists
Solution: restart the services:
service cloudera-scm-agent restart
service cloudera-scm-server restart
Problem 7: hostname and canonical name not consistent
Bad Health
The hostname and canonical name for this host are not consistent when checked from a Java process.
canonical name:
4092 Monitor-HostMonitor throttling_logger WARNING (29 skipped) hostname work02 differs from the canonical name work02.xinzhitang.com
Solution: edit /etc/hosts so that the FQDN and the hostname are the same.
PS: this resolved the check, though it is not obvious why the hostname and its canonical name must match.
/etc/hosts
192.168.1.185 work01 work01
192.168.1.141 work02 work02
192.168.1.198 work03 work03
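The check boils down to comparing the host's own hostname with the canonical name, i.e. the first name after the IP on its /etc/hosts line. A sketch with stand-in values from the fix above (on a real host, `current` would come from `hostname`):

```shell
# First name after the IP in /etc/hosts is the canonical name; after the
# fix above it must equal the plain hostname (stand-in values from work02)
hosts_line="192.168.1.141 work02 work02"
canonical=$(echo "$hosts_line" | awk '{print $2}')
current="work02"    # stand-in for $(hostname)
[ "$canonical" = "$current" ] && echo "consistent"
```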
Problem 8: Concerning Health
Concerning Health Issue
-- Network Interface Speed --
Description: The host has 2 network interface(s) that appear to be operating at less than full speed. Warning threshold: any.
Details:
This is a host health test that checks for network interfaces that appear to be operating at less than full speed.
A failure of this health test may indicate that network interface(s) may be configured incorrectly and may be causing performance problems. Use the ethtool command to check and configure the host's network interfaces to use the fastest available link speed and duplex mode.
Solution:
For this test the Cloudera Manager threshold configuration was changed, which is a workaround rather than a real fix.
Problem 10: IOException thrown while collecting data from host: No route to host
Cause: the firewall is enabled on the agent host.
Solution: service iptables stop
Problem 11: swappiness
Cloudera recommends setting /proc/sys/vm/swappiness to 0. Current setting is 60. Use the sysctl command to change this setting at runtime and edit /etc/sysctl.conf for this setting to be saved after a reboot. You may continue with installation, but you may run into issues with Cloudera Manager reporting that your hosts are unhealthy because they are swapping. The following hosts are affected:
Solution: change the value at runtime with sysctl and persist it in /etc/sysctl.conf, as the warning suggests.
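What the solution amounts to, as commands (a sketch; run as root on each affected host):

```shell
sysctl -w vm.swappiness=0                        # takes effect immediately
echo 'vm.swappiness = 0' >> /etc/sysctl.conf     # survives a reboot
cat /proc/sys/vm/swappiness                      # verify: should now print 0
```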
Problem 12: clocks out of sync (synchronized against the USTC time server 202.141.176.110)
Problem 13: The host's NTP service did not respond to a request for the clock offset.
# service ntpd start
Problem 14: The Cloudera Manager Agent is not able to communicate with this role's web server.
One possible cause is that the metadata database cannot be reached; check the database configuration:
Problem 15: Hive Metastore Server fails to start; fix the Hive metadata database configuration (whenever the hostname changes, the metadata database configuration must be updated accordingly):
General troubleshooting approach
For ordinary errors, read the error output and google the key phrases.
For unexplained failures (e.g. a namenode or datanode dying for no obvious reason), check the hadoop logs ($HADOOP_HOME/logs) or the hive logs.
Hadoop errors
Problem 16: datanode fails to start
After adding a datanode it would not start; the process kept dying unexpectedly, and the namenode log showed:
2013-06-21 18:53:39,182 FATAL org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node x.x.x.x:50010 is attempting to report storage ID DS-1357535176-x.x.x.x-50010-1371808472808. Node y.y.y.y:50010 is expected to serve this storage.
Cause:
The hadoop package was copied over together with its data and tmp directories (see my earlier "hadoop installation" article), so the datanode was never formatted successfully.
Solution:
rm -rf /data/hadoop/hadoop-1.1.2/data
rm -rf /data/hadoop/hadoop-1.1.2/tmp
hadoop namenode -format
Problem 17: safe mode
2013-06-20 10:35:43,758 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot renew lease for DFSClient_hb_rs_wdev1.corp.qihoo.net,60020,1371631589073. Name node is in safe mode.
Solution:
hadoop dfsadmin -safemode leave
Problem 18: connection exception
2013-06-21 19:55:05,801 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to homename/x.x.x.x:9000 failed on local exception: java.io.EOFException
Possible causes:
The namenode is listening on 127.0.0.1:9000 instead of 0.0.0.0:9000 or an external IP at port 9000.
iptables is blocking the port.
Solution:
Check /etc/hosts and make sure the hostname is bound to an IP other than 127.0.0.1.
Open the port in iptables.
Problem 19: namenode namespaceID
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /var/lib/hadoop-0.20/cache/hdfs/dfs/data: namenode namespaceID = 240012870; datanode namespaceID = 1462711424.
Problem: the namespaceID on the namenode differs from the namespaceID on the datanode.
Cause: every namenode format creates a new namespaceID, while tmp/dfs/data still holds the ID from the previous format. Formatting clears the namenode's data but not the datanodes' data, so the namespaceID on the namenode no longer matches the one on the datanodes, and startup fails.
Solution: the page at http://blog.csdn.net/wh62592855/archive/2010/07/21/5752199.aspx gives two fixes; we used the first one:
(1) Stop the cluster services.
(2) On the affected datanodes, delete the data directory, i.e. the dfs.data.dir directory configured in hdfs-site.xml; on this machine it was /var/lib/hadoop-0.20/cache/hdfs/dfs/data/. (Note: we ran this step on all datanodes and the namenode. In case the fix fails, keep a copy of the data directory before deleting it.)
(3) Format the namenode.
(4) Restart the cluster.
Problem solved.
The side effect of this method is that all data on HDFS is lost. If HDFS holds important data, this method is not recommended; try the second method from the page above instead.
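The mismatch can be confirmed before deleting anything by comparing the VERSION files. The sketch below runs the comparison against two stand-in files; on a real cluster the files live under the configured dfs.name.dir and dfs.data.dir, e.g. .../dfs/data/current/VERSION:

```shell
# Compare namespaceID between stand-in namenode/datanode VERSION files
nn=$(mktemp); dn=$(mktemp)
echo "namespaceID=240012870"  > "$nn"   # as reported for the namenode
echo "namespaceID=1462711424" > "$dn"   # as reported for the datanode
if [ "$(cut -d= -f2 "$nn")" != "$(cut -d= -f2 "$dn")" ]; then
  echo "namespaceID mismatch"
fi
rm -f "$nn" "$dn"
```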
Problem 20: directory permissions
start-dfs.sh runs without errors and reports the datanodes started, but afterwards no datanode is running. The log on the datanode machine shows the dfs.data.dir directory permissions are wrong:
expected: drwxr-xr-x, current: drwxrwxr-x
Solution:
Check the dfs.data.dir configuration and fix the directory permissions.
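The fix is a plain chmod to the drwxr-xr-x the datanode expects, demonstrated here on a throwaway directory standing in for dfs.data.dir:

```shell
data_dir=$(mktemp -d)      # stand-in for the real dfs.data.dir
chmod 775 "$data_dir"      # reproduce the bad state (drwxrwxr-x)
chmod 755 "$data_dir"      # the fix: drwxr-xr-x, as HDFS expects
stat -c '%A' "$data_dir"   # prints drwxr-xr-x
rmdir "$data_dir"
```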
Hive errors
Problem 21: NoClassDefFoundError
Could not initialize class java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.HbaseObjectWritable
Add protobuf-***.jar to the jars path in $HIVE_HOME/conf/hive-site.xml:
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///data/hadoop/hive-0.10.0/lib/hive-hbase-handler-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/hbase-0.94.8.jar,file:///data/hadoop/hive-0.10.0/lib/zookeeper-3.4.5.jar,file:///data/hadoop/hive-0.10.0/lib/guava-r09.jar,file:///data/hadoop/hive-0.10.0/lib/hive-contrib-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/protobuf-java-2.4.0a.jar</value>
</property>
Problem 22: hive dynamic-partition exception
[Fatal Error] Operator FS_2 (id=2): Number of dynamic partitions exceeded hive.exec.max.dynamic.partitions.pernode
hive> set hive.exec.max.dynamic.partitions.pernode=10000;
Problem 23: mapreduce process exceeds the memory limit -- hadoop Java heap space
Add to mapred-site.xml:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
And in $HADOOP_HOME/conf/hadoop-env.sh:
export HADOOP_HEAPSIZE=5000
Problem 24: hive created-files limit
[Fatal Error] total number of created files now is 100086, which exceeds 100000
hive> set hive.exec.max.created.files=655350;
Problem 25: hive metastore connection timeout
FAILED: SemanticException org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
Solution:
hive> set hive.metastore.client.socket.timeout=500;
Problem 26: hive java.io.IOException: error=7, Argument list too long
Task with the most failures(5):
Task ID:
task_201306241630_0189_r_000009
URL:
http://namenode.godlovesdog.com:50030/taskdetails.jsp?jobid=job_201306241630_0189&tipid=task_201306241630_0189_r_000009
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"164058872","reducesinkkey1":"djh,S1","reducesinkkey2":"20130117170703","reducesinkkey3":"xxx"},"value":{"_col0":"1","_col1":"xxx","_col2":"20130117170703","_col3":"164058872","_col4":"xxx,S1"},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:270)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:520)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"164058872","reducesinkkey1":"xxx,S1","reducesinkkey2":"20130117170703","reducesinkkey3":"xxx"},"value":{"_col0":"1","_col1":"xxx","_col2":"20130117170703","_col3":"164058872","_col4":"djh,S1"},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:258)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20000]: Unable to initialize custom script.
at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:354)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)
... 7 more
Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.7": error=7, Argument list too long
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1042)
at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:313)
... 15 more
Caused by: java.io.IOException: error=7, Argument list too long
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.(UNIXProcess.java:135)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1023)
... 16 more
FAILED: Execution Error, return code 20000 from org.apache.hadoop.hive.ql.exec.MapRedTask. Unable to initialize custom script.
Solution:
Upgrade the kernel, or reduce the number of partitions: https://issues.apache.org/jira/browse/HIVE-2372
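error=7 (E2BIG) means the kernel's per-exec limit on argument-plus-environment size was exceeded when launching /usr/bin/python2.7; the current limit on a host can be inspected with:

```shell
getconf ARG_MAX    # bytes allowed for argv + environ when exec'ing a process
```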
Problem 27: hive runtime error
hive> show tables;
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Troubleshooting:
hive -hiveconf hive.root.logger=DEBUG,console
13/07/15 16:29:24 INFO hive.metastore: Trying to connect to metastore with URI thrift://xxx.xxx.xxx.xxx:9083
13/07/15 16:29:24 WARN hive.metastore: Failed to connect to the MetaStore Server...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
...
MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
The client tries port 9083, and netstat confirms nothing is listening there. The first suspicion was that hiveserver was not running, but its process exists -- it is just listening on port 10000.
Checking hive-site.xml: the hive client connects to port 9083 while hiveserver listens on 10000 by default -- that is the root cause.
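The netstat step can also be done with a short connect test. The sketch below probes the metastore port from the article (9083 on localhost is an assumption here; substitute the host from hive.metastore.uris):

```shell
python3 - <<'EOF'
import socket
s = socket.socket()
s.settimeout(1.0)
try:
    s.connect(("127.0.0.1", 9083))   # port from hive.metastore.uris
    print("9083 open")
except OSError:
    print("nothing listening on 9083")
finally:
    s.close()
EOF
```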
Solution:
hive --service hiveserver -p 9083
// or edit the hive.metastore.uris setting in $HIVE_HOME/conf/hive-site.xml
// and change its port to 10000
using /usr/lib/hive as HIVE_HOME
using /var/run/cloudera-scm-agent/process/193-hive-HIVEMETASTORE as HIVE_CONF_DIR
using /usr/lib/hadoop as HADOOP_HOME
using /var/run/cloudera-scm-agent/process/193-hive-HIVEMETASTORE/yarn-conf as HADOOP_CONF_DIR
ERROR: Failed to find hive-hbase storage handler jars to add in hive-site.xml. Hive queries that use Hbase storage handler may not work until this is fixed.
Wed Oct 22 18:48:53 CST 2014
JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
using /usr/java/jdk1.7.0_45-cloudera as JAVA_HOME
using 5 as CDH_VERSION
using /usr/lib/hive as HIVE_HOME
using /var/run/cloudera-scm-agent/process/193-hive-HIVEMETASTORE as HIVE_CONF_DIR
using /usr/lib/hadoop as HADOOP_HOME
using /var/run/cloudera-scm-agent/process/193-hive-HIVEMETASTORE/yarn-conf as HADOOP_CONF_DIR
ERROR: Failed to find hive-hbase storage handler jars to add in hive-site.xml. Hive queries that use Hbase storage handler may not work until this is fixed.
JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
using /usr/java/jdk1.7.0_45-cloudera as JAVA_HOME
using 5 as CDH_VERSION
using /usr/lib/hive as HIVE_HOME
using /var/run/cloudera-scm-agent/process/212-hive-metastore-create-tables as HIVE_CONF_DIR
using /usr/lib/hadoop as HADOOP_HOME
using /var/run/cloudera-scm-agent/process/212-hive-metastore-create-tables/yarn-conf as HADOOP_CONF_DIR
ERROR: Failed to find hive-hbase storage handler jars to add in hive-site.xml. Hive queries that use Hbase storage handler may not work until this is fixed.
Checked whether /usr/lib/hive is intact:
it is.
3:21:09.801 PM FATAL org.apache.hadoop.hbase.master.HMaster
Unhandled exception. Starting shutdown.
java.io.IOException: error or interrupted while splitting logs in [hdfs://master:8020/hbase/WALs/slave2,60020,1414202360923-splitting] Task = installed = 2 done = 1 error = 1
at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:362)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
at org.apache.hadoop.hbase.master.HMaster.splitMetaLogBeforeAssignment(HMaster.java:1070)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:854)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:606)
at java.lang.Thread.run(Thread.java:744)
3:46:12.903 PM FATAL org.apache.hadoop.hbase.master.HMaster
Unhandled exception. Starting shutdown.
java.io.IOException: error or interrupted while splitting logs in [hdfs://master:8020/hbase/WALs/slave2,60020,1414202360923-splitting] Task = installed = 1 done = 0 error = 1
at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:362)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
at org.apache.hadoop.hbase.master.HMaster.splitMetaLogBeforeAssignment(HMaster.java:1070)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:854)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:606)
at java.lang.Thread.run(Thread.java:744)
Solution:
Add an entry to hbase-site.xml so the hbase cluster skips hlog splitting at startup, then move the stuck splitting directory out of the way:
[root@master ~]# hadoop fs -mv /hbase/WALs/slave2,60020,1414202360923-splitting/ /test
[root@master ~]# hadoop fs -ls /test
2014-10-28 14:31:32,879 INFO [hconnection-0xd18e8a7-shared--pool2-t224] (AsyncProcess.java:673) - #3, table=session_service_201410210000_201410312359, attempt=14/35 failed 1383 ops, last exception: org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=session_service_201410210000_201410312359,7499999991,1414203068872.08ee7bb71161cb24e18ddba4c14da0f2., server=slave1,60020,1414380404290, memstoreSize=271430320, blockingMemStoreSize=268435456
at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2561)
at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:1963)
at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4050)
at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3361)
at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3265)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26935)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
Exception -- Description
ClockOutOfSyncException -- thrown by the master when a RegionServer's clock offset is too large.
DoNotRetryIOException -- subclass for exceptions that should not be retried, such as UnknownScannerException.
DroppedSnapshotException -- thrown when a snapshot's contents are not properly persisted to a file during a flush.
HBaseIOException -- all hbase-specific IOExceptions are subclasses of HBaseIOException.
InvalidFamilyOperationException -- Hbase received a schema-modification request whose column family name is invalid.
MasterNotRunningException -- the master node is not running.
NamespaceExistException -- the namespace already exists.
NamespaceNotFoundException -- the namespace cannot be found.
NotAllMetaRegionsOnlineException -- an operation requires all root and meta regions to be online, but they are not.
NotServingRegionException -- a request was sent to a RegionServer that is not responding or whose region is unavailable.
PleaseHoldException -- thrown when a RegionServer died and restarted so quickly that the master has not yet processed the old server instance, when an admin-level operation is invoked while the master is still initializing, or when an operation hits a RegionServer that is still starting up.
RegionException -- an exception raised while accessing a region.
RegionTooBusyException -- the RegionServer is busy and the request is blocked waiting for service.
TableExistsException -- the table already exists.
TableInfoMissingException -- no .tableinfo file can be found under the table directory.
TableNotDisabledException -- a table is not properly disabled.
TableNotEnabledException -- a table is not properly enabled.
TableNotFoundException -- the table cannot be found.
UnknownRegionException -- a request referenced an unrecognized region.
UnknownScannerException -- an unrecognized scanner id was passed to the RegionServer.
YouAreDeadException -- thrown by the master when a RegionServer reports in after already having been marked dead.
ZooKeeperConnectionException -- the client cannot connect to zookeeper.
INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher
Waited 90779ms on a compaction to clean up 'too many store files'; waited long enough... proceeding with flush of session_service_201410210000_201410312359,7656249951,1414481868315.bbf0a49fb8a9b650a584769ddd1fdd89.
When a MemStoreFlusher instance is created it starts MemStoreFlusher.FlushHandler thread instances;
the number of these threads is set by hbase.hstore.flusher.count (default 1).
With one machine's disk full and another machine's disk not full:
The cluster has 26,632 under-replicated blocks out of 84,822 total blocks. Percentage of under-replicated blocks: 31.40%. Warning threshold: 10.00%.
The cluster has 27,278 under-replicated blocks out of 85,476 total blocks. Percentage of under-replicated blocks: 31.91%. Warning threshold: 10.00%.
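The warning percentage is simply under-replicated blocks divided by total blocks; checking the first line's numbers:

```shell
python3 -c 'print("%.2f%%" % (26632 / 84822 * 100))'   # prints 31.40%
```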
4:08:53.847 PM
INFO
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher
Flushed, sequenceid=45525, memsize=124.2 M, hasBloomFilter=true, into tmp file hdfs://master:8020/hbase/data/default/session_service_201410260000_201410312359/a3b64675b0069b8323665274e2f95cdc/.tmp/b7fa4f5f85354ecc96aa48a09081f786
4:08:53.862 PM
INFO
org.apache.hadoop.hbase.regionserver.HStore
Added hdfs://master:8020/hbase/data/default/session_service_201410260000_201410312359/a3b64675b0069b8323665274e2f95cdc/f/b7fa4f5f85354ecc96aa48a09081f786, entries=194552, sequenceid=45525, filesize=47.4 M
4:09:00.378 PM
WARN
org.apache.hadoop.ipc.RpcServer
(responseTooSlow): {"processingtimems":39279,"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","client":"192.168.5.9:41284","starttimems":1414656501099,"queuetimems":0,"class":"HRegionServer","responsesize":16,"method":"Scan"}
4:09:00.379 PM
WARN
org.apache.hadoop.ipc.RpcServer
RpcServer.responder callId: 33398 service: ClientService methodName: Scan size: 209 connection: 192.168.5.9:41284: output error
4:09:00.380 PM
WARN
org.apache.hadoop.ipc.RpcServer
RpcServer.handler=79,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
4:09:00.381 PM
INFO
org.apache.hadoop.hbase.regionserver.HRegion
Finished memstore flush of ~128.1 M/134326016, currentsize=2.4 M/2559256 for region session_service_201410260000_201410312359,6406249959,1414571385831.a3b64675b0069b8323665274e2f95cdc. in 8133ms, sequenceid=45525, compaction requested=false
Reposted from: https://blog.51cto.com/ybs000/2121375