Hadoop / Hive common problems and solutions (continuously updated)

During installation, network interruptions caused the following problems:

Problem 1: installation stuck at "Acquiring installation lock"

/tmp/scm_prepare_node.tYlmPfrT

using SSH_CLIENT to get the SCM hostname: 172.16.77.20 33950 22
opening logging file descriptor

Starting installation script... Acquiring installation lock... BEGIN flock 4

This stage took roughly half an hour; after one uninstall and retry it took almost an hour, but it eventually got past the lock.

Problem 2: cannot select hosts

The installation failed, and on retrying the hosts could not be selected again.

(Figure 1)
Solution: clean up the files left behind by the failed installation.
Uninstall Cloudera Manager 5.1.x and the related software, following the uninstall steps from the official documentation.

Problem 3: DNS reverse resolution (PTR) returns localhost

Description:

Reverse DNS resolution is misconfigured, so the Cloudera Manager Server hostname cannot be resolved correctly.
Log:

Detecting Cloudera Manager Server...
Detecting Cloudera Manager Server...
BEGIN host -t PTR 192.168.1.198
198.1.168.192.in-addr.arpa domain name pointer localhost.
END (0)
using localhost as scm server hostname
BEGIN which python
/usr/bin/python
END (0)
BEGIN python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect((sys.argv[1], int(sys.argv[2]))); s.close();' localhost 7182
Traceback (most recent call last):
File "", line 1, in
File "", line 1, in connect
socket.error: [Errno 111] Connection refused
END (1)
could not contact scm server at localhost:7182, giving up
waiting for rollback request

Solution:

On the machine that cannot connect, move the /usr/bin/host binary out of the way with the following command:

sudo mv /usr/bin/host /usr/bin/host.bak


Note:

It is not clear why Cloudera does it this way: the Cloudera Manager Server IP is already known at this point, yet the installer still resolves that IP back to a hostname and uses the hostname to connect.

Because reverse DNS is not configured properly, resolving the Cloudera Manager Server IP returns localhost, which causes the subsequent connection failure.

The workaround is simply to remove /usr/bin/host; Cloudera Manager then connects with the IP address directly and the error goes away. A quick way to confirm that the agent host can actually reach the server is shown below.
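
As a sanity check (a minimal sketch reusing the Python one-liner from the log above; substitute your own Cloudera Manager Server IP for 192.168.1.198), verify that the agent host can reach the server on port 7182 by IP:

Shell:

python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect((sys.argv[1], int(sys.argv[2]))); s.close();' 192.168.1.198 7182 && echo "SCM server reachable on 7182"    # prints the message only if the TCP connection succeeds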


Problem 4: NTP

Description:

Bad Health -- Clock Offset

The host's NTP service did not respond to a request for the clock offset.

Solution:

Configure the NTP service.

Reference steps:

Configuring an NTP server on CentOS:

http://www.hailiangchen.com/centos-ntp/

Commonly used NTP servers in China (addresses and IPs):

http://www.douban.com/note/171309770/

Edit the configuration file:
[root@work03 ~]# vim /etc/ntp.conf

# Use public servers from the pool.ntp.org project.

# Please consider joining the pool (http://www.pool.ntp.org/join.html).

server s1a.time.edu.cn prefer

server s1b.time.edu.cn

server s1c.time.edu.cn

restrict 172.16.1.0 mask 255.255.255.0 nomodify    <=== allow clients from the local network

Start NTP:
# service ntpd restart    <=== start the NTP service
Synchronize time on the clients (work02, work03):
ntpdate work01
Note: the NTP service needs roughly five minutes after starting before it can serve time; if a client tries to synchronize before that, it fails with "no server suitable for synchronization found".
Periodic time synchronization:
Configure a crontab entry on work02 and work03 to synchronize the time regularly:

crontab -e
00 12 * * * /usr/sbin/ntpdate 192.168.56.121 >> /root/ntpdate.log 2>&1
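
To confirm that the clients really are synchronized (a minimal check, assuming the standard ntp package which ships ntpq and ntpstat), run on work02/work03:

Shell:

ntpq -p      # an asterisk (*) in front of a peer means the clock is synchronized to it
ntpstat      # prints "synchronised to NTP server ..." once sync is achieved (can take ~5 minutes after ntpd starts)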
Problem 4 (continued)
Description:
Clock Offset

· Ensure that the host's hostname is configured properly.

· Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).

· Ensure that ports 9000 and 9001 are free on the host being added.

· Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).

Diagnosis:

Run 'ntpdc -c loopinfo' on the affected hosts (work02, work03):
[root@work03 work]# ntpdc -c loopinfo
ntpdc: read: Connection refused

Solution:

Enable the NTP service:
Make ntpd start at boot on all three machines:
chkconfig ntpd on

Problem 5: heartbeat
Error message:

Installation failed. Failed to receive heartbeat from agent.

Solution: disable the firewall (or open the required ports), for example as shown below.
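
A minimal sketch for CentOS 6-style systems (matching the service/chkconfig commands used elsewhere in these notes); disabling the firewall outright is the quick fix, while opening only the needed ports (7182, 9000, 9001 mentioned above) is the safer option:

Shell:

service iptables stop      # stop the firewall immediately so the agent heartbeat can reach the server
chkconfig iptables off     # keep it disabled after reboot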

Problem 6: Unknown Health
Unknown Health
After a reboot: Request to the Host Monitor failed.
service --status-all | grep clo
Checking the scm-agent status on the machine shows: cloudera-scm-agent dead but pid file exists
Solution: restart the services
service cloudera-scm-agent restart

service cloudera-scm-server restart

Problem 7: canonical name / hostname not consistent
Bad Health

The hostname and canonical name for this host are not consistent when checked from a Java process.

canonical name:

4092 Monitor-HostMonitor throttling_logger WARNING (29 skipped) hostname work02 differs from the canonical name work02.xinzhitang.com

Solution: edit /etc/hosts so that the FQDN and the hostname are the same. A quick consistency check is shown after the hosts entries below.

PS: this fixed the problem, although it is not obvious why the hostname and its alias have to be identical.

/etc/hosts

192.168.1.185 work01 work01

192.168.1.141 work02 work02

192.168.1.198 work03 work03
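
To check the result (a rough approximation of what the Java health check does; the Python line assumes /usr/bin/python is Python 2, as in the logs above):

Shell:

hostname        # short hostname, e.g. work02
hostname -f     # FQDN; after the /etc/hosts change this should print the same value as 'hostname'
python -c 'import socket; print(socket.getfqdn())'    # canonical name as seen through a reverse lookup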

Problem 8: Concerning Health
Concerning Health Issue

-- Network Interface Speed --

Description: The host has 2 network interface(s) that appear to be operating at less than full speed. Warning threshold: any.

Details:

This is a host health test that checks for network interfaces that appear to be operating at less than full speed.
A failure of this health test may indicate that the network interface(s) may be configured incorrectly and may be causing performance problems. Use the ethtool command to check and configure the host's network interfaces to use the fastest available link speed and duplex mode.

Solution:

For this check the Cloudera Manager threshold configuration was changed, which is not really a fix; the proper check uses ethtool, for example:
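
A minimal ethtool sketch (eth0 is just an example interface name; the forced-speed line is only an illustration and should be used only if the switch port is known to support that mode):

Shell:

ethtool eth0 | grep -E 'Speed|Duplex'                     # show the negotiated link speed and duplex mode
# ethtool -s eth0 speed 1000 duplex full autoneg off      # example of forcing 1000/full; adjust or omit as appropriate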

Problem 10: IOException thrown while collecting data from host: No route to host
Cause: the firewall is enabled on the agent host.

Solution: service iptables stop

Problem 11
Cloudera recommends setting /proc/sys/vm/swappiness to 0. Current setting is 60. Use the sysctl command to change this setting at runtime and edit /etc/sysctl.conf for this setting to be saved after a reboot. You may continue with installation, but you may run into issues with Cloudera Manager reporting that your hosts are unhealthy because they are swapping. The following hosts are affected:

Solution:

echo 0 > /proc/sys/vm/swappiness          (applies immediately)

sysctl -w vm.swappiness=0                 (runtime change; add "vm.swappiness = 0" to /etc/sysctl.conf so it survives a reboot)
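
To verify both the runtime value and the persistent setting (assuming the line was added to /etc/sysctl.conf as described above):

Shell:

cat /proc/sys/vm/swappiness            # should now print 0
grep vm.swappiness /etc/sysctl.conf    # confirm the setting will survive a reboot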

Problem 12: clocks out of sync (synchronize against the USTC time server 202.141.176.110)

echo "0 3 * * * /usr/sbin/ntpdate 202.141.176.110; /sbin/hwclock -w" >> /var/spool/cron/root

service crond restart

ntpdate 202.141.176.110

Problem 13: The host's NTP service did not respond to a request for the clock offset.
# service ntpd start

ntpdc -c loopinfo    (health will be good if this command executes successfully)

Problem 14: The Cloudera Manager Agent is not able to communicate with this role's web server.
One possible cause is that the metadata database cannot be reached; check the database configuration.

Problem 15: the Hive Metastore Server fails to start; update the Hive metastore database configuration (whenever the hostname is changed, the metastore database configuration must be updated as well). One way to see which host the metastore database URL points at is shown below.
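
A quick way to check the metastore database URL (a sketch; javax.jdo.option.ConnectionURL is the standard Hive property holding the metastore JDBC URL, and the path assumes the client config under $HIVE_HOME):

Shell:

grep -A 1 'javax.jdo.option.ConnectionURL' $HIVE_HOME/conf/hive-site.xml    # the JDBC URL shown must use the current hostname of the database server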

General troubleshooting approach

For ordinary errors, read the error output and Google the key phrases.
For abnormal errors (for example, the namenode or a datanode dying for no obvious reason), check the Hadoop logs ($HADOOP_HOME/logs) or the Hive logs, as in the sketch below.
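
A typical way to scan those logs (a sketch; the hadoop-*-namenode-*.log / hadoop-*-datanode-*.log names follow the usual Hadoop 1.x naming and may differ in your installation):

Shell:

ls $HADOOP_HOME/logs/                                       # one .log file per daemon
tail -n 200 $HADOOP_HOME/logs/hadoop-*-namenode-*.log       # most recent NameNode output
grep -iE 'ERROR|FATAL|Exception' $HADOOP_HOME/logs/hadoop-*-datanode-*.log | tail -n 20    # recent DataNode problems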

Hadoop errors

Problem 16: datanode fails to start properly

After adding a datanode, it would not start properly and the process kept dying after a short while. The namenode log showed the following:

Log:

2013-06-21 18:53:39,182 FATAL org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node x.x.x.x:50010 is attempting to report storage ID DS-1357535176-x.x.x.x-50010-1371808472808. Node y.y.y.y:50010 is expected to serve this storage.

Cause:
The Hadoop installation directory was copied over together with its data and tmp folders (see my earlier "hadoop installation" article), so the datanode was never properly initialized.
Solution:

Shell:

rm -rf /data/hadoop/hadoop-1.1.2/data

rm -rf /data/hadoop/hadoop-1.1.2/tmp

hadoop-daemon.sh start datanode    # restart the datanode; it re-creates its storage and registers with a fresh storage ID

Problem 17: safe mode
Log:

2013-06-20 10:35:43,758 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot renew lease for DFSClient_hb_rs_wdev1.corp.qihoo.net,60020,1371631589073. Name node is in safe mode.

Solution:

Shell:

hadoop dfsadmin -safemode leave
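
Before and after forcing the NameNode out of safe mode, the current state can be checked with:

Shell:

hadoop dfsadmin -safemode get    # prints "Safe mode is ON" or "Safe mode is OFF"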

Problem 18: connection exception
Log:

2013-06-21 19:55:05,801 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to homename/x.x.x.x:9000 failed on local exception: java.io.EOFException

Possible causes:

The namenode is listening on 127.0.0.1:9000 rather than 0.0.0.0:9000 or the external IP.
iptables is blocking the port.

Solutions (see the check below):

Check /etc/hosts and make sure the hostname is bound to an IP other than 127.0.0.1.
Open the port in iptables.
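
A minimal check of both points (the port and hostname follow the log above; adjust for your fs.default.name setting):

Shell:

netstat -tlnp | grep 9000        # the NameNode should listen on 0.0.0.0:9000 or the LAN IP, not 127.0.0.1:9000
grep -n '127.0.0.1' /etc/hosts   # make sure the cluster hostname is not mapped to 127.0.0.1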

Problem 19: namenode namespaceID
Log:

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /var/lib/hadoop-0.20/cache/hdfs/dfs/data: namenode namespaceID = 240012870; datanode namespaceID = 1462711424.

Problem: the namespaceID on the namenode does not match the namespaceID on the datanode.

Cause: every namenode format creates a new namespaceID, but tmp/dfs/data still holds the ID from the previous format. Formatting clears the namenode's data without clearing the datanode's, so the two namespaceIDs diverge and startup fails.

Solution: the page http://blog.csdn.net/wh62592855/archive/2010/07/21/5752199.aspx describes two fixes; we used the first one:

(1) Stop the cluster services.

(2) On the affected datanodes, delete the data directory, i.e. the dfs.data.dir directory configured in hdfs-site.xml; on this machine it is /var/lib/hadoop-0.20/cache/hdfs/dfs/data/. (Note: at the time we did this on every datanode and namenode. In case the deletion does not help, keep a backup copy of the data directory first.)

(3) Format the namenode.

(4) Restart the cluster.

This solved the problem.
A side effect of this method is that all data on HDFS is lost. If HDFS holds important data, this method is not recommended; try the second method from the page above instead. The two namespaceIDs can also be compared directly, as shown below.
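
The namespaceIDs live in the VERSION files (the datanode path is taken from the log above; the namenode path below is only an example and depends on dfs.name.dir):

Shell:

grep namespaceID /var/lib/hadoop-0.20/cache/hdfs/dfs/data/current/VERSION    # datanode side
grep namespaceID /var/lib/hadoop-0.20/cache/hdfs/dfs/name/current/VERSION    # namenode side (example path)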

Problem 20: directory permissions

start-dfs.sh runs without errors and reports that the datanodes are starting, but afterwards no datanode process is running. The log on the datanode machine shows that the dfs.data.dir directory permissions are wrong:

Log:

expected: drwxr-xr-x, current: drwxrwxr-x

Solution:
Check the directory configured in dfs.data.dir and correct its permissions, for example:
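
For example (the path reuses the data directory shown earlier in these notes; substitute the directory actually configured in dfs.data.dir):

Shell:

chmod 755 /data/hadoop/hadoop-1.1.2/data      # dfs.data.dir must be drwxr-xr-x
ls -ld /data/hadoop/hadoop-1.1.2/data         # verify it now shows drwxr-xr-x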

Hive errors

Problem 21: NoClassDefFoundError

Could not initialize class java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.HbaseObjectWritable
Add the protobuf-***.jar to the auxiliary jars path:

XML:

<!-- $HIVE_HOME/conf/hive-site.xml -->
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///data/hadoop/hive-0.10.0/lib/hive-hbase-handler-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/hbase-0.94.8.jar,file:///data/hadoop/hive-0.10.0/lib/zookeeper-3.4.5.jar,file:///data/hadoop/hive-0.10.0/lib/guava-r09.jar,file:///data/hadoop/hive-0.10.0/lib/hive-contrib-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/protobuf-java-2.4.0a.jar</value>
</property>

Problem 22: Hive dynamic partition error

[Fatal Error] Operator FS_2 (id=2): Number of dynamic partitions exceeded hive.exec.max.dynamic.partitions.pernode

Shell:

hive> set hive.exec.max.dynamic.partitions.pernode=10000;
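
Depending on the query, the related limits may also need raising (these are standard Hive settings, shown here only as a sketch; tune the values to your data):

Shell:

hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> set hive.exec.max.dynamic.partitions=10000;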

Problem 23: MapReduce task exceeds the memory limit (hadoop Java heap space)

Edit mapred-site.xml and add:

XML:

<!-- mapred-site.xml -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>

Shell:

# $HADOOP_HOME/conf/hadoop-env.sh
export HADOOP_HEAPSIZE=5000

Problem 24: Hive created-files limit

[Fatal Error] total number of created files now is 100086, which exceeds 100000

Shell:

hive> set hive.exec.max.created.files=655350;

Problem 25: Hive metastore connection timeout
Log:

FAILED: SemanticException org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out

Solution:

Shell:

hive> set hive.metastore.client.socket.timeout=500;

Problem 26: java.io.IOException: error=7, Argument list too long
Log:

Task with the most failures (5):

Task ID:
task_201306241630_0189_r_000009

URL:
http://namenode.godlovesdog.com:50030/taskdetails.jsp?jobid=job_201306241630_0189&tipid=task_201306241630_0189_r_000009

Diagnostic Messages for this Task:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"164058872","reducesinkkey1":"djh,S1","reducesinkkey2":"20130117170703","reducesinkkey3":"xxx"},"value":{"_col0":"1","_col1":"xxx","_col2":"20130117170703","_col3":"164058872","_col4":"xxx,S1"},"alias":0}
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:270)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:520)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"164058872","reducesinkkey1":"xxx,S1","reducesinkkey2":"20130117170703","reducesinkkey3":"xxx"},"value":{"_col0":"1","_col1":"xxx","_col2":"20130117170703","_col3":"164058872","_col4":"djh,S1"},"alias":0}
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:258)
    ... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20000]: Unable to initialize custom script.
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:354)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)
    ... 7 more
Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.7": error=7, Argument list too long
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1042)
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:313)
    ... 15 more
Caused by: java.io.IOException: error=7, Argument list too long
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1023)
    ... 16 more

FAILED: Execution Error, return code 20000 from org.apache.hadoop.hive.ql.exec.MapRedTask. Unable to initialize custom script.

Solution:
Upgrade the kernel or reduce the number of partitions; see https://issues.apache.org/jira/browse/HIVE-2372

Problem 27: Hive runtime error

Shell:

hive> show tables;

FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Diagnosis:

Shell:

hive -hiveconf hive.root.logger=DEBUG,console

Log:

13/07/15 16:29:24 INFO hive.metastore: Trying to connect to metastore with URI thrift://xxx.xxx.xxx.xxx:9083

13/07/15 16:29:24 WARN hive.metastore: Failed to connect to the MetaStore Server...

org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused

...

MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused

The client was trying to connect to port 9083, and netstat confirmed that nothing was listening on that port. The first suspicion was that hiveserver had not started, yet the hiveserver process was running, just listening on port 10000.

Checking hive-site.xml showed that the Hive client connects to port 9083 while hiveserver listens on 10000 by default, which is the root cause.
Solution:

Shell:

hive --service hiveserver -p 9083

# or edit the hive.metastore.uris entry in $HIVE_HOME/conf/hive-site.xml
# and change the port to 10000
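
Either way, a quick netstat (as used in the diagnosis above) confirms which port actually has a listener after the change:

Shell:

netstat -tlnp | grep -E '9083|10000'    # shows which of the two ports is being listened on and by which process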

The Hive Metastore Server role kept printing the following on startup (the same block repeated at 18:48:53, 18:48:55 and 18:48:58):

JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
using /usr/java/jdk1.7.0_45-cloudera as JAVA_HOME
using 5 as CDH_VERSION
using /usr/lib/hive as HIVE_HOME
using /var/run/cloudera-scm-agent/process/193-hive-HIVEMETASTORE as HIVE_CONF_DIR
using /usr/lib/hadoop as HADOOP_HOME
using /var/run/cloudera-scm-agent/process/193-hive-HIVEMETASTORE/yarn-conf as HADOOP_CONF_DIR
ERROR: Failed to find hive-hbase storage handler jars to add in hive-site.xml. Hive queries that use Hbase storage handler may not work until this is fixed.
Wed Oct 22 18:48:53 CST 2014

The metastore-create-tables process reported the same error:

JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
using /usr/java/jdk1.7.0_45-cloudera as JAVA_HOME
using 5 as CDH_VERSION
using /usr/lib/hive as HIVE_HOME
using /var/run/cloudera-scm-agent/process/212-hive-metastore-create-tables as HIVE_CONF_DIR
using /usr/lib/hadoop as HADOOP_HOME
using /var/run/cloudera-scm-agent/process/212-hive-metastore-create-tables/yarn-conf as HADOOP_CONF_DIR
ERROR: Failed to find hive-hbase storage handler jars to add in hive-site.xml. Hive queries that use Hbase storage handler may not work until this is fixed.

Checked whether /usr/lib/hive was intact: it was.

3:21:09.801 PM  FATAL  org.apache.hadoop.hbase.master.HMaster

Unhandled exception. Starting shutdown.

java.io.IOException: error or interrupted while splitting logs in [hdfs://master:8020/hbase/WALs/slave2,60020,1414202360923-splitting] Task = installed = 2 done = 1 error = 1
    at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:362)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
    at org.apache.hadoop.hbase.master.HMaster.splitMetaLogBeforeAssignment(HMaster.java:1070)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:854)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:606)
    at java.lang.Thread.run(Thread.java:744)

3:46:12.903 PM  FATAL  org.apache.hadoop.hbase.master.HMaster

Unhandled exception. Starting shutdown.

java.io.IOException: error or interrupted while splitting logs in [hdfs://master:8020/hbase/WALs/slave2,60020,1414202360923-splitting] Task = installed = 1 done = 0 error = 1
    at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:362)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
    at org.apache.hadoop.hbase.master.HMaster.splitMetaLogBeforeAssignment(HMaster.java:1070)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:854)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:606)
    at java.lang.Thread.run(Thread.java:744)

Solution:
Add the following property to hbase-site.xml so that HBase does not perform distributed HLog splitting when the cluster starts:

<property>
  <name>hbase.master.distributed.log.splitting</name>
  <value>false</value>
</property>

Then move the leftover -splitting directory out of the way:

[root@master ~]# hadoop fs -mv /hbase/WALs/slave2,60020,1414202360923-splitting/ /test

[root@master ~]# hadoop fs -ls /test

2014-10-28 14:31:32,879 INFO [hconnection-0xd18e8a7-shared--pool2-t224] (AsyncProcess.java:673) - #3, table=session_service_201410210000_201410312359, attempt=14/35 failed 1383 ops, last exception: org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=session_service_201410210000_201410312359,7499999991,1414203068872.08ee7bb71161cb24e18ddba4c14da0f2., server=slave1,60020,1414380404290, memstoreSize=271430320, blockingMemStoreSize=268435456
    at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2561)
    at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:1963)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4050)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3361)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3265)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26935)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)

Common HBase exceptions and their descriptions:

ClockOutOfSyncException: thrown by the master when a RegionServer's clock is skewed too far.
DoNotRetryIOException: subclass used to signal that the operation should not be retried, e.g. UnknownScannerException.
DroppedSnapshotException: thrown if the snapshot taken during a flush is not stored to a file correctly.
HBaseIOException: all HBase-specific IOExceptions are subclasses of HBaseIOException.
InvalidFamilyOperationException: HBase received a request to modify a table schema, but the column family named in the request is invalid.
MasterNotRunningException: the master is not running.
NamespaceExistException: the namespace already exists.
NamespaceNotFoundException: the namespace cannot be found.
NotAllMetaRegionsOnlineException: an operation requires all root and meta regions to be online, but they are not.
NotServingRegionException: a request was sent to a RegionServer that is not responding, or the region is not available.
PleaseHoldException: thrown when a RegionServer died and restarted so quickly that the master has not finished processing the previous server instance, when an admin operation is invoked while the master is still initializing, or when an operation targets a RegionServer that is still starting up.
RegionException: error while accessing a region.
RegionTooBusyException: the RegionServer is too busy and the request is blocked waiting to be served.
TableExistsException: the table already exists.
TableInfoMissingException: the .tableinfo file cannot be found under the table directory.
TableNotDisabledException: the table is not properly disabled.
TableNotEnabledException: the table is not properly enabled.
TableNotFoundException: the table cannot be found.
UnknownRegionException: an unrecognized region was accessed.
UnknownScannerException: an unrecognized scanner id was passed to the RegionServer.
YouAreDeadException: thrown by the master when a RegionServer reports in after it has already been marked dead.
ZooKeeperConnectionException: the client cannot connect to ZooKeeper.

INFO  org.apache.hadoop.hbase.regionserver.MemStoreFlusher

Waited 90779ms on a compaction to clean up 'too many store files'; waited long enough... proceeding with flush of session_service_201410210000_201410312359,7656249951,1414481868315.bbf0a49fb8a9b650a584769ddd1fdd89.

When a MemStoreFlusher instance is created it starts MemStoreFlusher.FlushHandler thread instances; the number of these threads is controlled by hbase.hstore.flusher.count, which defaults to 1. A quick way to check whether that value has been overridden is shown below.
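
To see whether the value has been overridden on a node (a sketch; /etc/hbase/conf is the usual CDH client config directory and may differ in your setup):

Shell:

grep -A 1 'hbase.hstore.flusher.count' /etc/hbase/conf/hbase-site.xml || echo "hbase.hstore.flusher.count not set (default 1)"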

The case where one machine's disk is full while another machine's is not:

There are 26,632 under-replicated blocks in the cluster, out of 84,822 blocks in total. Percentage of under-replicated blocks: 31.40%. Warning threshold: 10.00%.

There are 27,278 under-replicated blocks in the cluster, out of 85,476 blocks in total. Percentage of under-replicated blocks: 31.91%. Warning threshold: 10.00%.

4:08:53.847 PM  INFO  org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher
Flushed, sequenceid=45525, memsize=124.2 M, hasBloomFilter=true, into tmp file hdfs://master:8020/hbase/data/default/session_service_201410260000_201410312359/a3b64675b0069b8323665274e2f95cdc/.tmp/b7fa4f5f85354ecc96aa48a09081f786

4:08:53.862 PM  INFO  org.apache.hadoop.hbase.regionserver.HStore
Added hdfs://master:8020/hbase/data/default/session_service_201410260000_201410312359/a3b64675b0069b8323665274e2f95cdc/f/b7fa4f5f85354ecc96aa48a09081f786, entries=194552, sequenceid=45525, filesize=47.4 M

4:09:00.378 PM  WARN  org.apache.hadoop.ipc.RpcServer
(responseTooSlow): {"processingtimems":39279,"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","client":"192.168.5.9:41284","starttimems":1414656501099,"queuetimems":0,"class":"HRegionServer","responsesize":16,"method":"Scan"}

4:09:00.379 PM  WARN  org.apache.hadoop.ipc.RpcServer
RpcServer.responder callId: 33398 service: ClientService methodName: Scan size: 209 connection: 192.168.5.9:41284: output error

4:09:00.380 PM  WARN  org.apache.hadoop.ipc.RpcServer
RpcServer.handler=79,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null

4:09:00.381 PM  INFO  org.apache.hadoop.hbase.regionserver.HRegion
Finished memstore flush of ~128.1 M/134326016, currentsize=2.4 M/2559256 for region session_service_201410260000_201410312359,6406249959,1414571385831.a3b64675b0069b8323665274e2f95cdc. in 8133ms, sequenceid=45525, compaction requested=false

Reposted from: https://blog.51cto.com/ybs000/2121375
