1. A MapReduce program developed in Eclipse runs locally (LocalJobRunner) instead of on the cluster.
Solution: package the program as a jar and run it with the hadoop command line. When packaging, use the Fat Jar plugin to bundle the third-party jars, and do not tick the One-JAR option, otherwise you get:
Error: Exception in thread "main" java.lang.IllegalArgumentException: Unable to locate com.simontuffs.onejar.Boot in the java.class.path: consider using -Done-jar.jar.path to specify the one-jar filename.
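For reference, a minimal driver sketch of the kind that gets packaged into such a jar; the class name and paths below are placeholders. Submitted with something like hadoop jar myjob.jar ClusterJobDriver /input /output, it runs against the cluster described by the core-site.xml / mapred-site.xml on the classpath rather than in the LocalJobRunner:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal pass-through job (Hadoop's identity mapper and reducer are used by default).
public class ClusterJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // reads the cluster config from the classpath
        Job job = new Job(conf, "cluster smoke test");
        job.setJarByClass(ClusterJobDriver.class);     // tells Hadoop which jar to ship to the TaskTrackers
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}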
2. FAILED Too many fetch-failures
Solution:
1) Check /etc/hosts
The local IP must map to the host's own hostname,
and the file must contain the IP + hostname of every server in the cluster.
If the very top of /etc/hosts contains lines such as:
127.0.0.1 localhost your_hostname
::1 localhost6 your_hostname
comment these two lines out (or at least remove your_hostname from them) and the error above goes away.
2) Check .ssh/authorized_keys
It must contain the public keys of all servers, including the node itself.
Even though passwordless SSH between the nodes was already configured before installing Hadoop, suppose there are three nodes with IPs 192.168.128.131, 192.168.128.132 and 192.168.128.133 and hostnames master, slave1 and slave2. The first time you run $ ssh <hostname> (master, slave1, slave2) from each node, SSH prints a yes/no prompt about the host key; after you confirm it, later connections work normally. If this step was never done by hand, and the corresponding IPs in hadoop/conf/core-site.xml and mapred-site.xml happen to be written as hostnames, this exception is very likely to appear. A hosts file for this example that satisfies both requirements is sketched below.
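For the three-node example above, an /etc/hosts along these lines on every node satisfies both requirements (the IPs and hostnames are just the ones used in this example):
127.0.0.1       localhost
192.168.128.131 master
192.168.128.132 slave1
192.168.128.133 slave2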
3. The number of reduce tasks on Hadoop is always 1
Hadoop's parameters are affected by the settings picked up on the client side. When my jobs ran on Hadoop, the number of reduce tasks was always 1. Check the configuration files in the conf folder under the Hadoop installation path: in conf/hadoop-site.xml or conf/hadoop-default.xml, look for:
<property>
<name>mapred.reduce.tasks</name>
<value>1</value>
<description>The default number of reduce tasks per job. Typically set
to a prime close to the number of available hosts. Ignored when
mapred.job.tracker is "local".
</description>
</property>
Change this parameter to:
<property>
<name>mapred.reduce.tasks</name>
<value>11</value>
<description>The default number of reduce tasks per job. Typically set
to a prime close to the number of available hosts. Ignored when
mapred.job.tracker is "local".
</description>
</property>
After rerunning and checking, the job used 11 reduce tasks, so the change took effect. A per-job alternative is sketched below.
Reference: http://blog.chinaunix.net/uid-1838361-id-287231.html
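Note that mapred.reduce.tasks can also be overridden per job instead of cluster-wide, either with -D mapred.reduce.tasks=11 on the hadoop jar command line (when the driver goes through ToolRunner) or directly in the driver; a minimal sketch, where 11 is just the value from the example above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReduceCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "reduce count example");
        // Overrides mapred.reduce.tasks for this job only, whatever the
        // client-side hadoop-site.xml says.
        job.setNumReduceTasks(11);
        // ... set jar, mapper, reducer, input and output paths as usual, then submit.
    }
}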
4. org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException
5. HBase shell error: NativeException: org.apache.hadoop.hbase.MasterNotRunningException: null
References: http://blog.sina.com.cn/s/blog_718335510100zchp.html , http://www.cnblogs.com/tangtianfly/archive/2012/04/11/2441760.html
6. ZooKeeper session expired
References: http://jiajun.iteye.com/blog/1013215 , http://www.kuqin.com/system-analysis/20110910/264590.html
7. org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase/.logs/Slave2,60020,1366353790042/Slave2%2C60020%2C1366353790042.1366353792650 File does not exist. [Lease. Holder: DFSClient_hb_rs_Slave2,60020,1366353790042, pendingcreates: 1]
Modify Hadoop's configuration file conf/hdfs-site.xml and add:
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
To be confirmed!!! (Note that a change to hdfs-site.xml only takes effect after the DataNodes are restarted.)
8. Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface
Cause: the client program's connections to HBase through ZooKeeper exceed the configured maximum (default 30); once the limit is reached, subsequent connections cannot get through.
Fix: add the following property to the hbase-site.xml configuration file:
<property>
<name>hbase.zookeeper.property.maxClientCnxns</name>
<value>300</value>
<description>Property from ZooKeeper's config zoo.cfg.
Limit on number of concurrent connections (at the socket level) that a
single client, identified by IP address, may make to a single member of
the ZooKeeper ensemble. Set high to avoid zk connection issues running
standalone and pseudo-distributed.
</description>
</property>
I set the maximum number of connections to 300 this way, but the same error still showed up afterwards; the setting did not take effect (probably because hbase.zookeeper.property.* values are only passed to a ZooKeeper instance that HBase itself manages). Following the hint in the property description, I modified the zoo.cfg configuration file directly
and added: maxClientCnxns=300
After restarting ZooKeeper and HBase and testing again, the problem was solved.
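On the client side, the number of ZooKeeper connections can also be kept down by sharing one Configuration across all HTable instances (the 0.9x-era client caches its cluster connection per configuration) instead of calling HBaseConfiguration.create() for every table. A rough sketch; the table and row names are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SharedConfClient {
    // One shared Configuration for the whole client process, so all HTable
    // instances reuse the same underlying connection (and ZooKeeper session).
    private static final Configuration CONF = HBaseConfiguration.create();

    public static void main(String[] args) throws Exception {
        HTable table = new HTable(CONF, "my_table");   // "my_table" is a placeholder
        try {
            Result r = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(r);
        } finally {
            table.close();
        }
    }
}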
11. Job failed: # of failed Reduce Tasks exceeded allowed limit. FailedCount:
Reference: http://blog.163.com/zhengjiu_520/blog/static/3559830620130743644473/
12. FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
java.io.IOException
References: http://blog.sina.com.cn/s/blog_53765cf90101auqo.html (to be confirmed), http://www.codesky.net/article/201206/171897.html
13. HBase Lease Exception
Set hbase.regionserver.lease.period and hbase.rpc.timeout, making sure that hbase.rpc.timeout >= hbase.regionserver.lease.period, as in the example below.
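For example, in hbase-site.xml (the values here are only illustrative; the hard requirement is the ordering of the two timeouts):
<property>
<name>hbase.regionserver.lease.period</name>
<value>240000</value>
</property>
<property>
<name>hbase.rpc.timeout</name>
<value>280000</value>
</property>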
14. Task attempt_failed to report status for 600 seconds. Killing!
Reference: http://stackoverflow.com/questions/5864589/how-to-fix-task-attempt-201104251139-0295-r-000006-0-failed-to-report-status-fo
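The usual remedies for this error are to raise mapred.task.timeout (default 600000 ms) or, preferably, to have long-running map/reduce code report progress periodically so the framework knows the task is still alive. A minimal sketch of the latter, where the summation stands in for whatever expensive per-record work the task really does:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer doing slow per-key work; calling context.progress()
// inside the loop resets the task timeout so the TaskTracker does not kill it.
public class SlowReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();        // stands in for the real, expensive work
            context.progress();    // heartbeat to the framework
        }
        context.write(key, new IntWritable(sum));
    }
}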