This post only covers the errors caused by a misconfigured secondary namenode deployment and how to fix them.
Environment: SUSE 10.1
The namenode is deployed on its own on cloud1.
The secondary namenode is deployed on its own on cloud3.
After the cluster was deployed, jps showed all of the expected processes running, and HDFS could upload and download files without problems.
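For reference, that sanity check amounts to something like the following sketch (the test file name and path are just examples):

# On every node: list the Hadoop Java daemons that are running
jps

# From any node with the hadoop client on its PATH: round-trip a small file
echo hello > /tmp/snn-test.txt                 # example local test file
hadoop fs -put /tmp/snn-test.txt /snn-test.txt
hadoop fs -cat /snn-test.txt                   # should print "hello"
hadoop fs -rm /snn-test.txt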
But looking at the log on the secondary namenode, every doCheckpoint attempt failed:
2011-11-11 00:02:58,154 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
2011-11-11 00:02:58,155 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:211)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:234)
at sun.net.www.http.HttpClient.New(HttpClient.java:307)
at sun.net.www.http.HttpClient.New(HttpClient.java:324)
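The Connection refused here comes from the secondary namenode's HTTP client: cloud3 cannot reach the address it believes the namenode's web server lives at. A rough way to see this from cloud3 (a sketch, assuming wget is installed and the default ports are in use):

# On cloud3: the namenode's web server on cloud1 answers ...
wget -q -O /dev/null http://cloud1:50070/ && echo "cloud1:50070 reachable"

# ... but with dfs.http.address left at its default (0.0.0.0:50070) the
# secondary namenode effectively dials its own machine, where nothing listens:
wget -q -O /dev/null http://localhost:50070/ || echo "localhost:50070 refused"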
As a result, the secondary namenode could not even create its image directory, let alone the fsimage and edits files inside it. Checking the namenode's log:
2011-11-11 23:13:03,628 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.0.1.162
2011-11-11 23:18:03,642 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.0.1.162
There are no errors here, but there is also no message saying the image was fetched successfully.
Shut down the cluster and add the following to hdfs-site.xml:
<property>
    <name>dfs.http.address</name>
    <value>{your_namenode_ip}:50070</value>
</property>
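Pushing the changed file out and bringing HDFS back up can look roughly like this (a sketch assuming the standard $HADOOP_HOME layout and the stock start/stop scripts; adjust the host list to your cluster):

# On cloud1, after editing conf/hdfs-site.xml: copy it to every other node
# (the datanodes listed in conf/slaves, plus the secondary namenode host)
for host in $(cat $HADOOP_HOME/conf/slaves) cloud3; do
    scp $HADOOP_HOME/conf/hdfs-site.xml $host:$HADOOP_HOME/conf/
done

# Restart HDFS
$HADOOP_HOME/bin/stop-dfs.sh
$HADOOP_HOME/bin/start-dfs.sh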
After distributing the file to every node and restarting, the files in the image directory all appeared, but the secondary namenode's log still showed doCheckpoint failing.
Checking the namenode's log:
2011-11-17 13:31:57,434 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:211)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:234)
at sun.net.www.http.HttpClient.New(HttpClient.java:307)
at sun.net.www.http.HttpClient.New(HttpClient.java:324)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
This shows that the secondary namenode failed to upload the merged image, and so the namenode's GetImage failed as well.
A bit of digging showed that the cause is that the secondary namenode's own address was never specified. Many people run the namenode and the secondary namenode on the same machine while testing, and since the default value of dfs.secondary.http.address is 0.0.0.0:50090, the two can still talk to each other. But once they are deployed on separate machines and this value is left at its default, the checkpoint traffic is directed at the local machine instead of the secondary namenode, so the namenode ends up trying to pull the merged image back from itself and the checkpoint naturally fails. I suspect many people never pay much attention to this setting when deploying.
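To see the mismatch concretely, a rough check from cloud1 (again a sketch assuming wget is installed and 50090, the default secondary namenode HTTP port, is in use):

# On cloud1: with the 0.0.0.0 default the namenode calls back to
# its own machine, where no secondary namenode is listening ...
wget -q -O /dev/null http://localhost:50090/ || echo "localhost:50090 refused"

# ... while the secondary namenode's web server on cloud3 is perfectly reachable
wget -q -O /dev/null http://cloud3:50090/ && echo "cloud3:50090 reachable"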
The fix is to add the following property as well:

<property>
    <name>dfs.secondary.http.address</name>
    <value>cloud3:50090</value>
</property>

With this parameter set, restart Hadoop and check the secondary namenode's log again; it now shows:
2011-11-17 14:20:31,435 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2011-11-17 14:20:31,443 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file fsimage size 6699 bytes.
2011-11-17 14:20:31,445 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file edits size 4 bytes.
2011-11-17 14:20:31,445 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 64-bit
2011-11-17 14:20:31,445 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory = 17.77875 MB
2011-11-17 14:20:31,445 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^21 = 2097152 entries
2011-11-17 14:20:31,445 INFO org.apache.hadoop.hdfs.util.GSet: recommended=2097152, actual=2097152
2011-11-17 14:20:31,447 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=ppstat
2011-11-17 14:20:31,447 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-11-17 14:20:31,447 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-11-17 14:20:31,447 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=100
2011-11-17 14:20:31,448 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2011-11-17 14:20:31,448 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2011-11-17 14:20:31,448 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 53
2011-11-17 14:20:31,460 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 7
2011-11-17 14:20:31,463 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /data/cloud/hadoop/tmp/dfs/namesecondary/current/edits of size 4 edits # 0 loaded in 0 seconds.
2011-11-17 14:20:31,464 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2011-11-17 14:20:31,470 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 6699 saved in 0 seconds.
2011-11-17 14:20:31,477 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 6699 saved in 0 seconds.
2011-11-17 14:20:31,480 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL cloud1:50070putimage=1&port=50090&machine=cloud3&token=-31:114395553:0:1321510831000:1321510231897
2011-11-17 14:20:31,491 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 6699
This shows that the image was fetched from the namenode and uploaded back to it successfully.
Checking the namenode again:
2011-11-17 14:20:31,972 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll FSImage from 10.0.1.162
2011-11-17 14:20:31,972 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 1
2011-11-17 14:30:31,997 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.0.1.162
2011-11-17 14:30:31,997 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
This confirms that the whole problem was caused by not specifying the secondary namenode's address.
The checkpoint files on the secondary namenode are now all in place:

ppstat@cloud3:/data/cloud/hadoop/tmp/dfs/namesecondary/current> ll
total 20
-rw-r--r-- 1 ppstat users    4 2011-11-17 14:40 edits
-rw-r--r-- 1 ppstat users 6699 2011-11-17 14:40 fsimage
-rw-r--r-- 1 ppstat users    8 2011-11-17 14:40 fstime
-rw-r--r-- 1 ppstat users  100 2011-11-17 14:40 VERSION
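If you want to keep an eye on subsequent checkpoints (with the default fs.checkpoint.period of 3600 seconds there is one per hour), something like the following works; the log file name pattern and directories below are the usual defaults and may differ on your install:

# On cloud3: watch the secondary namenode log for the next "Checkpoint done"
tail -f $HADOOP_HOME/logs/hadoop-*-secondarynamenode-cloud3.log

# Compare the checkpointed image with the namenode's own copy:
# on cloud3
ls -l /data/cloud/hadoop/tmp/dfs/namesecondary/current/fsimage
# on cloud1, replace <dfs.name.dir> with whatever dfs.name.dir points to
ls -l <dfs.name.dir>/current/fsimage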
PS: a small gripe: CSDN's blog keeps getting worse, and editing posts in it is a real pain.