Hadoop install-and-debug walkthrough: WordCount, small errors and their fixes
This walkthrough assumes an input directory containing file01.txt and file02.txt already exists and has been uploaded to the in directory on HDFS:
$ mkdir input
$ echo "hello world" > input/file01.txt
$ echo "hello world" > input/file02.txt
$ bin/hadoop dfs -put input in
The previous run had succeeded, and I forgot to delete or clear the out directory afterwards, which led to a series of errors on the next run. Among the messages was this one (the native-library warning itself is harmless, but it appeared alongside the failure):
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Some posts online suggest raising the export HADOOP_HEAPSIZE=2000 value in hadoop-env.sh. I tried that; it had no effect.
I also suspected an internet-access client on the machine; that was ruled out as well.
Others online blamed the Hadoop version, which turned out to be close to the truth.
The fix came down to two key steps:
1. Download a replacement hadoop-core-1.1.2 jar. Download link: http://download.csdn.net/download/xunianchong/5349797
2. Delete the out directory from HDFS.
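Step 2 can be folded into a small rerun helper so that stale output never blocks a resubmission. A minimal sketch, not part of Hadoop: rerun_wordcount is a hypothetical helper, the jar name is taken from this transcript, and the launcher path is passed in so it can be adapted per install.

```shell
# Hypothetical rerun helper (an assumption, not part of Hadoop): clear stale
# HDFS output, then resubmit the example job. $1 is the hadoop launcher path.
rerun_wordcount() {
    # Ignore the error on the very first run, when "out" does not exist yet.
    "$1" dfs -rmr out 2>/dev/null || true
    # Resubmit with the same input/output paths used in this transcript.
    "$1" jar hadoop-examples-1.1.2.jar wordcount in out
}
# usage (from the Hadoop run directory): rerun_wordcount bin/hadoop
```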
Session transcript:
Administrator@dong ~
// Start the sshd service
$ net start sshd
The CYGWIN sshd service is starting.
The CYGWIN sshd service was started successfully.
// Log in
Administrator@dong ~
$ ssh localhost
Last login: Wed May 22 21:20:37 2013 from localhost
// Change into the Hadoop run directory
Administrator@dong ~
$ cd /cygdrive/e/Hadoop/run/
// Delete the out directory left over from the last run; otherwise the job fails with confusing errors.
Administrator@dong /cygdrive/e/Hadoop/run
$ bin/hadoop dfs -rmr out
// The Hadoop daemons were not running yet, so the connection attempts below fail and the out directory cannot be deleted.
cygwin warning:
MS-DOS style path detected: E:\Hadoop\run\/build/native
Preferred POSIX equivalent is: /cygdrive/e/Hadoop/run/build/native
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
13/05/22 21:33:18 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8888. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/05/22 21:33:20 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8888. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/05/22 21:33:22 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8888. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/05/22 21:33:24 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8888. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/05/22 21:33:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8888. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/05/22 21:33:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8888. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/05/22 21:33:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8888. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/05/22 21:33:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8888. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/05/22 21:33:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8888. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/05/22 21:33:35 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8888. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Bad connection to FS. command aborted. exception: Call to localhost/127.0.0.1:8888 failed on connection exception: java.net.ConnectException: Connection refused: no further information
// This command is wrong:
Administrator@dong /cygdrive/e/Hadoop/run
$ hadoop/start-all.sh
-bash: hadoop/start-all.sh: No such file or directory
// This is the correct one:
Administrator@dong /cygdrive/e/Hadoop/run
$ bin/start-all.sh
starting namenode, logging to /cygdrive/e/Hadoop/run/libexec/../logs/hadoop-Administrator-namenode-dong.out
localhost: starting datanode, logging to /cygdrive/e/Hadoop/run/libexec/../logs/hadoop-Administrator-datanode-dong.out
localhost: starting secondarynamenode, logging to /cygdrive/e/Hadoop/run/libexec/../logs/hadoop-Administrator-secondarynamenode-dong.out
starting jobtracker, logging to /cygdrive/e/Hadoop/run/libexec/../logs/hadoop-Administrator-jobtracker-dong.out
localhost: starting tasktracker, logging to /cygdrive/e/Hadoop/run/libexec/../logs/hadoop-Administrator-tasktracker-dong.out
// Deleting the old out directory fails again, this time because the NameNode is in safe mode.
Administrator@dong /cygdrive/e/Hadoop/run
$ bin/hadoop dfs -rmr out
rmr: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /user/SYSTEM/out. Name node is in safe mode.
// Leave safe mode:
Administrator@dong /cygdrive/e/Hadoop/run
$ bin/hadoop dfsadmin -safemode leave
Safe mode is OFF
// The delete finally succeeds:
Administrator@dong /cygdrive/e/Hadoop/run
$ bin/hadoop dfs -rmr out
Deleted hdfs://localhost:8888/user/SYSTEM/out
// Now wordcount can run:
Administrator@dong /cygdrive/e/Hadoop/run
$ bin/hadoop jar hadoop-examples-1.1.2.jar wordcount in out
13/05/22 21:36:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/22 21:36:18 INFO input.FileInputFormat: Total input paths to process : 2
13/05/22 21:36:18 WARN snappy.LoadSnappy: Snappy native library not loaded
13/05/22 21:36:18 INFO mapred.JobClient: Running job: job_local_0001
13/05/22 21:36:18 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/05/22 21:36:18 INFO mapred.MapTask: io.sort.mb = 100
13/05/22 21:36:18 INFO mapred.MapTask: data buffer = 79691776/99614720
13/05/22 21:36:18 INFO mapred.MapTask: record buffer = 262144/327680
13/05/22 21:36:18 INFO mapred.MapTask: Starting flush of map output
13/05/22 21:36:18 INFO mapred.MapTask: Finished spill 0
13/05/22 21:36:18 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/05/22 21:36:18 INFO mapred.LocalJobRunner:
13/05/22 21:36:18 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/05/22 21:36:18 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/05/22 21:36:18 INFO mapred.MapTask: io.sort.mb = 100
13/05/22 21:36:18 INFO mapred.MapTask: data buffer = 79691776/99614720
13/05/22 21:36:18 INFO mapred.MapTask: record buffer = 262144/327680
13/05/22 21:36:18 INFO mapred.MapTask: Starting flush of map output
13/05/22 21:36:18 INFO mapred.MapTask: Finished spill 0
13/05/22 21:36:18 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
13/05/22 21:36:18 INFO mapred.LocalJobRunner:
13/05/22 21:36:18 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
13/05/22 21:36:18 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/05/22 21:36:18 INFO mapred.LocalJobRunner:
13/05/22 21:36:18 INFO mapred.Merger: Merging 2 sorted segments
13/05/22 21:36:18 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 64 bytes
13/05/22 21:36:18 INFO mapred.LocalJobRunner:
13/05/22 21:36:18 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/05/22 21:36:18 INFO mapred.LocalJobRunner:
13/05/22 21:36:18 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/05/22 21:36:18 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to out
13/05/22 21:36:18 INFO mapred.LocalJobRunner: reduce > reduce
13/05/22 21:36:18 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
13/05/22 21:36:19 INFO mapred.JobClient: map 100% reduce 100%
13/05/22 21:36:19 INFO mapred.JobClient: Job complete: job_local_0001
13/05/22 21:36:19 INFO mapred.JobClient: Counters: 19
13/05/22 21:36:19 INFO mapred.JobClient: File Output Format Counters
13/05/22 21:36:19 INFO mapred.JobClient: Bytes Written=32
13/05/22 21:36:19 INFO mapred.JobClient: FileSystemCounters
13/05/22 21:36:19 INFO mapred.JobClient: FILE_BYTES_READ=428750
13/05/22 21:36:19 INFO mapred.JobClient: HDFS_BYTES_READ=90
13/05/22 21:36:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=622801
13/05/22 21:36:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=32
13/05/22 21:36:19 INFO mapred.JobClient: File Input Format Counters
13/05/22 21:36:19 INFO mapred.JobClient: Bytes Read=36
13/05/22 21:36:19 INFO mapred.JobClient: Map-Reduce Framework
13/05/22 21:36:19 INFO mapred.JobClient: Reduce input groups=4
13/05/22 21:36:19 INFO mapred.JobClient: Map output materialized bytes=72
13/05/22 21:36:19 INFO mapred.JobClient: Combine output records=5
13/05/22 21:36:19 INFO mapred.JobClient: Map input records=2
13/05/22 21:36:19 INFO mapred.JobClient: Reduce shuffle bytes=0
13/05/22 21:36:19 INFO mapred.JobClient: Reduce output records=4
13/05/22 21:36:19 INFO mapred.JobClient: Spilled Records=10
13/05/22 21:36:19 INFO mapred.JobClient: Map output bytes=60
13/05/22 21:36:19 INFO mapred.JobClient: Total committed heap usage (bytes)=437465088
13/05/22 21:36:19 INFO mapred.JobClient: Combine input records=6
13/05/22 21:36:19 INFO mapred.JobClient: Map output records=6
13/05/22 21:36:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=222
13/05/22 21:36:19 INFO mapred.JobClient: Reduce input records=5
Administrator@dong /cygdrive/e/Hadoop/run
$ bin/hadoop dfs -cat out/*
dong 1
hadoop 1
hello 3
world 1
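For reference, what the WordCount job computes can be approximated locally with standard Unix tools. This is not Hadoop, just a sanity check of the word-counting logic on two "hello world" files like the ones from the setup step:

```shell
# Local approximation of WordCount: split input on whitespace, then count
# occurrences of each distinct word. Uses a temp dir to avoid clobbering files.
dir=$(mktemp -d)
printf 'hello world\n' > "$dir/file01.txt"
printf 'hello world\n' > "$dir/file02.txt"
counts=$(cat "$dir"/file0?.txt | tr -s '[:space:]' '\n' | sort | uniq -c \
         | awk '{print $2, $1}')
echo "$counts"      # hello 2 / world 2
rm -rf "$dir"
```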
// Running the job again without first deleting out fails immediately:
Administrator@dong /cygdrive/e/Hadoop/run
$ bin/hadoop jar hadoop-examples-1.1.2.jar wordcount in out
13/05/22 21:55:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/22 21:55:54 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-SYSTEM/mapred/staging/SYSTEM-1458473732/.staging/job_local_0001
13/05/22 21:55:54 ERROR security.UserGroupInformation: PriviledgedActionException as:SYSTEM cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory out already exists
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory out already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:949)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
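The stack trace above is MapReduce refusing, by design, to overwrite an existing output directory. Rather than discovering this at submit time, you can fail fast with the shell's -test -e check, which returns 0 when an HDFS path exists. A minimal sketch: check_output_clear is a hypothetical helper, and the launcher path is passed in as before.

```shell
# Hypothetical pre-flight check (an assumption, not part of Hadoop): refuse to
# proceed if the HDFS output path already exists. $1 is the hadoop launcher.
check_output_clear() {
    if "$1" dfs -test -e out; then
        echo "HDFS path 'out' already exists; remove it first: $1 dfs -rmr out" >&2
        return 1
    fi
    return 0
}
```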
Administrator@dong /cygdrive/e/Hadoop/run
$