I. Steps:
1. Install the JDK (omitted).
2. Install Eclipse (omitted).
If you run into problems, see http://blog.csdn.net/crazytaliban/article/details/68958000.
3. Install hadoop-2.7.3 on Windows 7.
Just unpack hadoop-2.7.3.tar.gz; I extracted it to d:\hadoop-2.7.3.
4. Install the Hadoop plugin for Eclipse.
Download hadoop-eclipse-plugin-2.7.3.jar from the web (or build it yourself) and copy it into the D:\eclipse\plugins directory.
5. Start Eclipse and open Windows->Preferences; in the dialog that appears, set the Hadoop installation directory. Note: this is the directory on Windows 7, i.e. the directory you unpacked to in step 3. Click OK when done.
6. Open Windows->Show View->Other…; the following dialog appears.
Select Map/Reduce Locations in it and click OK; the Map/Reduce Locations view is added, as shown in the figure below:
7. Click the small elephant icon on the right to create a New Hadoop Location…, as shown below:
The following dialog appears:
Set the master node of the Hadoop cluster. Here I entered the master node's hostname, "Master"; for that to work, you must add a line to the hosts file under C:\Windows\System32\drivers\etc:
192.168.1.200 Master, as shown below. Alternatively, type the IP address directly into the Host text box.
The Port under DFS Master must match the port configured in core-site.xml.
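For reference, that port comes from the fs.defaultFS property in the cluster's core-site.xml. A minimal sketch, assuming the NameNode listens on port 9000 as in the examples below (adjust host and port to your cluster):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
  </property>
</configuration>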
Once everything is set, click Finish; a new entry appears under Map/Reduce Locations.
9. Create a new project.
Open File->New->Project; the following dialog appears. Enter a project name; if the Hadoop location on Windows 7 has changed, click "Configure Hadoop install directory" to reconfigure it.
Then click Next; the following dialog appears. Open the newly added class WordCount.java and add the following code:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: splits each input line into tokens and emits a (word, 1) pair per token.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
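To make the data flow concrete: the mapper emits a (word, 1) pair for every token, and the reducer (also installed as the combiner) sums those pairs per word. For example, assuming wc.txt contains the single line:

hello world hello hadoop

the job writes one tab-separated line per distinct word, sorted by key:

hadoop	1
hello	2
world	1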
10. Compile and run.
Open Run->Run Configurations; the following dialog appears:
Click "Java Application", then click the add button in the upper-left corner; the following dialog appears:
In that dialog, change the Name, click the "Search" button to set the "Main class", and click "Apply". Then, on the Arguments tab on the right, enter the following:
hdfs://Master:9000/input/wc.txt
hdfs://Master:9000/output1
Note: make sure the file /input/wc.txt exists on HDFS and that the /output1 directory does not.
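These paths can be prepared from a shell on the cluster; a sketch, assuming wc.txt sits in the current local directory:

hdfs dfs -mkdir -p /input
hdfs dfs -put wc.txt /input
# Remove a stale output directory left over from a previous run, if any:
hdfs dfs -rm -r /output1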
As shown in the figure below. Then click "Run" to execute the job.
11. Errors that may occur after clicking "Run".
Error 1 (does not prevent the code from compiling and running):
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Cause:
No log4j configuration is found, so no log output can be produced.
Fix:
In the src folder of the Java (MapReduce) project you created, add a file named log4j.properties with the following content:
# Configure logging for testing: optionally with log file
#log4j.rootLogger=debug,appender
log4j.rootLogger=info,appender
#log4j.rootLogger=error,appender
# Output to the console
log4j.appender.appender=org.apache.log4j.ConsoleAppender
# Use TTCCLayout
log4j.appender.appender.layout=org.apache.log4j.TTCCLayout
Error 2:
Exception in thread "main" java.net.ConnectException: Call From cyril-PC/192.168.1.106 to Master:8020 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:145)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at WordCount.main(WordCount.java:61)
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 28 more
Possible causes:
1. The filesystem port is not the default 8020; check core-site.xml.
2. The Hadoop cluster is not running.
Fix:
1. Make sure the Hadoop cluster is started correctly.
2. When running the job, include the port in the input arguments (otherwise it defaults to 8020), e.g.:
hdfs://Master:9000/input/test.txt
hdfs://Master:9000/output1
Error 3:
[main] ERROR org.apache.hadoop.util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable D:\hadoop-2.7.3\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
at org.apache.hadoop.util.Shell.<clinit>
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:440)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:486)
at org.apache.hadoop.util.GenericOptionsParser.<init>
at org.apache.hadoop.util.GenericOptionsParser.<init>
at WordCount.main(WordCount.java:47)
[main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[main] INFO org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
[main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" java.io.IOException: (null) entry in command string: null chmod 0700 D:\tmp\hadoop-cyril\mapred\staging\Cyril2058634212\.staging
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:769)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:491)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:305)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:133)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:144)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at WordCount.main(WordCount.java:61)
Possible cause:
The line "ERROR org.apache.hadoop.util.Shell - Failed to locate the winutils binary in the hadoop binary path" states explicitly that winutils.exe is missing.
Fix:
Simply copy winutils.exe into the %HADOOP_HOME%/bin directory; the winutils.exe file can be downloaded from the web. Then recompile and run the Java code again; error 4 may appear next.
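If Eclipse does not see the HADOOP_HOME environment variable, an alternative workaround is to point Hadoop at the install directory programmatically. A sketch, assuming the install path from step 3; the line must run at the very top of main(), before any Hadoop class is touched:

// Tell Hadoop's Shell utility where winutils.exe lives (path hard-coded for illustration).
System.setProperty("hadoop.home.dir", "D:\\hadoop-2.7.3");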
Error 4:
[main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[main] INFO org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
[main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
[main] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
[main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local1072922152_0001
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area file:/tmp/hadoop-Cyril/mapred/staging/Cyril1072922152/.staging/job_local1072922152_0001
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:187)
at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:125)
at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>
at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at WordCount.main(WordCount.java:61)
Possible cause:
The log line "[main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable" indicates that the Hadoop native library (hadoop.dll) is missing.
Fix:
Download hadoop.dll from the web and copy it into the c:\windows\System32\ folder.
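To confirm the native pieces are being picked up, here is a quick check sketch (the class name NativeCheck is hypothetical; NativeCodeLoader and NativeIO are real Hadoop utility classes, and whether they report true depends on your environment):

import org.apache.hadoop.io.nativeio.NativeIO;
import org.apache.hadoop.util.NativeCodeLoader;

public class NativeCheck {
    public static void main(String[] args) {
        // Both should report true once winutils.exe and hadoop.dll are in place.
        System.out.println("native-hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
        System.out.println("NativeIO available: " + NativeIO.isAvailable());
    }
}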
12. Package the program.
Right-click the project and choose "Export…"; in the dialog that appears, select "JAR file", as shown below, then click "Finish".
Upload the generated "WordCount.jar" to the cluster's /srv/ftp directory, then run the program with the following command:
hadoop jar /srv/ftp/WordCount.jar /input /output
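When the job completes, you can inspect the result on HDFS (a sketch; the part file name assumes the default single reducer):

hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000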