A Simple Introduction to the Hadoop MapReduce WordCount Example

This post walks through a simple MapReduce WordCount example. Before starting, make sure you have a Hadoop HA (high-availability) cluster running and the Hadoop API environment configured in MyEclipse; if not, see the earlier posts on Hadoop HA setup and basic use of the HDFS API.

Contents

1. Overall architecture

2. Configuring the Hadoop environment

3. Writing the WordCount example

1. Overall architecture

The overall structure is shown in the figure below: on top of the existing Hadoop HA cluster we add an RM (ResourceManager) and NMs (NodeManagers).

[Figure 1: overall architecture]
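For reference, the role layout inferred from the configuration and commands later in this post (the NodeManager placement in particular is an assumption) is roughly:

node01 - HDFS/YARN client node where start-dfs.sh and start-yarn.sh are run; no RM or NM added
node02 - ZooKeeper, NodeManager (assumed)
node03 - ZooKeeper, NodeManager (assumed), ResourceManager rm1
node04 - ZooKeeper, NodeManager (assumed), ResourceManager rm2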

2. Configuring the Hadoop environment

Although node01 does not run an RM or NM, the approach taken here is to edit the configuration files on node01 and then copy them to the other three nodes.

Copy mapred-site.xml.template to mapred-site.xml (if your Hadoop distribution already ships a mapred-site.xml, this step can be skipped):

cp /myapp/hadoop-3.1.2/etc/hadoop/mapred-site.xml.template /myapp/hadoop-3.1.2/etc/hadoop/mapred-site.xml

Edit mapred-site.xml; the full contents are as follows:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>

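These three *.env properties make the MapReduce ApplicationMaster and tasks inherit HADOOP_MAPRED_HOME from ${HADOOP_HOME}, so ${HADOOP_HOME} must resolve on the nodes that run them. If it does not, a common workaround (an assumption here, reusing the install path from the copy command above) is to hard-code the path instead:

<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/myapp/hadoop-3.1.2</value>
</property>

with the same value for mapreduce.map.env and mapreduce.reduce.env.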
Edit yarn-site.xml; the full contents inside the configuration tag are as follows:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>node03</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>node04</value>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>node02:2181,node03:2181,node04:2181</value>
</property>

Distribute the two configuration files to the other nodes:

scp mapred-site.xml yarn-site.xml node02:`pwd`
scp mapred-site.xml yarn-site.xml node03:`pwd`
scp mapred-site.xml yarn-site.xml node04:`pwd`

On node02, node03, and node04, start ZooKeeper with 'zkServer.sh start'; then on node01, start the HDFS cluster with 'start-dfs.sh'.

Start YARN (on node01):

start-yarn.sh

Start the ResourceManager (on node03 and node04):

yarn-daemon.sh start resourcemanager
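Note: yarn-daemon.sh still works in Hadoop 3.x but is deprecated; if you prefer the newer form, the equivalent command is:

yarn --daemon start resourcemanager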

Testing

Open "node03:8088" in a browser on your Windows machine to see the node status.
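To check which ResourceManager is currently active from the command line, yarn rmadmin can query the IDs configured in yarn-site.xml above:

yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

One of them should report "active" and the other "standby".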

Shutting down

1. Stop the ResourceManager (on node03 and node04):

yarn-daemon.sh stop resourcemanager

2. Stop YARN (on node01):

stop-yarn.sh

3. Stop the HDFS cluster (on node01):

stop-dfs.sh

4. Stop ZooKeeper (on node02, node03, and node04):

zkServer.sh stop

3. Writing the WordCount example

Create the following three files: MyWC.java, MyMapper.java, and MyReducer.java.


MyWC.java is as follows:

package com.dxw.hadoop.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyWC {

    public static void main(String[] args) throws Exception {

        // Load the cluster configuration (core-site.xml, hdfs-site.xml, etc.)
        Configuration conf = new Configuration(true);

        // Create a new Job
        Job job = Job.getInstance(conf);
        job.setJarByClass(MyWC.class);

        // Specify job-specific parameters
        job.setJobName("myjob");

        // Input file and output directory on HDFS
        Path input = new Path("/user/root/test.txt");
        FileInputFormat.addInputPath(job, input);

        Path output = new Path("/data/wc/output");
        // Delete the output directory if it already exists, otherwise the job fails
        if (output.getFileSystem(conf).exists(output)) {
            output.getFileSystem(conf).delete(output, true);
        }
        FileOutputFormat.setOutputPath(job, output);

        // Mapper and its output types
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setReducerClass(MyReducer.class);

        // Submit the job, then poll for progress until the job is complete
        job.waitForCompletion(true);

    }

}
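One optional refinement, not part of the original code: the classic WordCount example also registers the reducer as a combiner (to pre-aggregate counts on the map side) and declares the final output types explicitly. If desired, the following lines could be added before waitForCompletion; the job above runs without them:

        job.setCombinerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);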

MyMapper.java is as follows:

package com.dxw.hadoop.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Split each input line into tokens and emit (word, 1) for every token
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }

}

MyReducer.java is as follows:

package com.dxw.hadoop.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    // Sum the counts for each word and emit (word, total)
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }

}

Export the finished Java code as a jar file (MyWC.jar).


Upload the jar to node01.
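The job reads its input from /user/root/test.txt, which is hard-coded in MyWC.java, so that file must already exist on HDFS. A minimal preparation sketch, using a hypothetical sample text:

echo "hello hadoop hello world" > test.txt
hdfs dfs -mkdir -p /user/root
hdfs dfs -put test.txt /user/root/test.txt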

Run the following command to count the words:

hadoop jar MyWC.jar com.dxw.hadoop.wordcount.MyWC

Run the following command to list the output files:

hdfs dfs -ls /data/wc/output

Run the following command to download the results from HDFS to the local filesystem:

hdfs dfs -get /data/wc/output/* ./

View the word counts:

vi part-r-00000
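Each line of part-r-00000 contains a word and its count, separated by a tab. For the hypothetical sample input suggested above, the result would look like:

hadoop	1
hello	2
world	1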

 
