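Hadoop 2.8.0 Single-Node Setup and Eclipse Development Configuration: Beginner's Notes

These notes record what I learned while getting Hadoop up and running over the past two days. I am sharing them so that others can avoid the same detours.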

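1. Hadoop pseudo-distributed single-node setup

1.1 Environment preparation

Create a new VMware virtual machine.

Operating system: RedHat EL 6.2, 64-bit

Network: NAT mode

IP address: 192.168.182.140

Hostname: hadoop1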

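1.2 Download and install

(1) JDK 1.8

Before downloading, run java -version to check whether the system already ships with a JDK.

Mine came with the older 1.6, which I located and removed with the following commands:

# Find the installed JDK package
>rpm -qa|grep jdk
xxx-openjdk-yyyy

# Remove the JDK package
>rpm -e --nodeps xxx-openjdk-yyyy

# Check again that the removal succeeded
>java -version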

(2)hadoop-2.8.0.tar.gz

Download page: http://hadoop.apache.org/releases.html

[Screenshot 1]


1.3 Hadoop deployment steps

Reference article: http://blog.csdn.net/uq_jin/article/details/51451995

Note that my setup differs slightly from that article in the environment variables:

# Environment variable settings
>vi /etc/profile
.... (omitted) ....
JAVA_HOME=/software/jdk1.8.0_131
JRE_HOME=$JAVA_HOME/jre
HADOOP_HOME=/software/hadoop-2.8.0
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Note: the JVM itself reads CLASSPATH; the name CLASS_PATH is kept here as in my original setup.
CLASS_PATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME:$HADOOP_HOME/etc/hadoop:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/hdfs:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/contrib/capacity-scheduler/*.jar
export PATH JAVA_HOME CLASS_PATH JRE_HOME HADOOP_HOME

Screenshots of the final deployment:

[Screenshot 2]

sbin/start-yarn.sh

[Screenshot 3]

sbin/mr-jobhistory-daemon.sh start historyserver (starts the job history server, used for tracking logs)

DFS overview

[Screenshot 4]

Cluster application overview

[Screenshot 5]


1.4 Running a MapReduce test

Preparation:

Create the /henry/input/ directory:

>hadoop fs -mkdir -p /henry/input/


Run the wordcount example:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-xxxxx.jar wordcount /henry/input/ /henry/output/wordcount

Result:

[Screenshots 6 and 7]

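A few things to note:

(1) /input sits directly under the HDFS root. If you write just input, as the examples do, the path actually resolves to /user/root/input (the home directory of the current user, root in my case). Mind the leading slash "/".

(2) The output directory must not already exist!

(3) Make sure the virtual machine meets the basic hardware requirements for running the job!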
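The difference between input and /input in note (1) can be illustrated with a small plain-Java sketch. It does not call the Hadoop API; the class HdfsPathSketch and the helper resolve are hypothetical names made up for this illustration, and the /user/<name> home-directory prefix is assumed to be the default one.

```java
// Plain-Java illustration only; nothing here talks to HDFS.
final class HdfsPathSketch {

    // Hypothetical helper: mimics how a relative path is resolved against
    // the user's HDFS home directory (/user/<userName> by default).
    static String resolve(String path, String userName) {
        if (path.startsWith("/")) {
            return path; // absolute paths are used exactly as written
        }
        return "/user/" + userName + "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(resolve("input", "root"));
        System.out.println(resolve("/henry/input/", "root"));
    }
}
```

Writing absolute paths everywhere, as the commands in this section do, avoids depending on which user happens to run the job.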

2. Installing the Hadoop plugin for Eclipse

2.1 Plugin download

Download link: http://download.csdn.net/download/darkdragonking/9849522

(1) I have tested this download and it works. My Eclipse version is Luna 4.4.2.

(2) Put the downloaded jar into the eclipse/plugins/ directory.

(3) Finally, restart Eclipse and you are done!

[Screenshot 8]

2.2 Plugin configuration

Reference: http://www.linuxidc.com/Linux/2015-08/120943.htm


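3. MRUnit unit testing

3.1 Downloading the MRUnit jar

When searching for the download, be sure to get the build for Hadoop 2.x; otherwise you will get a compatibility error.

Error message: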

java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskInputOutputContext, but class was expected
	at org.apache.hadoop.mrunit.internal.mapreduce.AbstractMockContextWrapper.createCommon(AbstractMockContextWrapper.java:59)
	at org.apache.hadoop.mrunit.internal.mapreduce.MockMapContextWrapper.create(MockMapContextWrapper.java:77)
	at org.apache.hadoop.mrunit.internal.mapreduce.MockMapContextWrapper.<init>(MockMapContextWrapper.java:68)
	at org.apache.hadoop.mrunit.mapreduce.MapDriver.getContextWrapper(MapDriver.java:167)
	at org.apache.hadoop.mrunit.mapreduce.MapDriver.run(MapDriver.java:144)
	at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:640)
	at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:627)
	at com.demo.sort.MaxTemperatureMapperTest.processValidRecord(MaxTemperatureMapperTest.java:35)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)


I downloaded it from here: http://download.csdn.net/download/fkbush/9522361#comment
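The IncompatibleClassChangeError above usually indicates that the MRUnit jar was built against Hadoop 1.x, where TaskInputOutputContext was a class, whereas Hadoop 2.x defines it as an interface. A quick way to see what the jars on your own classpath provide is a small reflection check. The sketch below is only a diagnostic aid; the class name ClassKindCheck and the method kindOf are hypothetical names chosen for this illustration.

```java
// Diagnostic sketch: reports whether a type on the classpath is a class or an interface.
final class ClassKindCheck {

    // Returns "interface", "class", or "missing" for the given fully qualified name.
    static String kindOf(String className) {
        try {
            // initialize=false: inspect the type without running its static initialisers
            Class<?> type = Class.forName(className, false, ClassKindCheck.class.getClassLoader());
            return type.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.mapreduce.TaskInputOutputContext";
        System.out.println(name + " -> " + kindOf(name));
    }
}
```

Run it with the same classpath as the failing test. If it prints "interface" while MRUnit expects a class, the MRUnit build and the Hadoop version do not match.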

3.2 Writing the unit test code

Annotate each unit test method with @Test. Sample code:

MaxTemperatureMapperTest.java

package com.demo.sort;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public final class MaxTemperatureMapperTest {

	@Test
	public void processValidRecord() throws IOException {
		final Text value = new Text("123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356"
		/*
		 * +
		 * "\r\n123456798676231190101234567986762311901012345679867623119010123456798676231190101234562+01122934567890456"
		 * +
		 * "\r\n123456798676231190201234567986762311901012345679867623119010123456798676231190101234562+02120234567893456"
		 * +
		 * "\r\n123456798676231190401234567986762311901012345679867623119010123456798676231190101234561+00321234567803456"
		 * +
		 * "\r\n123456798676231190101234567986762311902012345679867623119010123456798676231190101234561+00429234567903456"
		 * +
		 * "\r\n123456798676231190501234567986762311902012345679867623119010123456798676231190101234561+01021134568903456"
		 * +
		 * "\r\n123456798676231190201234567986762311902012345679867623119010123456798676231190101234561+01124234578903456"
		 * +
		 * "\r\n123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+04121234678903456"
		 * +
		 * "\r\n123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+0082123567"
		 */);
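		// The temperature field of the record above is "+0012", so the mapper is expected to emit 12.
		new MapDriver<LongWritable, Text, Text, IntWritable>().withMapper(new MaxTemperatureMapper()).withInput(new LongWritable(110000), value).withOutput(new Text("1901"), new IntWritable(12))
				.runTest();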
	}
}

MaxTemperatureMapper.java

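/**
* Mapper that extracts the temperature for each year.
* @author Henry
* Created on 2017-07-23
*/
package com.demo.sort;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Extracts the temperature reading for each year.
// Input: one line of text.
// Output: the year (characters 15-19) and the temperature (sign at character 87, digits up to character 92; the sign may be a plus).
// When no temperature data is available, -999 is used as a placeholder.
public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
	// Sentinel value that marks a missing temperature reading.
	private static final int MISSED = 999;

	@Override
	protected void map(final LongWritable key, final Text value, final Context context) throws IOException, InterruptedException {
		final String line = value.toString();
		final String year = line.substring(15, 19);
		final char symbol = line.charAt(87);
		int airTemperature = -999;
		if (symbol == '+') {
			// Skip the leading plus sign before parsing.
			airTemperature = Integer.parseInt(line.substring(88, 92));
		} else {
			airTemperature = Integer.parseInt(line.substring(87, 92));
		}

		final String quality = line.substring(92, 93); // quality code of the reading
		// Emit only readings that are present and carry an acceptable quality code.
		if (airTemperature != MISSED && quality.matches("[01459]")) {
			context.write(new Text(year), new IntWritable(airTemperature));
		}
	}
}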
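The fixed-width offsets used by the mapper are easy to get wrong, so it can help to try them out without any Hadoop dependency. The following is a minimal plain-Java sketch of the same substring logic; RecordParseSketch and its methods are hypothetical names for this illustration, and the sample record is assembled from the repeating 20-character pattern of the test input so that the offsets are easy to count.

```java
// Stand-alone sketch of the fixed-width parsing used by the mapper (no Hadoop needed).
final class RecordParseSketch {

    // Characters 15-18 hold the year.
    static String year(String line) {
        return line.substring(15, 19);
    }

    // Character 87 holds the sign; the digits run up to (not including) index 92.
    static int temperature(String line) {
        int start = line.charAt(87) == '+' ? 88 : 87; // skip a leading plus sign
        return Integer.parseInt(line.substring(start, 92));
    }

    // Character 92 holds the quality code.
    static String quality(String line) {
        return line.substring(92, 93);
    }

    // Builds a sample record: four 20-character units, 7 filler characters,
    // then the signed temperature, the quality code and some trailing filler.
    static String sampleRecord(String signedTemperature, String qualityCode) {
        String unit = "12345679867623119010";
        return unit + unit + unit + unit + "1234561" + signedTemperature + qualityCode + "534567890356";
    }

    public static void main(String[] args) {
        String record = sampleRecord("+0012", "1");
        System.out.println(year(record) + " " + temperature(record) + " " + quality(record));
    }
}
```

For the sample record this yields the year 1901, the temperature 12 and the quality code 1, which are the values a mapper test for that record should expect.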

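3.3 Run result

The run failed! Perhaps the author who published the jar left a class out? Does anyone know? (The trace shows MRUnit's MapDriver looking for org.powermock.api.mockito.PowerMockito, which is not on the classpath.)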

java.lang.NoClassDefFoundError: org/powermock/api/mockito/PowerMockito
	at org.apache.hadoop.mrunit.mapreduce.MapDriver.run(MapDriver.java:147)
	at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:640)
	at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:627)
	at com.demo.sort.MaxTemperatureMapperTest.processValidRecord(MaxTemperatureMapperTest.java:35)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Caused by: java.lang.ClassNotFoundException: org.powermock.api.mockito.PowerMockito
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 27 more

3.4 Follow-up: resolving the errors

To avoid wrangling jar dependencies by hand, I decided to let Maven manage the jars.

The project structure is as follows:

[Screenshot 9]


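pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>practice.hadoop</groupId>
	<artifactId>simple-examples</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>simple-examples</name>
	<url>http://maven.apache.org</url>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	</properties>

	<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.12</version>
			<scope>test</scope>
		</dependency>

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-common</artifactId>
			<version>2.8.0</version>
		</dependency>

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs</artifactId>
			<version>2.8.0</version>
		</dependency>

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-client</artifactId>
			<version>2.8.0</version>
		</dependency>

		<dependency>
			<groupId>org.apache.mrunit</groupId>
			<artifactId>mrunit</artifactId>
			<version>1.1.0</version>
			<classifier>hadoop2</classifier>
			<scope>test</scope>
		</dependency>

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-mapreduce-client-core</artifactId>
			<version>2.8.0</version>
		</dependency>

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-yarn-api</artifactId>
			<version>2.8.0</version>
		</dependency>

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-auth</artifactId>
			<version>2.8.0</version>
		</dependency>

		<dependency>
			<groupId>jdk.tools</groupId>
			<artifactId>jdk.tools</artifactId>
			<version>1.8</version>
			<scope>system</scope>
			<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
		</dependency>

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-minicluster</artifactId>
			<version>2.8.0</version>
			<scope>test</scope>
		</dependency>

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
			<version>2.8.0</version>
			<scope>provided</scope>
		</dependency>

	</dependencies>
</project>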

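Run result:

Sure enough, a new error appeared. This time the class org.apache.hadoop.io.Text could not be found, even though the jar clearly contains it and the project compiles without errors.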

java.lang.NoClassDefFoundError: org/apache/hadoop/io/Text
	at practice.hadoop.simple_examples.MaxTemperatureMapperTest.processValidRecord(MaxTemperatureMapperTest.java:15)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Text
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 24 more
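When a class compiles fine but is reported missing at run time, one thing worth checking is which jars the test JVM actually has on its runtime classpath. The sketch below is a generic diagnostic under that assumption, not the fix I ended up using; ClasspathScan and entriesContaining are hypothetical names for this illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Diagnostic sketch: lists runtime classpath entries whose path contains a given text.
final class ClasspathScan {

    // Splits the classpath on the given separator and keeps the matching entries.
    static List<String> entriesContaining(String classpath, String needle, String separator) {
        List<String> hits = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (entry.contains(needle)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String needle = args.length > 0 ? args[0] : "hadoop-common";
        String classpath = System.getProperty("java.class.path");
        // An empty list means no entry with that name is visible to this JVM.
        System.out.println(entriesContaining(classpath, needle, File.pathSeparator));
    }
}
```

Running this from the same Eclipse launch configuration as the failing test shows whether the hadoop-common jar is really visible to that JVM.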


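Note: the problems above have since been resolved. Thanks for your support!
    Solution:
     1. Delete all files under /src/test/;
     2. Delete all files under /target/;
     3. Download the Maven binaries package, extract it to a directory, and set up the environment variables;
     4. cd into the project directory;
     5. Run: mvn clean
     6. Run: mvn assembly:assembly
     7. The build succeeds and generates a withdependency.jar file under the target directory;
     8. Copy the jar to a directory on the Hadoop server;
     9. Run:
        >export HADOOP_CLASSPATH=<directory containing the jar>/withdependency.jar
        >hadoop your.package.MainClass /henry/input/weather.txt /henry/output/weather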


3.5 Result screenshots

[Screenshot 10]

Temperature list

[Screenshot 11]


