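Here are my notes from successfully setting up Hadoop over the past two days, shared so that others can avoid a few detours.
Create a new VMware virtual machine
Operating system: RedHat EL 6.2 64bit
Network: NAT mode
IP address: 192.168.182.140
Hostname: hadoop1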
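(1) JDK 1.8
Before downloading, run java -version to check whether the system already ships with a JDK.
Mine came with the older 1.6, so I looked up the package and uninstalled it:
# Find the installed JDK package
>rpm -qa|grep jdk
xxx-openjdk-yyyy
# Uninstall the JDK package
>rpm -e --nodeps xxx-openjdk-yyyy
# Check again that it was removed
>java -version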
Download: http://hadoop.apache.org/releases.html
Reference article: http://blog.csdn.net/uq_jin/article/details/51451995
Note that my environment variables differ slightly from the article:
# Environment variable settings
>vi /etc/profile
....omitted....
JAVA_HOME=/software/jdk1.8.0_131
JRE_HOME=$JAVA_HOME/jre
HADOOP_HOME=/software/hadoop-2.8.0
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
CLASS_PATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME:$HADOOP_HOME/etc/hadoop:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/hdfs:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/contrib/capacity-scheduler/*.jar
export PATH JAVA_HOME CLASS_PATH JRE_HOME HADOOP_HOME
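Before starting Hadoop it can help to confirm that the variables above are actually visible to a new process. The following is only a minimal sketch and not part of the original setup: the class name EnvCheck and the helper missing are made up for illustration, and it checks just the two variables the rest of this post relies on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports which required variables are unset or empty.
final class EnvCheck {
    static final String[] REQUIRED = {"JAVA_HOME", "HADOOP_HOME"};

    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // System.getenv() returns the environment of the current process.
        List<String> missing = missing(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("All required variables are set.");
        } else {
            System.out.println("Not set: " + missing);
        }
    }
}
```

Run it from a fresh login shell (or after source /etc/profile), otherwise the edits to /etc/profile are not picked up yet.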
Screenshots from the final deployment:
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver (log tracking)
DFS-OVERVIEW
Cluster application OverView
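Prerequisites:
Command to create the /henry/input/ directory:
>hadoop fs -mkdir -p /henry/input/
Command to run wordcount:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-xxxxx.jar wordcount /henry/input/ /henry/output/wordcount
Result:
A few notes:
(1) /input sits directly under the HDFS root. If an example writes just input, it actually points to /user/root/input, so watch the leading slash "/".
(2) The /output directory must not exist before the job runs!
(3) Make sure the virtual machine meets the basic hardware requirements!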
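Note (1) can be illustrated with a small sketch of how a path argument is interpreted. This is plain Java that only mimics the rule described above (relative paths end up under /user/<username>); it does not call the Hadoop API, and the class and method names are hypothetical.

```java
// Mimics the rule from note (1): absolute paths are kept as they are,
// relative paths are resolved against the HDFS home directory /user/<username>.
final class HdfsPathSketch {
    static String resolve(String path, String user) {
        if (path.startsWith("/")) {
            return path;
        }
        return "/user/" + user + "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(resolve("input", "root"));         // prints /user/root/input
        System.out.println(resolve("/henry/input/", "root")); // prints /henry/input/
    }
}
```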
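Download link: http://download.csdn.net/download/darkdragonking/9849522
(1) I tested the download and it works; my Eclipse version is Luna 4.4.2.
(2) Put the downloaded jar into the eclipse/plugins/ directory.
(3) Finally, restart Eclipse and you are done!
Reference: http://www.linuxidc.com/Linux/2015-08/120943.htm
When searching for the download, be sure to get the build for hadoop2.x, otherwise you will hit a compatibility error.
Error message: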
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskInputOutputContext, but class was expected
at org.apache.hadoop.mrunit.internal.mapreduce.AbstractMockContextWrapper.createCommon(AbstractMockContextWrapper.java:59)
at org.apache.hadoop.mrunit.internal.mapreduce.MockMapContextWrapper.create(MockMapContextWrapper.java:77)
at org.apache.hadoop.mrunit.internal.mapreduce.MockMapContextWrapper.<init>(MockMapContextWrapper.java:68)
at org.apache.hadoop.mrunit.mapreduce.MapDriver.getContextWrapper(MapDriver.java:167)
at org.apache.hadoop.mrunit.mapreduce.MapDriver.run(MapDriver.java:144)
at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:640)
at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:627)
at com.demo.sort.MaxTemperatureMapperTest.processValidRecord(MaxTemperatureMapperTest.java:35)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
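The message says that code compiled against a class found an interface at run time, which is what happens when a library built for Hadoop 1.x meets Hadoop 2.x jars. One way to see what is really on the classpath is a small reflection check. The sketch below is only an illustration (the name KindProbe is hypothetical); run it with the same classpath as the failing test.

```java
// Reports whether a class name resolves to an interface, a class, or nothing at all.
final class KindProbe {
    static String kindOf(String className) {
        try {
            // initialize=false: load the type without running its static initializers.
            Class<?> c = Class.forName(className, false, KindProbe.class.getClassLoader());
            return c.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        // Default to the type named in the error message above.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.mapreduce.TaskInputOutputContext";
        System.out.println(name + " -> " + kindOf(name));
    }
}
```

If this reports "interface" while the MRUnit jar expects a class, the MRUnit build does not match the Hadoop version.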
I downloaded it from here: http://download.csdn.net/download/fkbush/9522361#comment
Add the @Test annotation above the unit test method. Sample code:
MaxTemperatureMapperTest.java
package com.demo.sort;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;
public final class MaxTemperatureMapperTest {
    @Test
    public void processValidRecord() throws IOException {
        final Text value = new Text("123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356"
        /*
         * +
         * "\r\n123456798676231190101234567986762311901012345679867623119010123456798676231190101234562+01122934567890456"
         * +
         * "\r\n123456798676231190201234567986762311901012345679867623119010123456798676231190101234562+02120234567893456"
         * +
         * "\r\n123456798676231190401234567986762311901012345679867623119010123456798676231190101234561+00321234567803456"
         * +
         * "\r\n123456798676231190101234567986762311902012345679867623119010123456798676231190101234561+00429234567903456"
         * +
         * "\r\n123456798676231190501234567986762311902012345679867623119010123456798676231190101234561+01021134568903456"
         * +
         * "\r\n123456798676231190201234567986762311902012345679867623119010123456798676231190101234561+01124234578903456"
         * +
         * "\r\n123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+04121234678903456"
         * +
         * "\r\n123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+0082123567"
         */);
        new MapDriver<LongWritable, Text, Text, IntWritable>()
                .withMapper(new MaxTemperatureMapper())
                .withInput(new LongWritable(110000), value)
                // Characters 88-91 of the record are "0012", so the expected temperature is 12.
                .withOutput(new Text("1901"), new IntWritable(12))
                .runTest();
    }
}
MaxTemperatureMapper.java
/**
 * Page description
 * @author Henry
 * Created at 2017-07-23
 */
package com.demo.sort;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
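// Extracts the temperature reading for each year.
// Input: one line of text.
// Output: the year (characters 15-19) and the temperature (characters 87-92, possibly with a plus sign); -999 stands in when there is no temperature data.
public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Marker value that means "no temperature reading".
    private static final int MISSED = 999;

    @Override
    protected void map(final LongWritable key, final Text value, final Context context) throws IOException, InterruptedException {
        final String line = value.toString();
        final String year = line.substring(15, 19);
        final char symbol = line.charAt(87);
        final int airTemplate;
        if (symbol == '+') {
            // Skip the leading plus sign before parsing.
            airTemplate = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemplate = Integer.parseInt(line.substring(87, 92));
        }
        final String quality = line.substring(92, 93); // quality code of the reading
        // Emit only records that carry a real reading and an acceptable quality code.
        if (airTemplate != MISSED && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemplate));
        }
    }
}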
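The offsets used in the mapper can be checked without Hadoop on the classpath. The following plain-Java sketch repeats the same substring logic on the sample record from the test. RecordSketch and its method names are hypothetical, and the offsets (15 to 19 for the year, 87 to 92 for the signed temperature, 92 for the quality code) are simply copied from the mapper.

```java
// Plain-Java version of the mapper's fixed-width parsing, for checking offsets only.
final class RecordSketch {
    static String year(String line) {
        return line.substring(15, 19);
    }

    static int temperature(String line) {
        // Skip an explicit plus sign, as the mapper does.
        int start = line.charAt(87) == '+' ? 88 : 87;
        return Integer.parseInt(line.substring(start, 92));
    }

    static String quality(String line) {
        return line.substring(92, 93);
    }

    public static void main(String[] args) {
        // The same record as in MaxTemperatureMapperTest, split for readability.
        String line = "1234567986762311901012345679867623119010"
                + "1234567986762311901012345679867623119010"
                + "1234561+00121534567890356";
        System.out.println(year(line) + " " + temperature(line) + " " + quality(line));
        // prints "1901 12 1"
    }
}
```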
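Running it failed! Did the author of the jar perhaps leave out a class? Does anyone know? (The missing class, org.powermock.api.mockito.PowerMockito, belongs to PowerMock, which MRUnit appears to rely on at run time, so its jars probably need to be on the classpath as well.)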
java.lang.NoClassDefFoundError: org/powermock/api/mockito/PowerMockito
at org.apache.hadoop.mrunit.mapreduce.MapDriver.run(MapDriver.java:147)
at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:640)
at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:627)
at com.demo.sort.MaxTemperatureMapperTest.processValidRecord(MaxTemperatureMapperTest.java:35)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Caused by: java.lang.ClassNotFoundException: org.powermock.api.mockito.PowerMockito
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 27 more
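A NoClassDefFoundError like this usually means that a jar is missing at run time. A quick way to find out which classes are absent is to probe for them by name. The sketch below assumes nothing about MRUnit internals: ClasspathProbe is a hypothetical helper, and the names passed in main are just the class from the trace plus the driver used in the test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: returns the names that cannot be loaded from the current classpath.
final class ClasspathProbe {
    static List<String> missingClasses(String... names) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        for (String name : names) {
            try {
                // initialize=false: only check that the class can be located and loaded.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("Missing: " + missingClasses(
                "org.powermock.api.mockito.PowerMockito",
                "org.apache.hadoop.mrunit.mapreduce.MapDriver"));
    }
}
```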
To avoid the tedious jar dependencies, I decided to let the Maven plugin manage the jars.
Project structure:
pom.xml:
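<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>practice.hadoop</groupId>
  <artifactId>simple-examples</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>simple-examples</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.8.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.8.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.8.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.mrunit</groupId>
      <artifactId>mrunit</artifactId>
      <version>1.1.0</version>
      <classifier>hadoop2</classifier>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.8.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-yarn-api</artifactId>
      <version>2.8.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-auth</artifactId>
      <version>2.8.0</version>
    </dependency>
    <dependency>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
      <version>1.8</version>
      <scope>system</scope>
      <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-minicluster</artifactId>
      <version>2.8.0</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
      <version>2.8.0</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>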
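Result:
Ugh, sure enough there is a new error. This time the io.Text class cannot be found, even though it is clearly in the jar and the project compiles without errors.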
java.lang.NoClassDefFoundError: org/apache/hadoop/io/Text
at practice.hadoop.simple_examples.MaxTemperatureMapperTest.processValidRecord(MaxTemperatureMapperTest.java:15)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Text
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 24 more
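When a class that compiles fine cannot be found at run time, it is worth looking at the classpath the test was actually launched with. The sketch below splits a classpath string and keeps the entries that contain a given fragment. The class name ClasspathEntries and the fragment "hadoop-common" are assumptions for illustration; org.apache.hadoop.io.Text ships in the hadoop-common jar.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: lists classpath entries whose path contains a fragment.
final class ClasspathEntries {
    static List<String> matching(String classPath, String separator, String fragment) {
        List<String> hits = new ArrayList<>();
        // Quote the separator so that it is treated literally by split().
        for (String entry : classPath.split(Pattern.quote(separator))) {
            if (entry.contains(fragment)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String classPath = System.getProperty("java.class.path");
        List<String> hits = matching(classPath, File.pathSeparator, "hadoop-common");
        System.out.println(hits.isEmpty()
                ? "no hadoop-common entry on the classpath"
                : hits.toString());
    }
}
```

Running this as a JUnit test or main class inside the same Eclipse launch configuration shows whether the Maven dependencies really reached the runtime classpath.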
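Note: the problem above has been solved. Thanks for the support!
Solution:
1. Delete all files under /src/test/;
2. Delete all files under the /target/ directory;
3. Download the Maven binaries package, extract it to some directory, and set up the environment variables;
4. cd into the project directory;
5. Run: mvn clean
6. Run: mvn assembly:assembly
7. The build succeeds and a withdependency.jar file is generated in the target directory;
8. Copy the jar to some directory on the Hadoop server;
9. Run:
>export HADOOP_CLASSPATH=<directory containing the jar>/withdependency.jar
>hadoop your.package.MainClass /henry/input/weather.txt /henry/output/weather
Temperature list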