1. Download Hadoop and the JDK:
http://hadoop.apache.org/releases.html
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
2. Three virtual machines
192.168.17.178 192.168.17.179 192.168.17.180
3. Remove the JDK bundled with CentOS and install jdk-7u80. Perform this step on all three machines.
rpm -qa | grep java
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
rpm -ivh jdk-7u80-linux-x64.rpm
After the installation completes, configure the JDK path:
vi /etc/profile
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile    # apply the changes immediately
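To confirm the JDK is picked up correctly on each machine, a quick check:
java -version     # should report java version "1.7.0_80"
echo $JAVA_HOME   # should print /usr/java/latest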
4. Set up passwordless SSH login from 178 to 179 and 180
Run on 178:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
scp .ssh/id_dsa.pub [email protected]:/root
scp .ssh/id_dsa.pub [email protected]:/root
Run on 179 and 180:
mkdir .ssh
cat id_dsa.pub >> .ssh/authorized_keys
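To verify, logging in from 178 should now work without a password prompt:
ssh [email protected]
ssh [email protected]
If a password is still requested, the permissions on 179/180 may need tightening (sshd rejects a group- or world-writable .ssh/authorized_keys by default):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys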
5. Upload hadoop-2.6.4 to the 178 machine and extract it to /usr/local/hadoop.
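A minimal sketch of the extraction step, assuming the tarball is named hadoop-2.6.4.tar.gz and was uploaded to /root (adjust the name and path to match your download):
tar -zxvf /root/hadoop-2.6.4.tar.gz -C /usr/local
mv /usr/local/hadoop-2.6.4 /usr/local/hadoop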
/usr/local/hadoop/etc/hadoop/hadoop-env.sh configuration:
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
# Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop
/usr/local/hadoop/etc/hadoop/core-site.xml configuration:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.17.178:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.checkpoint.period</name>
        <value>300</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>/usr/local/hadoop/dfs/namesecondary</value>
    </property>
</configuration>
/usr/local/hadoop/etc/hadoop/hdfs-site.xml configuration:
<configuration>
    <property>
        <name>dfs.http.address</name>
        <value>192.168.17.178:50070</value>
    </property>
    <property>
        <name>dfs.secondary.http.address</name>
        <value>192.168.17.178:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>192.168.17.178</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.17.178:50090</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
/usr/local/hadoop/etc/hadoop/mapred-site.xml configuration:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>192.168.17.178:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.17.178:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.17.178:19888</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>192.168.17.178:9001</value>
    </property>
</configuration>
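Note: the Hadoop 2.6.4 distribution ships only a mapred-site.xml.template; if mapred-site.xml does not exist yet, create it from the template before adding the properties above:
cd /usr/local/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml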
/usr/local/hadoop/etc/hadoop/yarn-site.xml configuration:
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.17.178</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.17.178:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.17.178:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.17.178:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.17.178:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.17.178:8088</value>
    </property>
</configuration>
/usr/local/hadoop/etc/hadoop/master (note: this file only specifies the node on which the secondarynamenode runs; it does not configure master/slave roles)
192.168.17.178
/usr/local/hadoop/etc/hadoop/slaves
192.168.17.178
192.168.17.179
192.168.17.180
6. Copy the configured hadoop-2.6.4 to the 179 and 180 machines
scp -r /usr/local/hadoop 192.168.17.179:/usr/local/hadoop
scp -r /usr/local/hadoop 192.168.17.180:/usr/local/hadoop
7. Start and stop Hadoop
bin/hdfs namenode -format    # format HDFS (first run only)
sbin/start-dfs.sh
sbin/stop-dfs.sh
sbin/start-yarn.sh
sbin/stop-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
sbin/mr-jobhistory-daemon.sh stop historyserver
sbin/hadoop-daemon.sh start secondarynamenode
sbin/hadoop-daemon.sh stop secondarynamenode
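After starting the daemons, running jps on each machine is a quick sanity check. Roughly, 178 should list NameNode, SecondaryNameNode, DataNode, ResourceManager and NodeManager (plus JobHistoryServer if started), while 179 and 180 should list DataNode and NodeManager:
jps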
Check Hadoop's status at the following three addresses:
http://192.168.17.178:8088   (YARN ResourceManager)
http://192.168.17.178:19888  (MapReduce JobHistory server)
http://192.168.17.178:50070  (HDFS NameNode)
Configuring a Hadoop development environment on Windows
Hadoop is extracted to D:\hadoop\hadoop-2.6.4
1. Install the Hadoop-Eclipse-Plugin
Download the plugin from https://github.com/winghc/hadoop2x-eclipse-plugin and copy hadoop2x-eclipse-plugin-master\release\hadoop-eclipse-plugin-2.6.0.jar into Eclipse's plugins folder (I use Eclipse Luna (4.4.2)).
Open Eclipse and set the Hadoop path to D:\hadoop\hadoop-2.6.4 under Window -> Preferences -> Hadoop Map/Reduce.
2. Download the Hadoop 2.6 winutils and the matching hadoop.dll and put them into the D:\hadoop\hadoop-2.6.4\bin directory. This fixes the error: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
3. Put hadoop.dll into C:\Windows\System32 or into the Eclipse project. This fixes the error: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
If you put it into C:\Windows\System32, note that hadoop.dll must match the bitness (32-bit or 64-bit) of both the operating system and the JDK. I once had a machine with 64-bit Windows 7, a 32-bit JDK and 32-bit Eclipse that kept failing for exactly this reason.
If hadoop.dll is placed inside the Eclipse project (at the same level as src), it only needs to match the JDK's bitness.
4. Create a new project via File -> New -> Other -> Map/Reduce Project.
Running the example: the articles are stored as files in the hdfs://192.168.17.178:9000/mongo directory. The job segments every article into words (using the ansj library) and counts how many times each word occurs.
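Before running the job, the articles need to be uploaded into /mongo on HDFS. A minimal sketch, assuming the text files sit in a local directory /root/articles (the local path is hypothetical):
bin/hdfs dfs -mkdir /mongo
bin/hdfs dfs -put /root/articles/* /mongo
The program below reads everything under /mongo and writes the word counts to /mongo_output.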
import java.io.IOException;
import java.util.List;

import org.ansj.domain.Term;
import org.ansj.splitWord.analysis.ToAnalysis;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewsWordCount {

    // Mapper: segment each input line with ansj and emit <word_partOfSpeech, 1>
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            System.out.println("map value=" + value.toString());
            List<Term> parse = ToAnalysis.parse(value.toString());
            System.out.println(parse);
            for (Term term : parse) {
                String natrue = term.getNatureStr();
                String name = term.getName();
                word.set(name + "_" + natrue);
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sum the counts for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            System.out.println("reduce text=" + key + " result=" + result);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        // Point Hadoop at the local Windows installation and run as the root HDFS user
        System.setProperty("hadoop.home.dir", "D:/hadoop/hadoop-2.6.4");
        System.setProperty("HADOOP_USER_NAME", "root");
        args = new String[] { "hdfs://192.168.17.178:9000/mongo", "hdfs://192.168.17.178:9000/mongo_output" };

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "news word count");
        job.setJarByClass(NewsWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
View the results:
bin/hdfs dfs -cat /mongo_output/part-r-00000
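To confirm the job completed, listing the output directory should show a _SUCCESS marker next to the part files:
bin/hdfs dfs -ls /mongo_output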
References
http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-common/SingleCluster.html
http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-common/ClusterSetup.html