Installing Hadoop 2.6.4 on CentOS 6.5

1. Download

http://hadoop.apache.org/releases.html

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

2. Three virtual machines

192.168.17.178
192.168.17.179
192.168.17.180

3. Remove the JDK bundled with CentOS and install jdk-7u80. Run this step on all three machines.

rpm -qa | grep java
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
rpm -ivh jdk-7u80-linux-x64.rpm

After installation, configure the JDK path by adding the two export lines below to /etc/profile:

vi /etc/profile
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$JAVA_HOME/bin

Apply the change immediately:

source /etc/profile

4. Set up passwordless SSH login from 178 to 179 and 180

Run on 178:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/id_dsa.pub root@192.168.17.179:/root
scp ~/.ssh/id_dsa.pub root@192.168.17.180:/root

Run on 179 and 180 (from /root, where the key was copied):

mkdir .ssh
cat id_dsa.pub >> .ssh/authorized_keys

5. Upload Hadoop 2.6.4 to machine 178 and extract it to /usr/local/hadoop

Configure /usr/local/hadoop/etc/hadoop/hadoop-env.sh:

# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest

# Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop

Configure /usr/local/hadoop/etc/hadoop/core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.17.178:9000</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>

  <property>
    <name>fs.checkpoint.period</name>
    <value>300</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/usr/local/hadoop/dfs/namesecondary</value>
  </property>
</configuration>
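
To double-check a value in a site file without starting Hadoop, the file can be read with the XML parser that ships with the JDK. The following is only a minimal sketch and not Hadoop's own Configuration class; the class name SiteXmlReader and the method readProperty are hypothetical names chosen for illustration, and the sample document is a shortened copy of the core-site.xml above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of Hadoop): looks up one property in a
// Hadoop-style <configuration> document using only the JDK XML parser.
class SiteXmlReader {

    // Returns the <value> of the first <property> whose <name> matches,
    // or null if no such property exists.
    static String readProperty(String xmlText, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlText)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                if (names.getLength() == 0 || values.getLength() == 0) {
                    continue; // skip incomplete entries
                }
                if (name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid configuration document", e);
        }
    }

    public static void main(String[] args) {
        // Shortened sample; in practice the text would be read from core-site.xml.
        String xml = "<configuration>"
                + "<property><name>fs.defaultFS</name>"
                + "<value>hdfs://192.168.17.178:9000</value></property>"
                + "<property><name>hadoop.tmp.dir</name>"
                + "<value>/usr/local/hadoop/tmp</value></property>"
                + "</configuration>";
        System.out.println("fs.defaultFS   = " + readProperty(xml, "fs.defaultFS"));
        System.out.println("hadoop.tmp.dir = " + readProperty(xml, "hadoop.tmp.dir"));
    }
}
```

A misspelled property name shows up here as a null result, which is easier to spot than a daemon that silently falls back to a default.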

Configure /usr/local/hadoop/etc/hadoop/hdfs-site.xml:

<configuration>
	<property>
		<name>dfs.http.address</name>
		<value>192.168.17.178:50070</value>
	</property>
	<property>
		<name>dfs.secondary.http.address</name>
		<value>192.168.17.178:50090</value>
    </property>
    <property>
		<name>dfs.namenode.name.dir</name>
		<value>/usr/local/hadoop/dfs/name</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/usr/local/hadoop/dfs/data</value>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
    <property>
        <name>dfs.nameservices</name>
        <value>192.168.17.178</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.17.178:50090</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

Configure /usr/local/hadoop/etc/hadoop/mapred-site.xml:

<configuration>
     <property>
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
         <final>true</final>
    </property>

    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>192.168.17.178:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.17.178:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.17.178:19888</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>192.168.17.178:9001</value>
    </property>
</configuration>
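
Several keys used in core-site.xml, hdfs-site.xml and mapred-site.xml above are Hadoop 1.x names that Hadoop 2.x still accepts but lists as deprecated. The sketch below records the renames I am aware of from Hadoop's deprecated-properties table; the class DeprecatedKeys and the method currentName are hypothetical names for illustration, and the table is deliberately limited to keys that appear in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table (not a Hadoop API): maps the Hadoop 1.x style
// keys used in this post to their Hadoop 2.x names.
class DeprecatedKeys {

    static final Map<String, String> RENAMED = new LinkedHashMap<String, String>();
    static {
        RENAMED.put("fs.checkpoint.period", "dfs.namenode.checkpoint.period");
        RENAMED.put("fs.checkpoint.dir", "dfs.namenode.checkpoint.dir");
        RENAMED.put("dfs.http.address", "dfs.namenode.http-address");
        RENAMED.put("dfs.secondary.http.address", "dfs.namenode.secondary.http-address");
        RENAMED.put("mapred.job.tracker", "mapreduce.jobtracker.address");
    }

    // Returns the current name for a key, or the key itself if it was not renamed.
    static String currentName(String key) {
        String renamed = RENAMED.get(key);
        return renamed == null ? key : renamed;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : RENAMED.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

This also explains why hdfs-site.xml above appears to set the secondary NameNode HTTP address twice: dfs.secondary.http.address and dfs.namenode.secondary.http-address are the old and new names of the same setting.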

Configure /usr/local/hadoop/etc/hadoop/yarn-site.xml:

<configuration>
 <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.17.178</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.17.178:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.17.178:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.17.178:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.17.178:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.17.178:8088</value>
    </property>

</configuration>

/usr/local/hadoop/etc/hadoop/master (note: this file only specifies the node that runs the SecondaryNameNode; it does not define the master/slave roles)

192.168.17.178

/usr/local/hadoop/etc/hadoop/slaves

192.168.17.178
192.168.17.179
192.168.17.180

6. Copy the configured Hadoop 2.6.4 directory to machines 179 and 180

scp -r /usr/local/hadoop 192.168.17.179:/usr/local/hadoop
scp -r /usr/local/hadoop 192.168.17.180:/usr/local/hadoop

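7. Start and stop Hadoop (run the commands from /usr/local/hadoop on 178; format the NameNode only once, before the first start)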

bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/stop-dfs.sh
sbin/start-yarn.sh
sbin/stop-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
sbin/mr-jobhistory-daemon.sh stop historyserver
sbin/hadoop-daemon.sh start secondarynamenode
sbin/hadoop-daemon.sh stop secondarynamenode

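Check Hadoop's status at the following three addresses (YARN ResourceManager, JobHistory server, and NameNode web UI):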

http://192.168.17.178:8088
http://192.168.17.178:19888
http://192.168.17.178:50070
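
If one of the pages does not load, it helps to know whether the port is reachable at all before digging through the daemon logs. The following is a rough sketch that only tests for an open TCP port; it does not verify that the service behind it is healthy. The class name PortCheck and the one-second timeout are my own assumptions, while the host and ports are taken from the three addresses above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: tests whether a TCP port accepts connections.
class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, unreachable, or timed out
            return false;
        }
    }

    public static void main(String[] args) {
        String host = "192.168.17.178";
        int[] ports = { 8088, 19888, 50070 };
        for (int port : ports) {
            String state = isOpen(host, port, 1000) ? "open" : "not reachable";
            System.out.println(host + ":" + port + " " + state);
        }
    }
}
```

A port that is "not reachable" while the daemon is running usually points to a firewall rule on the CentOS machine rather than to Hadoop itself.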

Configuring a Hadoop development environment on Windows

Hadoop is extracted to D:\hadoop\hadoop-2.6.4

1. Install Hadoop-Eclipse-Plugin

Download the plugin from https://github.com/winghc/hadoop2x-eclipse-plugin and copy hadoop2x-eclipse-plugin-master\release\hadoop-eclipse-plugin-2.6.0.jar into Eclipse's plugins folder (I use Eclipse Luna 4.4.2).

Open Eclipse and, under Window -> Preferences -> Hadoop Map/Reduce, set the Hadoop path to D:\hadoop\hadoop-2.6.4

2. Download winutils.exe and the matching hadoop.dll built for Hadoop 2.6 and put them in D:\hadoop\hadoop-2.6.4\bin. This resolves the error java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

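3. Put hadoop.dll in C:\Windows\System32 or in the Eclipse project. This resolves the error java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

If you put it in C:\Windows\System32, note that hadoop.dll must match the bitness (32-bit or 64-bit) of both the operating system and the JDK. One machine with 64-bit Windows 7, a 32-bit JDK, and 32-bit Eclipse kept failing for exactly this reason.

If hadoop.dll is placed in the Eclipse project instead (at the same level as src), it only needs to match the bitness of the JDK.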
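
When it is unclear which hadoop.dll to pick, the bitness of the JVM that Eclipse actually launches can be printed from Java. This is a small sketch under a few assumptions: the class name JvmBitness is hypothetical, sun.arch.data.model is a HotSpot-specific property that other JVMs may not set, and the fallback on os.arch is only a heuristic (as far as I know, os.arch describes the JVM's architecture rather than the operating system's, so a 32-bit JDK on 64-bit Windows reports x86).

```java
// Hypothetical helper: reports whether the running JVM is 32-bit or 64-bit.
class JvmBitness {

    // dataModel: value of the HotSpot-specific "sun.arch.data.model" property
    //            (may be null on other JVMs).
    // osArch:    value of the standard "os.arch" property, used as a fallback.
    static boolean is64Bit(String dataModel, String osArch) {
        if (dataModel != null) {
            return "64".equals(dataModel.trim());
        }
        // Heuristic fallback: architecture names such as amd64 or x86_64 contain "64".
        return osArch != null && osArch.contains("64");
    }

    public static void main(String[] args) {
        String dataModel = System.getProperty("sun.arch.data.model");
        String osArch = System.getProperty("os.arch");
        System.out.println("sun.arch.data.model = " + dataModel);
        System.out.println("os.arch             = " + osArch);
        System.out.println(is64Bit(dataModel, osArch) ? "64-bit JVM" : "32-bit JVM");
    }
}
```

Run it with the same JRE that the Eclipse run configuration uses; a 32-bit result means the 32-bit hadoop.dll is the one to put in the project.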

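4. Create a new project via File -> New -> Other -> Map/Reduce Project

Example: the articles are stored as files under hdfs://192.168.17.178:9000/mongo. The job segments every article into words and counts how often each word occurs, using the ansj library for segmentation.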

import java.io.IOException;
import java.util.List;

import org.ansj.domain.Term;
import org.ansj.splitWord.analysis.ToAnalysis;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewsWordCount {
	public static class TokenizerMapper extends
			Mapper<Object, Text, Text, IntWritable> {

		private final static IntWritable one = new IntWritable(1);
		private Text word = new Text();

		@Override
		public void map(Object key, Text value, Context context)
				throws IOException, InterruptedException {
			System.out.println("map value=" + value.toString());
			// Segment the input line into terms with ansj
			List<Term> parse = ToAnalysis.parse(value.toString());
			System.out.println(parse);
			for (Term term : parse) {
				// Emit "<word>_<part of speech>" with a count of 1
				String nature = term.getNatureStr();
				String name = term.getName();
				word.set(name + "_" + nature);
				context.write(word, one);
			}
		}
	}

	public static class IntSumReducer extends
			Reducer<Text, IntWritable, Text, IntWritable> {
		private IntWritable result = new IntWritable();

		public void reduce(Text key, Iterable<IntWritable> values,
				Context context) throws IOException, InterruptedException {
			int sum = 0;
			for (IntWritable val : values) {
				sum += val.get();
			}
			result.set(sum);
			System.out.println("reduce text=" + key + "   result=" + result);
			context.write(key, result);
		}
	}

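	public static void main(String[] args) throws Exception {
		// Local Hadoop directory that contains bin\winutils.exe
		System.setProperty("hadoop.home.dir", "D:/hadoop/hadoop-2.6.4");
		// Access HDFS as root rather than as the local Windows user
		System.setProperty("HADOOP_USER_NAME", "root");
		// Input and output paths are hard-coded and replace any command-line arguments
		args = new String[] { "hdfs://192.168.17.178:9000/mongo",
				"hdfs://192.168.17.178:9000/mongo_output" };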
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf, "news word count");
		job.setJarByClass(NewsWordCount.class);
		job.setMapperClass(TokenizerMapper.class);
		job.setCombinerClass(IntSumReducer.class);
		job.setReducerClass(IntSumReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
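
To see what the job computes without a cluster, the same counting logic can be tried in plain Java. This is only a sketch of the map and reduce steps, not the job above: it splits on whitespace instead of calling ansj (so no part-of-speech suffix is attached), it runs in a single JVM, and the class name LocalWordCount is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the MapReduce job: counts tokens in memory.
class LocalWordCount {

    // "Map" step: split one line into tokens; each token stands for a (token, 1) pair.
    // Whitespace splitting replaces the ansj segmentation used in the real job.
    static List<String> map(String line) {
        List<String> tokens = new ArrayList<String>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // "Reduce" step: sum the counts per token. A TreeMap keeps the keys sorted,
    // similar to the sorted key order a reducer receives.
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> totals = new TreeMap<String, Integer>();
        for (String line : lines) {
            for (String token : map(line)) {
                Integer old = totals.get(token);
                totals.put(token, old == null ? 1 : old + 1);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] lines = { "hadoop yarn hadoop", "hdfs yarn" };
        // Print key and count separated by a tab, one pair per line
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Because summing is associative, the same reducer class can also serve as the combiner, which is what job.setCombinerClass(IntSumReducer.class) relies on in the real job.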

View the result:

bin/hdfs dfs -cat /mongo_output/part-r-00000


References

http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-common/SingleCluster.html

http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-common/ClusterSetup.html

