Installing a Hadoop Cluster on CentOS

I. Configure Environment Variables

1. Java environment

sudo vi /etc/profile

#set java Environment
export JAVA_HOME=/usr/local/jdk
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$PATH"
source /etc/profile
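
After sourcing the profile, a quick sanity check confirms the shell picks up the JDK (this assumes a JDK really is unpacked at /usr/local/jdk):

[dev@master ~]$ echo $JAVA_HOME
/usr/local/jdk
[dev@master ~]$ java -version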

2. Hostname and domain configuration (three or more machines)

sudo vi /etc/hosts

127.0.0.1 localhost
172.16.10.213 master
172.16.10.214 slave1
172.16.10.215 slave2

sudo vi /etc/sysconfig/network

HOSTNAME=master

Note: the lines above are for the .213 machine (master); adjust the other two machines accordingly (.214 → slave1, .215 → slave2).
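
A quick ping from master verifies that the hosts file resolves the new names (run the mirror-image checks from each slave as well):

[dev@master ~]$ ping -c 1 slave1
[dev@master ~]$ ping -c 1 slave2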

II. Configure Passwordless SSH Login (SSH installation omitted)

1. Generate a key pair on master (press Enter at every prompt)

[dev@master ~]$ ssh-keygen -t rsa 
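
ssh-keygen leaves two files in ~/.ssh; the private key (id_rsa) never leaves master, only the public half gets copied around:

[dev@master ~]$ ls /home/dev/.ssh
id_rsa  id_rsa.pub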

2. Create the .ssh directory on both slaves and copy the public key over

[dev@slave1 ~]$ mkdir /home/dev/.ssh
[dev@slave2 ~]$ mkdir /home/dev/.ssh
[dev@master .ssh]$ scp id_rsa.pub dev@slave1:/home/dev/.ssh/
[dev@master .ssh]$ scp id_rsa.pub dev@slave2:/home/dev/.ssh/

3. Append the copied key to authorized_keys and restrict its permissions (repeat on slave2; do the same on master itself, since the start scripts also ssh into localhost)

[dev@slave1 ~]$ cd /home/dev/.ssh/
[dev@slave1 .ssh]$ cat id_rsa.pub >> authorized_keys
[dev@slave1 .ssh]$ chmod 600 authorized_keys
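
Where ssh-copy-id is available (it ships with most OpenSSH installations), the mkdir/scp/cat dance collapses into one command per slave:

[dev@master ~]$ ssh-copy-id dev@slave1
[dev@master ~]$ ssh-copy-id dev@slave2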


4. Enable RSA/public-key authentication (on every node)

vi /etc/ssh/sshd_config 

RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile      .ssh/authorized_keys

[root@slave1 ~]# service sshd restart
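
If login still prompts for a password after the restart, the usual culprit is permissions: sshd silently ignores keys when the home directory or .ssh is group-writable, so tightening them is worth trying:

chmod 700 /home/dev/.ssh
chmod 600 /home/dev/.ssh/authorized_keys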

5. Test

[dev@master ~]$ ssh slave1
[dev@master ~]$ ssh slave2

The first connection asks to confirm each host's fingerprint; answer yes once and subsequent logins should not prompt at all.

III. Install Hadoop

1. Download and unpack (omitted)

2. Fix the "JAVA_HOME not found" problem (add the export before the point where it is needed)

vi libexec/hadoop-config.sh

export JAVA_HOME=/usr/local/jdk
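
The same export can also go into etc/hadoop/hadoop-env.sh, which every Hadoop daemon sources on startup and is the conventional home for it:

vi etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/local/jdk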

3. Configuration files:

core-site.xml

	<configuration>
		<property>
			<name>fs.defaultFS</name>
			<value>hdfs://master:9000</value>
		</property>
	</configuration>

hdfs-site.xml 

	<configuration>
		<property>
			<name>dfs.namenode.name.dir</name>
			<value>file:/home/dev/hadoop/data/hadoop-nn/master</value>
		</property>
		<property>
			<name>dfs.datanode.data.dir</name>
			<value>file:/home/dev/hadoop/data/hadoop-dn/slave1,file:/home/dev/hadoop/data/hadoop-dn/slave2</value>
		</property>
		<property>
			<name>dfs.replication</name>
			<value>2</value>
		</property>
	</configuration>
Note: in the hdfs-site.xml configuration, each node only needs its own part — on the namenode (master) the datanode property at the end can be dropped, and on the datanodes the namenode property at the front can be dropped.
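
For instance, a trimmed hdfs-site.xml on slave1 would keep only its own datanode directory (a sketch following the note above, reusing this guide's paths):

	<configuration>
		<property>
			<name>dfs.datanode.data.dir</name>
			<value>file:/home/dev/hadoop/data/hadoop-dn/slave1</value>
		</property>
		<property>
			<name>dfs.replication</name>
			<value>2</value>
		</property>
	</configuration>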

4. Register the datanode nodes (the start scripts ssh into every host listed here)

 vi etc/hadoop/slaves

slave1
slave2

IV. Start Hadoop

1. The namenode storage directory must be formatted; the datanode storage directories need no formatting and are created automatically at startup.

./bin/hdfs namenode -format

./sbin/start-all.sh

Then run jps to see all the running processes.
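
With this layout the output looks roughly like the following (the PIDs are illustrative; the ResourceManager/NodeManager lines appear because start-all.sh also brings up YARN):

[dev@master ~]$ jps
2481 NameNode
2678 SecondaryNameNode
2821 ResourceManager
3104 Jps

[dev@slave1 ~]$ jps
2301 DataNode
2412 NodeManager
2533 Jps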

2. Check the current nodes

./bin/hdfs dfsadmin -report

Datanodes available: 2 (2 total, 0 dead) // the live datanode count; it should match the slaves file

V. Test Run

1. Import files (sample input files are sketched after this list)

./bin/hadoop fs -mkdir /data
./bin/hadoop fs -put -f example/file1.txt example/file2.txt /data

2. Run the (Java) WordCount example

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount /data /output

3. Check the results

./bin/hadoop fs -ls /output/
./bin/hadoop fs -cat /output/part-r-00000
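
If the example/ directory from step 1 does not exist yet, two throwaway input files (hypothetical contents) give wordcount something to chew on:

mkdir -p example
echo "hello hadoop" > example/file1.txt
echo "hello world" > example/file2.txt

For that input, part-r-00000 comes back as tab-separated word/count pairs: hadoop 1, hello 2, world 1.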


VI. Other Common Commands

export HADOOP_ROOT_LOGGER=INFO,console // set the log output to the console, handy when debugging errors
./bin/hdfs dfsadmin -report // show the nodes currently connected to the cluster
./bin/hdfs dfsadmin -refreshNodes // force the node configuration to be reloaded
./bin/hdfs namenode -format // format the namenode
./bin/hadoop fs -ls /
./sbin/hadoop-daemon.sh --script hdfs start namenode
./sbin/hadoop-daemon.sh --script hdfs start datanode
./sbin/start-dfs.sh
./sbin/stop-dfs.sh




