Hadoop Cluster Environment Setup
Fully distributed mode
Operating system: Red Hat 5.6
NameNode: 172.16.40.180
DataNode: 172.16.40.201
DataNode: 172.16.40.108
JDK version: jdk1.6.0_27
Download address:
http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u29-download-513648.html
Hadoop version: hadoop-0.20.2
Download address:
http://download.filehat.com/apache/hadoop/core/hadoop-0.20.2/
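For reference, the Hadoop archive can be fetched directly on the server; the target directory /home/tools and the file name hadoop-0.20.2.tar.gz match the commands that follow, but the full mirror URL is an assumption built from the download address above. The JDK installer from Oracle usually has to be downloaded in a browser (license acceptance) and copied over manually.
[root@dlxa180 tools]# wget http://download.filehat.com/apache/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz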
[root@dlxa180 tools] # chmod +x jdk-6u27-linux-x64.bin
[root@dlxa180 tools] #./jdk-6u27-linux-x64.bin
[root@dlxa201 tools]# rm -rf jdk-6u27-linux-x64.bin
[root@dlxa180 tools]#vi /etc/profile
export JAVA_HOME=/home/tools/jdk
export JAVA_BIN=/home/tools/jdk/bin
export HADOOP_HOME=/usr/local/hadoop-0.20.3-dev
export ANT_HOME=/home/tools/apache-ant-1.8.2
PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$ANT_HOME/bin:$PATH
[root@dlxa180 tools] # java -version
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
Java HotSpot(TM) 64-Bit Server VM (build 20.2-b06, mixed mode)
Note: the above covers the JDK installation and the related environment variable settings.
[root@dlxa180 tools] #vi /etc/hosts
Note: host entries for every node need to be appended to /etc/hosts on each machine.
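A sketch of the expected entries; dlxa180 and dlxa201 come from the shell prompts in these notes, while dlxa108 is an assumed name for the third node:
172.16.40.180  dlxa180
172.16.40.201  dlxa201
172.16.40.108  dlxa108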
[root@dlxa180 tools] # service iptables stop
[root@dlxa180 tools] #chkconfig iptables off
[root@dlxa180 ~] #ssh-keygen -t rsa
[root@dlxa180 .ssh]#cat id_rsa.pub >> authorized_keys
# Set permissions on the authorized_keys file
[root@dlxa180 .ssh]#chmod 644 authorized_keys
# Copy authorized_keys to the other remote machines
[root@dlxa180 .ssh]# scp authorized_keys 172.16.40.108:/root/.ssh
Note: the namenode can now SSH to every data node without a password, and to itself without a password.
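The same scp to 172.16.40.201 is implied by the note above. A quick check that passwordless login works from the namenode:
[root@dlxa180 ~]# ssh 172.16.40.108 hostname
[root@dlxa180 ~]# ssh 172.16.40.201 hostname
[root@dlxa180 ~]# ssh localhost hostname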
[root@dlxa180 tools] # tar -zxvf hadoop-0.20.2.tar.gz
[root@dlxa180 tools] # rm -rf hadoop-0.20.2.tar.gz
[root@dlxa180 tools] # vi /etc/profile
export JAVA_HOME=/home/tools/jdk
export JAVA_BIN=/home/tools/jdk/bin
export HADOOP_HOME=/usr/local/hadoop-0.20.3-dev
export ANT_HOME=/home/tools/apache-ant-1.8.2
PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$ANT_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME JAVA_BIN HADOOP_HOME ANT_HOME PATH CLASSPATH
[root@dlxa180 tools] # source /etc/profile
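A quick check that the new variables are in effect (this assumes the Hadoop tree is already in place at $HADOOP_HOME):
[root@dlxa180 tools]# echo $JAVA_HOME $HADOOP_HOME
[root@dlxa180 tools]# hadoop version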
hadoop-env.sh:
# Append the following line
export JAVA_HOME=/home/tools/jdk
masters:
172.16.40.180
slaves:
172.16.40.108
172.16.40.201
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://172.16.40.180:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/HadoopTemp</value>
  </property>
</configuration>
hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>
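The WordCount job run later needs a JobTracker, but these notes do not show mapred-site.xml. A minimal sketch, assuming the JobTracker runs on the namenode host and that port 54311 is used (both the placement and the port are assumptions, not taken from the original notes):
mapred-site.xml (assumed):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <!-- Host and port of the JobTracker; 172.16.40.180:54311 is an assumption -->
    <name>mapred.job.tracker</name>
    <value>172.16.40.180:54311</value>
  </property>
</configuration>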
Master: format the NameNode first, then start the cluster.
[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop namenode -format
[root@dlxa180 hadoop-0.20.3-dev]# bin/start-all.sh
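To confirm the daemons started, jps (shipped with the JDK) lists the running Java processes; with this layout the master should show NameNode, SecondaryNameNode, and JobTracker, and each data node should show DataNode and TaskTracker:
# On the master
[root@dlxa180 hadoop-0.20.3-dev]# jps
# On a data node
[root@dlxa201 ~]# jps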
List files on HDFS
[root@dlxa180 hadoop-0.20.3-dev]#bin/hadoop dfs -ls
Found 8 items
drwxr-xr-x - root supergroup 0 2011-11-09 13:55 /user/root/aa
…
drwxr-xr-x - root supergroup 0 2011-11-09 14:02 /user/root/wbk
Upload a file to HDFS
[root@dlxa180 hadoop-0.20.3-dev]# echo "hello my hadoop hdfs" > test1
[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -put test1 test
[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -ls test
Found 1 items
-rw-r--r-- 1 root supergroup 21 2011-11-09 14:12 /user/root/test
Copy a file from HDFS to the local filesystem
[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -get test getin
[root@dlxa180 hadoop-0.20.3-dev]# cat getin
hello my hadoop hdfs
Delete a file on HDFS
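The original notes omit the command itself; a minimal example, removing the test file uploaded above (use -rmr for directories):
[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -rm test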
View a file on HDFS
[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -cat aa/*
hello word
hello hadoop
# View basic HDFS status
[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfsadmin -report
# Leave safe mode
[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfsadmin -safemode leave
# Enter safe mode
[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfsadmin -safemode enter
Adding a node
The hadoop directory can be copied from the namenode to the new data node, after which the masters and slaves files are updated; a sketch of the steps follows.
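A rough sketch, where 172.16.40.x stands for the new node's address and the "newnode" prompt is a placeholder (neither appears in the original notes):
# Copy the Hadoop tree from the namenode to the new data node (placeholder IP)
[root@dlxa180 ~]# scp -r /usr/local/hadoop-0.20.3-dev 172.16.40.x:/usr/local/
# Register the new node in the slaves file on the namenode
[root@dlxa180 hadoop-0.20.3-dev]# echo "172.16.40.x" >> conf/slaves
# On the new node, start the DataNode and TaskTracker daemons
[root@newnode hadoop-0.20.3-dev]# bin/hadoop-daemon.sh start datanode
[root@newnode hadoop-0.20.3-dev]# bin/hadoop-daemon.sh start tasktracker
Once the new node is up, the balancer below redistributes existing blocks across the cluster.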
[root@dlxa180 hadoop-0.20.3-dev]#bin/start-balancer.sh
starting balancer, logging to /usr/local/hadoop-0.20.3-dev/logs/hadoop-root-balancer-dlxa180.out
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
The cluster is balanced. Exiting...
Balancing took 273.0 milliseconds
Wordcount
[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -put put in
[root@dlxa180 hadoop-0.20.3-dev]#bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out
11/11/09 14:40:54 INFO input.FileInputFormat: Total input paths to process : 2
11/11/09 14:40:55 INFO mapred.JobClient: Running job: job_201111091357_0006
11/11/09 14:40:56 INFO mapred.JobClient: map 0% reduce 0%
11/11/09 14:41:04 INFO mapred.JobClient: map 100% reduce 0%
11/11/09 14:41:16 INFO mapred.JobClient: map 100% reduce 100%
11/11/09 14:41:18 INFO mapred.JobClient: Job complete: job_201111091357_0006
11/11/09 14:41:18 INFO mapred.JobClient: Counters: 18
11/11/09 14:41:18 INFO mapred.JobClient: Job Counters
11/11/09 14:41:18 INFO mapred.JobClient: Launched reduce tasks=1
11/11/09 14:41:18 INFO mapred.JobClient: Rack-local map tasks=1
11/11/09 14:41:18 INFO mapred.JobClient: Launched map tasks=2
11/11/09 14:41:18 INFO mapred.JobClient: Data-local map tasks=1
11/11/09 14:41:18 INFO mapred.JobClient: FileSystemCounters
11/11/09 14:41:18 INFO mapred.JobClient: FILE_BYTES_READ=54
11/11/09 14:41:18 INFO mapred.JobClient: HDFS_BYTES_READ=24
11/11/09 14:41:18 INFO mapred.JobClient: FILE_BYTES_WRITTEN=178
11/11/09 14:41:18 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=24
11/11/09 14:41:18 INFO mapred.JobClient: Map-Reduce Framework
11/11/09 14:41:18 INFO mapred.JobClient: Reduce input groups=3
11/11/09 14:41:18 INFO mapred.JobClient: Combine output records=4
11/11/09 14:41:18 INFO mapred.JobClient: Map input records=2
11/11/09 14:41:18 INFO mapred.JobClient: Reduce shuffle bytes=60
11/11/09 14:41:18 INFO mapred.JobClient: Reduce output records=3
11/11/09 14:41:18 INFO mapred.JobClient: Spilled Records=8
11/11/09 14:41:18 INFO mapred.JobClient: Map output bytes=40
11/11/09 14:41:18 INFO mapred.JobClient: Combine input records=4
11/11/09 14:41:18 INFO mapred.JobClient: Map output records=4
11/11/09 14:41:18 INFO mapred.JobClient: Reduce input records=4
[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -get out output
[root@dlxa180 hadoop-0.20.3-dev]# cat output/*
cat: output/_logs: Is a directory
hadoop 1
hello 2
word 1
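The result can also be read straight from HDFS without copying it locally; the part-file name pattern below is assumed for this Hadoop version:
[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -cat out/part-r-*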