Hadoop Cluster Setup -- Pseudo-Distributed Mode


Introduction

Operating system: Red Hat 5.6

NameNode:  172.16.40.180

DataNode:  172.16.40.201

DataNode:  172.16.40.108

Software:

JDK version: jdk1.6.0_27

Download address:

http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u29-download-513648.html

Hadoop version: hadoop-0.20.2

Download address:

http://download.filehat.com/apache/hadoop/core/hadoop-0.20.2/

JDK

[root@dlxa180 tools]# chmod +x jdk-6u27-linux-x64.bin

[root@dlxa180 tools]# ./jdk-6u27-linux-x64.bin

[root@dlxa180 tools]# rm -rf jdk-6u27-linux-x64.bin
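Note: the profile below points JAVA_HOME at /home/tools/jdk, while the installer unpacks into a jdk1.6.0_27 directory. Presumably that directory was renamed or symlinked, a step the original does not show; a sketch:

[root@dlxa180 tools]# ln -s /home/tools/jdk1.6.0_27 /home/tools/jdk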

[root@dlxa180 tools]# vi /etc/profile

 

export JAVA_HOME=/home/tools/jdk

export JAVA_BIN=/home/tools/jdk/bin

export HADOOP_HOME=/usr/local/hadoop-0.20.3-dev

export ANT_HOME=/home/tools/apache-ant-1.8.2

PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$ANT_HOME/bin:$PATH

 

[root@dlxa180 tools]# java -version

java version "1.6.0_27"

Java(TM) SE Runtime Environment (build 1.6.0_27-b07)

Java HotSpot(TM) 64-Bit Server VM (build 20.2-b06, mixed mode)

Note: this covers the JDK installation and environment variable setup.

 

HOSTS and IPTABLES

 

[root@dlxa180 tools]# vi /etc/hosts

 

Note: entries for every node must be appended to /etc/hosts on each machine.
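For example, the entries might look like this (dlxa180 and dlxa201 appear in the shell prompts above; dlxa108 is an assumed name for the third machine):

172.16.40.180   dlxa180

172.16.40.201   dlxa201

172.16.40.108   dlxa108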

[root@dlxa180 tools]# service iptables stop

[root@dlxa180 tools]# chkconfig iptables off

SSH

 

[root@dlxa180 ~]# ssh-keygen -t rsa

[root@dlxa180 .ssh]# cat id_rsa.pub >> authorized_keys

// set permissions

[root@dlxa180 .ssh]# chmod 644 authorized_keys

// copy authorized_keys to the other remote machines (repeat for 172.16.40.201)

[root@dlxa180 .ssh]# scp authorized_keys 172.16.40.108:/root/.ssh

Note:

the NameNode must be able to SSH to every data node, and to itself, without a password.
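A quick check, not shown in the original: from the NameNode, each of the following should print the remote date without asking for a password.

[root@dlxa180 .ssh]# ssh 172.16.40.108 date

[root@dlxa180 .ssh]# ssh 172.16.40.201 date

[root@dlxa180 .ssh]# ssh 172.16.40.180 date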

Hadoop

[root@dlxa180 tools]# tar -zxvf hadoop-0.20.2.tar.gz

[root@dlxa180 tools]# rm -rf hadoop-0.20.2.tar.gz

[root@dlxa180 tools]# vi /etc/profile

export JAVA_HOME=/home/tools/jdk

export JAVA_BIN=/home/tools/jdk/bin

export HADOOP_HOME=/usr/local/hadoop-0.20.3-dev

export ANT_HOME=/home/tools/apache-ant-1.8.2

PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$ANT_HOME/bin:$PATH

CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export JAVA_HOME JAVA_BIN HADOOP_HOME ANT_HOME PATH CLASSPATH

[root@dlxa180 tools]# source /etc/profile

 

hadoop/conf

hadoop-env.sh

# append the following line

export JAVA_HOME=/home/tools/jdk

masters

 

172.16.40.180

slaves

 

172.16.40.108

172.16.40.201 

core-site.xml

 

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://172.16.40.180:54310</value>

<description>The name of the default file system.  A URI whose

  scheme and authority determine the FileSystem implementation.  The

  uri's scheme determines the config property (fs.SCHEME.impl) naming

  the FileSystem implementation class.  The uri's authority is used to

  determine the host, port, etc. for a filesystem.</description>

</property>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/home/hadoop/HadoopTemp</value>

</property>

</configuration>

hdfs-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

 <description>Default block replication.

  The actual number of replications can be specified when the file is created.

  The default is used if replication is not specified in create time.

  </description>

</property>

</configuration>
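Note: the original walkthrough stops at hdfs-site.xml, but the WordCount job below needs a running JobTracker, which is configured in mapred-site.xml. A minimal sketch, assuming the JobTracker runs on the NameNode host; port 54311 is a common convention, not taken from the original:

mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>172.16.40.180:54311</value>

</property>

</configuration>

The conf directory must also be identical on every node; one way (a sketch, assuming the same install path on the slaves) is to copy it from the NameNode:

[root@dlxa180 hadoop-0.20.3-dev]# scp -r conf 172.16.40.201:/usr/local/hadoop-0.20.3-dev/

[root@dlxa180 hadoop-0.20.3-dev]# scp -r conf 172.16.40.108:/usr/local/hadoop-0.20.3-dev/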

Start Hadoop

On the master, format the NameNode first, then start all daemons:

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop namenode -format

[root@dlxa180 hadoop-0.20.3-dev]# bin/start-all.sh
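To confirm the daemons came up (not shown in the original; jps ships with the JDK), run jps on each machine. On the master you would expect NameNode, SecondaryNameNode, and JobTracker; on each slave, DataNode and TaskTracker:

[root@dlxa180 hadoop-0.20.3-dev]# jps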

Say hello from the back of the elephant!

HDFS

List files

[root@dlxa180 hadoop-0.20.3-dev]#bin/hadoop dfs -ls

Found 8 items

drwxr-xr-x   - root supergroup          0 2011-11-09 13:55 /user/root/aa

drwxr-xr-x   - root supergroup          0 2011-11-09 14:02 /user/root/wbk

 

Upload a file to HDFS

 

[root@dlxa180 hadoop-0.20.3-dev]# echo "hello my hadoop hdfs" > test1

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -put test1 test

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -ls test

Found 1 items

-rw-r--r--   1 root supergroup         21 2011-11-09 14:12 /user/root/test

 

Copy a file from HDFS to the local filesystem

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -get test getin

[root@dlxa180 hadoop-0.20.3-dev]# cat getin

hello my hadoop hdfs

 

Delete files on HDFS
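The original leaves this section empty. In hadoop-0.20.2 the commands would be dfs -rm for files and dfs -rmr for directories; for example (a sketch), removing the test file uploaded above and the wbk directory from the earlier listing:

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -rm test

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -rmr wbk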

 

 

View a file on HDFS

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -cat aa/*

hello word

hello hadoop

Administration and maintenance

# view basic HDFS status

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfsadmin -report

# leave safe mode

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfsadmin -safemode leave

# enter safe mode

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfsadmin -safemode enter

 

Add a node

Copy the Hadoop directory from the NameNode to the new data node and update the masters and slaves files, as sketched below.
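A sketch of those steps; <new-node> is a placeholder for the new machine's address, and the install path is assumed to match the existing nodes:

# copy the Hadoop directory from the NameNode to the new node

[root@dlxa180 hadoop-0.20.3-dev]# scp -r /usr/local/hadoop-0.20.3-dev <new-node>:/usr/local/

# append <new-node> to conf/slaves on the NameNode, then start the daemons on the new node

[root@<new-node> hadoop-0.20.3-dev]# bin/hadoop-daemon.sh start datanode

[root@<new-node> hadoop-0.20.3-dev]# bin/hadoop-daemon.sh start tasktracker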

Load balancing:

[root@dlxa180 hadoop-0.20.3-dev]# bin/start-balancer.sh

starting balancer, logging to /usr/local/hadoop-0.20.3-dev/logs/hadoop-root-balancer-dlxa180.out

Time Stamp        Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved

The cluster is balanced. Exiting...

Balancing took 273.0 milliseconds

MapReduce

Wordcount

# upload the local directory "put" to HDFS as "in"

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -put put in

[root@dlxa180 hadoop-0.20.3-dev]#bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out

11/11/09 14:40:54 INFO input.FileInputFormat: Total input paths to process : 2

11/11/09 14:40:55 INFO mapred.JobClient: Running job: job_201111091357_0006

11/11/09 14:40:56 INFO mapred.JobClient:  map 0% reduce 0%

11/11/09 14:41:04 INFO mapred.JobClient:  map 100% reduce 0%

11/11/09 14:41:16 INFO mapred.JobClient:  map 100% reduce 100%

11/11/09 14:41:18 INFO mapred.JobClient: Job complete: job_201111091357_0006

11/11/09 14:41:18 INFO mapred.JobClient: Counters: 18

11/11/09 14:41:18 INFO mapred.JobClient:   Job Counters

11/11/09 14:41:18 INFO mapred.JobClient:     Launched reduce tasks=1

11/11/09 14:41:18 INFO mapred.JobClient:     Rack-local map tasks=1

11/11/09 14:41:18 INFO mapred.JobClient:     Launched map tasks=2

11/11/09 14:41:18 INFO mapred.JobClient:     Data-local map tasks=1

11/11/09 14:41:18 INFO mapred.JobClient:   FileSystemCounters

11/11/09 14:41:18 INFO mapred.JobClient:     FILE_BYTES_READ=54

11/11/09 14:41:18 INFO mapred.JobClient:     HDFS_BYTES_READ=24

11/11/09 14:41:18 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=178

11/11/09 14:41:18 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=24

11/11/09 14:41:18 INFO mapred.JobClient:   Map-Reduce Framework

11/11/09 14:41:18 INFO mapred.JobClient:     Reduce input groups=3

11/11/09 14:41:18 INFO mapred.JobClient:     Combine output records=4

11/11/09 14:41:18 INFO mapred.JobClient:     Map input records=2

11/11/09 14:41:18 INFO mapred.JobClient:     Reduce shuffle bytes=60

11/11/09 14:41:18 INFO mapred.JobClient:     Reduce output records=3

11/11/09 14:41:18 INFO mapred.JobClient:     Spilled Records=8

11/11/09 14:41:18 INFO mapred.JobClient:     Map output bytes=40

11/11/09 14:41:18 INFO mapred.JobClient:     Combine input records=4

11/11/09 14:41:18 INFO mapred.JobClient:     Map output records=4

11/11/09 14:41:18 INFO mapred.JobClient:     Reduce input records=4

 

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -get out output

[root@dlxa180 hadoop-0.20.3-dev]# cat output/*

cat: output/_logs: Is a directory

hadoop  1

hello   2

word    1
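The results can also be read directly off HDFS without copying them out, e.g.:

[root@dlxa180 hadoop-0.20.3-dev]# bin/hadoop dfs -cat out/part*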
