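Setting Up a Hadoop Pseudo-Distributed Environment
This post walks through setting up a Hadoop pseudo-distributed environment. All of the steps below are performed as the root user.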
I. Software environment
1. VM: VMware Workstation 7.1.4
2. OS: Ubuntu 11.04
3. JDK: jdk1.6.0_27
4. Hadoop: hadoop-0.20.2
5. ssh
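II. Installing the JDK
1. Download the JDK installer, jdk-6u27-linux-i586.bin, and place it in the directory where the JDK will be installed.
2. Run the installer to unpack it: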
root@ubuntu:/usr/java# ./jdk-6u27-linux-i586.bin
3. Configure the environment variables
Open /etc/profile with the following command:
root@ubuntu:/# vim /etc/profile
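Append the following lines to the end of the file:
export JAVA_HOME=/usr/java/jdk1.6.0_27
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Save the file and exit, then run source so that the changes take effect in the current shell: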
root@ubuntu:~# source /etc/profile
4. Test the JDK
root@ubuntu:~# java -version
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
Java HotSpot(TM) Client VM (build 20.2-b06, mixed mode, sharing)
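Beyond java -version, it can help to confirm that the variables added to /etc/profile actually point at the JDK. The following is a minimal sketch, not part of the JDK or of Hadoop: check_java_home is a hypothetical helper name, and it only checks that bin/java exists under the given directory and that this bin directory is one of the entries in PATH.

```shell
# Hypothetical helper (not shipped with the JDK): check that a JDK directory
# looks usable and that its bin/ directory is on PATH.
check_java_home() {
    jdk_dir=$1
    # The java launcher must exist and be executable.
    if [ ! -x "$jdk_dir/bin/java" ]; then
        echo "no executable java under $jdk_dir/bin" >&2
        return 1
    fi
    # PATH must contain $jdk_dir/bin as one of its colon-separated entries.
    case ":$PATH:" in
        *":$jdk_dir/bin:"*) return 0 ;;
        *) echo "$jdk_dir/bin is not on PATH" >&2; return 2 ;;
    esac
}
```

After sourcing /etc/profile, running check_java_home "$JAVA_HOME" should return without printing anything if the setup above was applied as written.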
III. Installing and configuring ssh
1. Check whether ssh is installed
root@ubuntu:~# dpkg --list | grep ssh
By default, usually only openssh-client is installed and openssh-server is not. Install the server with the following command:
root@ubuntu:~# apt-get install openssh-server
2. Check whether ssh is running
# Check whether an sshd process exists
root@ubuntu:~# ps -ef | grep ssh
If it is not running, start it with the following command:
root@ubuntu:~# /etc/init.d/ssh start
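3. Configure passwordless ssh
root@ubuntu:~# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
## -P sets the passphrase, and -P '' means an empty passphrase. You can also omit -P, in which case ssh-keygen prompts for the passphrase and you press Enter at each prompt to leave it empty.
Generating public/private dsa key pair.
Your identification has been saved in .ssh/id_dsa.
Your public key has been saved in .ssh/id_dsa.pub.
(remaining output omitted)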
root@ubuntu:~# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
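If key-based login still asks for a password after this step, one common cause is file permissions: with its default StrictModes setting, sshd ignores authorized_keys when the file or the ~/.ssh directory is writable by other users. The sketch below tightens those permissions. secure_ssh_dir is a hypothetical helper name, and the sketch assumes the key files live in the directory passed as its argument (by default ~/.ssh).

```shell
# Hypothetical helper: restrict an ssh directory and its authorized_keys
# file so that only the owner can read or write them.
secure_ssh_dir() {
    ssh_dir=${1:-$HOME/.ssh}
    # Directory: owner may read, write and enter; nobody else.
    chmod 700 "$ssh_dir" || return 1
    # Key list: owner may read and write; nobody else.
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys" || return 1
    fi
}
```

Calling secure_ssh_dir with no argument applies the change to the current user's ~/.ssh.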
4. Verify that you can ssh to localhost without a password
root@ubuntu:~# ssh localhost
Welcome to Ubuntu 11.04 (GNU/Linux 2.6.38-8-generic i686)
 * Documentation: https://help.ubuntu.com/
225 packages can be updated.
75 updates are security updates.
Last login: Tue Sep 27 03:00:30 2011 from ip6-localhost
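IV. Installing and configuring Hadoop
1. Download the previous stable release, hadoop-0.20.2.tar.gz, and copy it to the directory where you plan to install it.
2. Change to the installation directory and unpack the archive.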
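The unpacking step can be sketched as follows. unpack_hadoop is a hypothetical helper name. The install directory /usr/hadoop is taken from the shell prompts later in this post, while the location of the downloaded tarball is an assumption.

```shell
# Hypothetical helper: unpack a Hadoop release tarball into an install directory.
unpack_hadoop() {
    archive=$1       # e.g. /usr/hadoop/hadoop-0.20.2.tar.gz (assumed location)
    install_dir=$2   # e.g. /usr/hadoop
    mkdir -p "$install_dir" || return 1
    # -x extract, -z decompress gzip, -f archive file, -C target directory
    tar -xzf "$archive" -C "$install_dir"
}
```

For example, unpack_hadoop /usr/hadoop/hadoop-0.20.2.tar.gz /usr/hadoop would leave the release in /usr/hadoop/hadoop-0.20.2.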
3. Configuration
conf/hadoop-env.sh:
Uncomment the JAVA_HOME line and change it as follows:
export JAVA_HOME=/usr/java/jdk1.6.0_27
Adjust the other settings as needed.
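The same edit can be scripted instead of done by hand. The sketch below uses the hypothetical helper name set_hadoop_java_home. It assumes the file contains a line of the form "# export JAVA_HOME=..." (commented out) or "export JAVA_HOME=..." (already active), and that the JDK path contains no "|" or "&" characters, which would confuse the sed expression.

```shell
# Hypothetical helper: point the JAVA_HOME line of a hadoop-env.sh file at a JDK.
set_hadoop_java_home() {
    env_file=$1
    jdk_dir=$2
    tmp_file=$(mktemp) || return 1
    # Replace the whole JAVA_HOME line, whether or not it is commented out,
    # then write the result back in place (keeping the file's permissions).
    sed "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$jdk_dir|" "$env_file" > "$tmp_file" &&
        cat "$tmp_file" > "$env_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}
```

With the paths used in this post, the call would be set_hadoop_java_home conf/hadoop-env.sh /usr/java/jdk1.6.0_27, run from the Hadoop directory.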
conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-test1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
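The fs.default.name value above refers to the host name hadoop-test1, so that name has to resolve on this machine. The sketch below checks whether a name is listed in a hosts file. hosts_file_has is a hypothetical helper name, and the sketch only looks at the hosts file (by default /etc/hosts), so a name that resolves through DNS would not be found by it.

```shell
# Hypothetical helper: succeed if a host name appears in a hosts file.
hosts_file_has() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    awk -v h="$name" '
        {
            # Drop trailing comments, then scan the names after the address.
            sub(/#.*/, "")
            for (i = 2; i <= NF; i++) {
                if ($i == h) found = 1
            }
        }
        END { exit !found }
    ' "$hosts_file"
}
```

For example, hosts_file_has hadoop-test1 || echo "hadoop-test1 is not in /etc/hosts" prints a reminder when the entry is missing.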
conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoop-test1:9001</value>
</property>
</configuration>
conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/dfs/name</value>
<description>Determines where on the local filesystem the DFS name node should store the name table (fsimage).</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/dfs/data</value>
<description>Determines where on the local filesystem a DFS data node should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replicas can be specified when the file is created.</description>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
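Before starting Hadoop, it can be useful to read the values back out of the configuration files to catch typos. The sketch below is a rough text-based lookup, not an XML parser. get_prop is a hypothetical helper name, and it assumes each name and value element sits on its own line, as in the listings above.

```shell
# Hypothetical helper: print the value of a property from a Hadoop *-site.xml file.
# Assumes the <name> and <value> elements are each on their own line.
get_prop() {
    conf_file=$1
    prop_name=$2
    awk -v name="$prop_name" '
        # Remember when the requested <name> line has been seen.
        index($0, "<name>" name "</name>") { hit = 1; next }
        # The next <value> line after it holds the setting.
        hit && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$conf_file"
}
```

For example, get_prop conf/hdfs-site.xml dfs.replication should print the replication factor configured above.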
4. Running
Format HDFS:
root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/hadoop namenode -format
Start the Hadoop daemons:
root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/start-all.sh
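In a pseudo-distributed setup, start-all.sh is expected to bring up five daemons: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. The JDK's jps tool lists running Java processes, and the sketch below checks its output for those five names. missing_daemons is a hypothetical helper name, and it assumes jps prints one "pid ClassName" pair per line.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# Hadoop daemon that is absent. Exit status 0 means all five are running.
missing_daemons() {
    jps_output=$(cat)
    missing=0
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare the class-name column exactly, so that NameNode does not
        # match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

Typical use is jps | missing_daemons. Any name it prints is a daemon to look for in the files under the logs directory.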
Check Hadoop's status in a browser:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
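On a machine without a browser, a rough substitute is to check that something is listening on the two web ports. The sketch below only tests whether a TCP connection can be opened, and it does not fetch the pages. port_open is a hypothetical helper name, and it relies on bash's /dev/tcp redirection, so it needs bash rather than a plain POSIX sh.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Uses bash's built-in /dev/tcp redirection; the subshell closes the socket
# again as soon as it exits.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}
```

For example, port_open localhost 50070 && port_open localhost 50030 succeeds only when both web interfaces are accepting connections.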
Copy local files into the input directory on HDFS:
root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/hadoop fs -put conf input
Run one of the examples shipped with Hadoop:
root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+'
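The grep example extracts every string matching the regular expression dfs[a-z.]+ from the input files and counts how often each one occurs. For a small input such as the conf directory, roughly the same counts can be computed locally for comparison. The sketch below does that with standard tools. local_grep_count is a hypothetical helper name, and the output format (count, tab, match, most frequent first) is an approximation of what the example job writes.

```shell
# Hypothetical helper: count the matches of 'dfs[a-z.]+' in the files of a
# local directory, most frequent first.
local_grep_count() {
    input_dir=$1
    # -o print only the matching text, -h omit file names, -E extended regex
    grep -ohE 'dfs[a-z.]+' "$input_dir"/* 2>/dev/null |
        sort | uniq -c | sort -rn |
        awk '{ print $1 "\t" $2 }'
}
```

Running local_grep_count conf from the Hadoop directory gives a list that can be compared with the contents of the output directory on HDFS.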
List the output files on DFS:
root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/hadoop fs -ls output
Copy the files from DFS to the local filesystem and view them there:
root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/hadoop fs -get output output
root@ubuntu:/usr/hadoop/hadoop-0.20.2# cat output/*
Or view the files directly on DFS:
root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/hadoop fs -cat output/*
Stop the Hadoop daemons:
root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/stop-all.sh
V. Miscellaneous
Hadoop downloads:
http://hadoop.apache.org/hdfs/releases.html
For the detailed Hadoop development guide, see:
http://hadoop.apache.org/common/docs/stable/