Environment: VMware 8.0 and Ubuntu 11.04
Setting Up a Hadoop Development Environment on Ubuntu
Step 1: Install the JDK and Hadoop
1.1 Download JDK 1.7
Note: be sure to download the 32-bit Linux build of JDK 1.7, not the 64-bit one (the guest OS here is 32-bit Ubuntu).
http://download.oracle.com/otn-pub/java/jdk/7u7-b10/jdk-7u7-linux-i586.tar.gz?AuthParam=1350391248_23ed968a088cf58dc9c6ddb735cce206
1.2 Download hadoop-0.20.2
http://labs.mop.com/apache-mirror/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
1.3 Extract both archives into /home/tanglg1987
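Extraction itself is just tar with the -xzf flags. A sketch, using the paths from this guide (the throwaway archive below is only there so the commands can be run and checked anywhere):

```shell
# The two real commands would be (assuming the tarballs sit in the home dir):
#   tar -xzf jdk-7u7-linux-i586.tar.gz -C /home/tanglg1987
#   tar -xzf hadoop-0.20.2.tar.gz -C /home/tanglg1987
# Demonstrated below on a throwaway archive so the flags are verifiable:
workdir=$(mktemp -d)
mkdir -p "$workdir/hadoop-0.20.2/bin"
echo '#!/bin/sh' > "$workdir/hadoop-0.20.2/bin/hadoop"
tar -czf "$workdir/hadoop-0.20.2.tar.gz" -C "$workdir" hadoop-0.20.2
dest=$(mktemp -d)                      # stand-in for /home/tanglg1987
tar -xzf "$workdir/hadoop-0.20.2.tar.gz" -C "$dest"
ls "$dest/hadoop-0.20.2/bin"           # -> hadoop
```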
Step 2: Set environment variables
Switch to root and open /etc/profile in vim:

su root
vim /etc/profile

Append the following lines to the end of the file:

export JAVA_HOME=/home/tanglg1987/jdk1.7.0_07
export HADOOP_HOME=/home/tanglg1987/hadoop-0.20.2
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib
Then source it to apply the changes:
source /etc/profile
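As a sanity check, the four export lines can be applied in a throwaway shell and PATH inspected. A sketch using the guide's paths (the directories need not exist for the variables to be set):

```shell
# The same four lines that were appended to /etc/profile:
export JAVA_HOME=/home/tanglg1987/jdk1.7.0_07
export HADOOP_HOME=/home/tanglg1987/hadoop-0.20.2
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib
# The two tool directories should now lead the search path:
echo "$PATH" | cut -d: -f1-2
# -> /home/tanglg1987/jdk1.7.0_07/bin:/home/tanglg1987/hadoop-0.20.2/bin
```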
Step 3: Test the installation
java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) Client VM (build 23.3-b01, mixed mode)
hadoop version
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Step 4: Configure pseudo-distributed mode
With only one machine available, pseudo-distributed mode is the only option: all the Hadoop daemons run on the local machine, simulating a small cluster.
Configuring hadoop-env.sh:
export JAVA_HOME=/home/tanglg1987/jdk1.7.0_07
Configuring core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9100</value>
  <description>The name of the default file system. A URI whose scheme and
  authority determine the FileSystem implementation. The uri's scheme determines
  the config property (fs.SCHEME.impl) naming the FileSystem implementation
  class. The uri's authority is used to determine the host, port, etc. for a
  filesystem.</description>
</property>
Configuring hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications
  can be specified when the file is created. The default is used if replication
  is not specified in create time.</description>
</property>
Configuring mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9101</value>
  <description>The host and port that the MapReduce job tracker runs at. If
  "local", then jobs are run in-process as a single map and reduce
  task.</description>
</property>
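Each of the three property snippets above goes inside the <configuration> element of its respective file. For example, a complete core-site.xml would look like this (description omitted for brevity):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9100</value>
  </property>
</configuration>
```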
Step 5: Configure SSH
1.1 Install SSH
sudo apt-get install ssh
1.2 Generate a new SSH key with an empty passphrase to enable passwordless login:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /home/xiaoming/.ssh/id_rsa.
Your public key has been saved in /home/xiaoming/.ssh/id_rsa.pub.
The key fingerprint is:
19:41:d5:4a:97:04:7f:a8:3d:ee:fc:20:07:9f:33:47 xiaoming@ustc
The key's randomart image is:
+--[ RSA 2048]----+
| .o.o+.. |
| ...+. |
| .. oo . |
| o.o . |
| S o o E |
| + + |
| . O . |
| = = |
| o.. |
+-----------------+
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
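One common stumbling block worth mentioning here: sshd refuses key-based login if ~/.ssh or authorized_keys is group- or world-accessible. The sketch below tightens the permissions, demonstrated on a throwaway directory rather than the real ~/.ssh:

```shell
# For the real setup this would target "$HOME/.ssh"; a temp dir is used
# here so the commands can be run without touching live keys.
sshdir="$(mktemp -d)/.ssh"
mkdir -p "$sshdir"
touch "$sshdir/authorized_keys"
chmod 700 "$sshdir"                  # directory: owner-only access
chmod 600 "$sshdir/authorized_keys"  # key file: owner read/write only
stat -c '%a' "$sshdir" "$sshdir/authorized_keys"   # -> 700, then 600
```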
1.3 Test
ssh localhost
Welcome to Ubuntu 11.04 (GNU/Linux 2.6.38-13-generic i686)
* Documentation: https://help.ubuntu.com/
Last login: Fri Apr 27 17:54:39 2012 from localhost
Step 6: Create a start.sh script in /home/tanglg1987. On every VM boot it removes everything under /tmp and reformats the namenode. The script:
sudo rm -rf /tmp/*
rm -rf /home/tanglg1987/hadoop-0.20.2/logs
hadoop namenode -format
hadoop datanode -format
start-all.sh
hadoop dfsadmin -safemode leave

Then run it:

./start.sh

(Note that "hadoop datanode -format" is not a valid command; the DataNode only supports -rollback, which is why a usage message appears in the output below. It is harmless here.)
The run output looks like this:
12/10/15 23:05:38 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = tanglg1987/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
12/10/15 23:05:39 INFO namenode.FSNamesystem: fsOwner=tanglg1987,tanglg1987,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare
12/10/15 23:05:39 INFO namenode.FSNamesystem: supergroup=supergroup
12/10/15 23:05:39 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/10/15 23:05:39 INFO common.Storage: Image file of size 100 saved in 0 seconds.
12/10/15 23:05:39 INFO common.Storage: Storage directory /tmp/hadoop-tanglg1987/dfs/name has been successfully formatted.
12/10/15 23:05:39 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at tanglg1987/127.0.1.1
************************************************************/
12/10/15 23:05:40 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = tanglg1987/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Usage: java DataNode
[-rollback]
12/10/15 23:05:40 INFO datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at tanglg1987/127.0.1.1
************************************************************/
starting namenode, logging to /home/tanglg1987/hadoop-0.20.2/bin/../logs/hadoop-tanglg1987-namenode-tanglg1987.out
localhost: starting datanode, logging to /home/tanglg1987/hadoop-0.20.2/bin/../logs/hadoop-tanglg1987-datanode-tanglg1987.out
localhost: starting secondarynamenode, logging to /home/tanglg1987/hadoop-0.20.2/bin/../logs/hadoop-tanglg1987-secondarynamenode-tanglg1987.out
starting jobtracker, logging to /home/tanglg1987/hadoop-0.20.2/bin/../logs/hadoop-tanglg1987-jobtracker-tanglg1987.out
localhost: starting tasktracker, logging to /home/tanglg1987/hadoop-0.20.2/bin/../logs/hadoop-tanglg1987-tasktracker-tanglg1987.out
Safe mode is OFF
Step 7: Check the results
1. Check the log files under /home/tanglg1987/hadoop-0.20.2/logs
2. Check the DFS report:
hadoop dfsadmin -report
Configured Capacity: 20079898624 (18.7 GB)
Present Capacity: 11551305743 (10.76 GB)
DFS Remaining: 11551281152 (10.76 GB)
DFS Used: 24591 (24.01 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 20079898624 (18.7 GB)
DFS Used: 24591 (24.01 KB)
Non DFS Used: 8528592881 (7.94 GB)
DFS Remaining: 11551281152(10.76 GB)
DFS Used%: 0%
DFS Remaining%: 57.53%
Last contact: Mon Oct 15 23:07:59 CST 2012
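If the report needs to be checked from a script (for example, to wait until the datanode has registered), the numbers can be pulled out with awk. A sketch, run here against a saved excerpt of the report above so the parsing is verifiable without a live cluster:

```shell
# In practice the input would come from: report=$(hadoop dfsadmin -report)
report='Configured Capacity: 20079898624 (18.7 GB)
DFS Used: 24591 (24.01 KB)
DFS Remaining%: 57.53%'
# Split each line on ": " and print the value of the "DFS Remaining%" line:
echo "$report" | awk -F': ' '/^DFS Remaining%/ {print $2}'   # -> 57.53%
```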
3. Check the web UIs:
http://localhost:50060/ (TaskTracker)
http://localhost:50070/ (NameNode)