[Pinned] Hadoop single-node deployment

Without further ado, here is my installation process.

Tools used:

PC: a school desktop with a 32-bit E4600 CPU and 1 GB of RAM, plus my own laptop.

VM: CentOS 5.8 (32-bit), jdk-6u13-linux-i586.bin, hadoop-0.22.0.tar.gz, SecureCRT (for remote login to the PC), and Baidu (from the external network only csdn and cnblog are reachable; everything else came via Baidu's cache).

With some time on my hands, I decided to turn the school machine into a Hadoop box. The machine is so slow that logging in locally is painful, so I only connect to it over SSH from my own laptop; for that setup, see http://blog.csdn.net/xluren/article/details/8197183

1. Create the user and group

groupadd hadoop    # add the group

useradd -d /home/hadoop -g hadoop hadoop    # add the user, setting its home directory and primary group

passwd hadoop    # set the password
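A quick check that the account came out right:

id hadoop    # should print the hadoop user with hadoop as its group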

2. Log out and log back in as hadoop

This step sets up the SSH trust relationship (passwordless login). The command:

ssh-keygen -t rsa    # keep hitting Enter until the shell prompt returns

[fy@localhost ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/fy/.ssh/id_rsa): 
/home/fy/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/fy/.ssh/id_rsa.
Your public key has been saved in /home/fy/.ssh/id_rsa.pub.
The key fingerprint is:
2d:b2:0d:4c:21:aa:72:6d:42:6a:92:e8:4d:a5:be:71 fy@localhost
[fy@localhost ~]$ 
The session looks like the transcript above; this step rarely goes wrong. If the command is missing, install the OpenSSH packages first.
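One step the transcript does not show: for Hadoop's start scripts to ssh into localhost without a password prompt, the new public key still has to be authorized. A minimal sketch, assuming the default key location:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost    # should now log in without asking for a password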

3. Install the JDK

cd /home

mkdir jdk 

chown hadoop:hadoop jdk-6u13-linux-i586.bin    # fix ownership first

Then move jdk-6u13-linux-i586.bin into the jdk directory (and chmod +x it if it is not already executable):

cd jdk && ./jdk-6u13-linux-i586.bin    # runs the self-extracting installer; next, set up the environment

As root, vi /etc/profile and append the following at the end of the file:

export PATH=.:/home/jdk/jdk1.6.0_13/bin:/home/jdk/jdk1.6.0_13/jre/bin:/home/hadoop-0.22.0/bin:$PATH
export CLASSPATH=.:/home/jdk/jdk1.6.0_13/lib:/home/jdk/jdk1.6.0_13/jre/lib:$CLASSPATH

Then reboot (or simply run source /etc/profile to apply the changes to the current shell).

4. Configure Hadoop


Log in as hadoop. A note here: I log in to the virtual machine remotely, so the PC only needs its OS running; I never log in on it directly. From my laptop I use SecureCRT, which can log in to any account. Then test that java works; I won't spell out the details.
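A minimal sanity check, assuming the /etc/profile changes above are in effect:

java -version    # should report java version "1.6.0_13"
which java       # should resolve to /home/jdk/jdk1.6.0_13/bin/java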

Copy hadoop-0.22.0.tar.gz to /home:

tar -zxf hadoop-0.22.0.tar.gz

sudo chown -R hadoop:hadoop hadoop-0.22.0

Configure hadoop-env.sh

vi /home/hadoop-0.22.0/conf/hadoop-env.sh

Find:

# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.6-sun

and change the commented-out export line to the following (or simply add it):
export JAVA_HOME=/home/jdk/jdk1.6.0_13

Configure core-site.xml

Add the following between <configuration> and </configuration>:

        <property>
                <name>fs.default.name</name>
                <value>hdfs://127.0.0.1:9000</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop/tmp</value>
        </property>

(hadoop.tmp.dir is a temporary directory; you have to create it yourself under hadoop's home, /home/hadoop, or you can point it at any other location, there is no restriction. Note that a // comment inside the XML, as I originally had, is not valid there.)
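Create it before formatting, e.g.:

mkdir -p /home/hadoop/tmp

One aside: the format log further down reports defaultReplication = 3, which hints that the dfs.replication value placed in core-site.xml was not actually picked up; conventionally that property lives in conf/hdfs-site.xml, so if you see the same behavior, move that <property> block there.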

Configure mapred-site.xml

Add the following between <configuration> and </configuration>:

        <property>
                <name>mapred.job.tracker</name>                    
                <value>127.0.0.1:9001</value>
        </property>

5. Run Hadoop

Format HDFS

hadoop namenode -format    # format HDFS

[hadoop@localhost conf]$ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

12/11/20 20:56:53 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.22.0
STARTUP_MSG:   classpath = /home/hadoop-0.22.0/bin/../conf:/home/jdk/jdk1.6.0_13/lib/tools.jar:/home/hadoop-0.22.0/bin/..:/home/hadoop-0.22.0/bin/../hadoop-common-0.22.0.jar:/home/hadoop-0.2r
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.22/common -r 1207774; compiled by 'jenkins' on Sun Dec  4 00:57:22 UTC 2011
************************************************************/
12/11/20 20:56:58 INFO namenode.FSNamesystem: defaultReplication = 3
12/11/20 20:56:58 INFO namenode.FSNamesystem: maxReplication = 512
12/11/20 20:56:58 INFO namenode.FSNamesystem: minReplication = 1
12/11/20 20:56:58 INFO namenode.FSNamesystem: maxReplicationStreams = 2
12/11/20 20:56:58 INFO namenode.FSNamesystem: shouldCheckForEnoughRacks = false
12/11/20 20:56:58 INFO util.GSet: VM type       = 32-bit
12/11/20 20:56:58 INFO util.GSet: 2% max memory = 19.84625 MB
12/11/20 20:56:58 INFO util.GSet: capacity      = 2^22 = 4194304 entries
12/11/20 20:56:58 INFO util.GSet: recommended=4194304, actual=4194304
12/11/20 20:58:04 INFO namenode.FSNamesystem: fsOwner=hadoop
12/11/20 20:58:04 INFO namenode.FSNamesystem: supergroup=supergroup
12/11/20 20:58:04 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/11/20 20:58:04 INFO namenode.FSNamesystem: isBlockTokenEnabled=false blockKeyUpdateInterval=0 min(s), blockTokenLifetime=0 min(s)
12/11/20 20:58:05 INFO namenode.NameNode: Caching file names occuring more than 10 times 
12/11/20 20:58:09 INFO common.Storage: Saving image file /home/hadoop/tmp/dfs/name/current/fsimage using no compression
12/11/20 20:58:20 INFO common.Storage: Image file of size 113 saved in 11 seconds.
12/11/20 20:58:22 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.
12/11/20 20:58:22 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
Output like the above means the format succeeded. You can also check with echo $?: 0 means success, non-zero means failure (basic shell knowledge, no explanation needed).
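Besides echo $?, the log itself says where the image went (see the Saving image file line above), so you can also check on disk:

ls /home/hadoop/tmp/dfs/name/current    # fsimage should be listed here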

Start Hadoop

/home/hadoop-0.22.0/bin/start-all.sh 

[hadoop@localhost ~]$ jps
4624 Jps
[hadoop@localhost ~]$ /home/hadoop-0.22.0/bin/start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-mapred.sh
starting namenode, logging to /home/hadoop-0.22.0/bin/../logs/hadoop-hadoop-namenode-localhost.out
localhost: starting datanode, logging to /home/hadoop-0.22.0/bin/../logs/hadoop-hadoop-datanode-localhost.out
localhost: starting secondarynamenode, logging to /home/hadoop-0.22.0/bin/../logs/hadoop-hadoop-secondarynamenode-localhost.out
[hadoop@localhost ~]$ jps
5116 Jps
4895 DataNode
5073 SecondaryNameNode
4736 NameNode
[hadoop@localhost ~]$ 

Checking with jps, the JobTracker and TaskTracker failed to start (start-all.sh only brought up the HDFS daemons here). They can be started with /home/hadoop-0.22.0/bin/start-mapred.sh; I hit exactly this during my install and used that command to bring them up.

At this point, Hadoop is installed and working.

For comparison, here is the continuation with start-mapred.sh:

[hadoop@localhost ~]$  /home/hadoop-0.22.0/bin/start-mapred.sh
starting jobtracker, logging to /home/hadoop-0.22.0/bin/../logs/hadoop-hadoop-jobtracker-localhost.out
localhost: starting tasktracker, logging to /home/hadoop-0.22.0/bin/../logs/hadoop-hadoop-tasktracker-localhost.out
[hadoop@localhost ~]$ jps
5231 JobTracker
5386 TaskTracker
5503 Jps
4895 DataNode
5073 SecondaryNameNode
4736 NameNode
[hadoop@localhost ~]$
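With all six daemons up, the built-in web UIs are another quick sanity check (the standard default ports for Hadoop of this era, assuming they haven't been changed):

NameNode:   http://127.0.0.1:50070/
JobTracker: http://127.0.0.1:50030/

When you're done, /home/hadoop-0.22.0/bin/stop-all.sh shuts everything down again.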

6. WordCount test

WordCount is a small demo that ships with Hadoop and is handy for testing the installation; it counts the occurrences of each word in text files.

[hadoop@localhost hadoop-0.22.0]$ ll
total 10360
drwxr-xr-x  2 hadoop hadoop    4096 Dec  4  2011 bin
drwxr-xr-x  4 hadoop hadoop    4096 Dec  4  2011 c++
drwxr-xr-x  9 hadoop hadoop    4096 Dec  4  2011 common
drwxr-xr-x  2 hadoop hadoop    4096 Nov 20 20:55 conf
drwxr-xr-x 18 hadoop hadoop    4096 Dec  4  2011 contrib
-rw-r--r--  1 hadoop hadoop 1387999 Dec  4  2011 hadoop-common-0.22.0.jar
-rw-r--r--  1 hadoop hadoop  729864 Dec  4  2011 hadoop-common-test-0.22.0.jar
-rw-r--r--  1 hadoop hadoop 1050571 Dec  4  2011 hadoop-hdfs-0.22.0.jar
-rw-r--r--  1 hadoop hadoop  677611 Dec  4  2011 hadoop-hdfs-0.22.0-sources.jar
-rw-r--r--  1 hadoop hadoop    7388 Dec  4  2011 hadoop-hdfs-ant-0.22.0.jar
-rw-r--r--  1 hadoop hadoop  838290 Dec  4  2011 hadoop-hdfs-test-0.22.0.jar
-rw-r--r--  1 hadoop hadoop  505106 Dec  4  2011 hadoop-hdfs-test-0.22.0-sources.jar
-rw-r--r--  1 hadoop hadoop 1785424 Dec  4  2011 hadoop-mapred-0.22.0.jar
-rw-r--r--  1 hadoop hadoop 1203926 Dec  4  2011 hadoop-mapred-0.22.0-sources.jar
-rw-r--r--  1 hadoop hadoop  252068 Dec  4  2011 hadoop-mapred-examples-0.22.0.jar
-rw-r--r--  1 hadoop hadoop 1636106 Dec  4  2011 hadoop-mapred-test-0.22.0.jar
-rw-r--r--  1 hadoop hadoop  300534 Dec  4  2011 hadoop-mapred-tools-0.22.0.jar
drwxr-xr-x  9 hadoop hadoop    4096 Dec  4  2011 hdfs
drwxr-xr-x  4 hadoop hadoop    4096 Dec  4  2011 lib
-rw-r--r--  1 hadoop hadoop   13366 Dec  4  2011 LICENSE.txt
drwxrwxr-x  2 hadoop hadoop    4096 Nov 20 21:01 logs
drwxr-xr-x 10 hadoop hadoop    4096 Dec  4  2011 mapreduce
-rw-r--r--  1 hadoop hadoop     101 Dec  4  2011 NOTICE.txt
-rw-r--r--  1 hadoop hadoop    1366 Dec  4  2011 README.txt
drwxr-xr-x  8 hadoop hadoop    4096 Dec  4  2011 webapps
[hadoop@localhost hadoop-0.22.0]$ mkdir input
[hadoop@localhost hadoop-0.22.0]$ cp *.txt input/
[hadoop@localhost hadoop-0.22.0]$ pwd
/home/hadoop-0.22.0
[hadoop@localhost hadoop-0.22.0]$ 
We'll count the words in the *.txt files under hadoop-0.22.0; lazy man's test data, :-D

Create the HDFS directory and upload

Next, upload the local input directory to HDFS with the following commands:

[hadoop@localhost hadoop-0.22.0]$ hadoop fs -mkdir input
[hadoop@localhost hadoop-0.22.0]$ hadoop fs -put input input
[hadoop@localhost hadoop-0.22.0]$ echo $?
0
[hadoop@localhost hadoop-0.22.0]$ 
The commands above first create the HDFS directory, then upload the local directory into it.
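It doesn't hurt to verify before running the job; a quick check:

hadoop fs -ls input    # should list the three .txt files copied earlier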

Run the job

[hadoop@localhost hadoop-0.22.0]$ hadoop fs -put input/ input
[hadoop@localhost hadoop-0.22.0]$ hadoop jar /home/hadoop-0.22.0/hadoop-mapred-examples-0.22.0.jar wordcount input output
12/11/20 22:07:33 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/11/20 22:07:33 INFO input.FileInputFormat: Total input paths to process : 3
12/11/20 22:07:33 INFO mapreduce.JobSubmitter: number of splits:3
12/11/20 22:07:34 INFO mapreduce.Job: Running job: job_201211202206_0001
12/11/20 22:07:35 INFO mapreduce.Job:  map 0% reduce 0%
12/11/20 22:07:49 INFO mapreduce.Job:  map 33% reduce 0%
12/11/20 22:07:54 INFO mapreduce.Job:  map 66% reduce 0%
12/11/20 22:07:57 INFO mapreduce.Job:  map 100% reduce 0%
12/11/20 22:08:03 INFO mapreduce.Job:  map 100% reduce 100%
12/11/20 22:08:05 INFO mapreduce.Job: Job complete: job_201211202206_0001
12/11/20 22:08:05 INFO mapreduce.Job: Counters: 36
        FileInputFormatCounters
                BYTES_READ=14833
        FileSystemCounters
                FILE_BYTES_READ=12203
                FILE_BYTES_WRITTEN=24514
                HDFS_BYTES_READ=15179
                HDFS_BYTES_WRITTEN=8355
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        Job Counters 
                Data-local map tasks=3
                Total time spent by all maps waiting after reserving slots (ms)=0
                Total time spent by all reduces waiting after reserving slots (ms)=0
                SLOTS_MILLIS_MAPS=24802
                SLOTS_MILLIS_REDUCES=11710
                Launched map tasks=3
                Launched reduce tasks=1
        Map-Reduce Framework
                Combine input records=2077
                Combine output records=856
                CPU_MILLISECONDS=2270
                Failed Shuffles=0
                GC time elapsed (ms)=292
                Map input records=277
                Map output bytes=21899
                Map output records=2077
                Merged Map outputs=3
                PHYSICAL_MEMORY_BYTES=450105344
                Reduce input groups=799
                Reduce input records=856
                Reduce output records=799
                Reduce shuffle bytes=12215
                Shuffled Maps =3
                Spilled Records=1712
                SPLIT_RAW_BYTES=346
                VIRTUAL_MEMORY_BYTES=1487777792
[hadoop@localhost hadoop-0.22.0]$ 
The job runs as shown above.

The results look like this:

[hadoop@localhost hadoop-0.22.0]$ hadoop fs -cat output/* |less 
"AS     3
"Contribution"  1
"Contributor"   1
"Derivative     1
"Legal  1
"License"       1
"License");     1
"Licensor"      1
"NOTICE"        1
"Not    1
"Object"        1
"Source"        1
"Work"  1
"You"   1
"Your") 1
"[]"    1
"control"       1
"printed        1
"submitted"     1
(50%)   1
(BIS),  1
(Don't  1
(ECCN)  1
(INCLUDING      1
Another excerpt:

BUSINESS        1
BUT     2
BY      1
Bureau  1
CAUSED  1
CONDITIONS      4
CONSEQUENTIAL   1
CONTRACT,       1
CONTRIBUTORS    2
COPYRIGHT       2
Catholique      1
Commerce,       1
Commission      1
Commodity       1
Contribution    3
Contribution(s) 3
Contribution."  1
Contributions)  1
Contributions.  2
Contributor     8
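If you'd rather have the complete result on the local filesystem than page through fs -cat, something like the following works (the local path is just an example; the reducer output file is typically part-r-00000):

hadoop fs -get output ./wordcount-output
sort -rn -k2 ./wordcount-output/part-r-00000 | head    # rough top 10 words by count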

OK, that's it: Hadoop single-node mode is configured and running. I hit plenty of trouble along the way, and the VM crashed several times. Note that I installed everything under /home rather than /usr/, mainly for this reason:

[hadoop@localhost hadoop-0.22.0]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             3.8G  3.1G  539M  86% /
/dev/sda5             5.9G  1.2G  4.5G  21% /home
/dev/sda1              46M   11M   33M  25% /boot
tmpfs                 454M     0  454M   0% /dev/shm
[hadoop@localhost hadoop-0.22.0]$ 

After the install, / had almost no space left, hence /home. Stay tuned: the next post covers a real distributed Hadoop deployment over the LAN, with more after that~~~

