Enough preamble; here is my installation process.
The PC is a school desktop with a 32-bit E4600 CPU and 1 GB of RAM, plus my own laptop.
Environment: a 32-bit CentOS 5.8 VM, jdk-6u13-linux-i586.bin, hadoop-0.22.0.tar.gz, SecureCRT (for logging in to the PC remotely), and Baidu (from the campus network only csdn and cnblog are reachable directly; everything else comes via Baidu's cached pages).
With some spare time on my hands, I wanted to turn the school machine into a Hadoop box. The machine is so weak that logging in at the console is painfully slow, so I only connect to it over SSH from my own laptop. For the approach, see http://blog.csdn.net/xluren/article/details/8197183
groupadd hadoop                             # add the hadoop group
useradd hadoop -d /home/hadoop -g hadoop    # add the user, setting its home directory and primary group
passwd hadoop                               # set the password
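A quick way to confirm the account came out right (my addition, not part of the original steps):
id hadoop     # should print the uid, gid and the hadoop group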
This step sets up the SSH trust relationship (passwordless login). The commands are as follows:
ssh-keygen -t rsa     # keep pressing Enter until you are back at the shell prompt
[fy@localhost ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/fy/.ssh/id_rsa):
/home/fy/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/fy/.ssh/id_rsa.
Your public key has been saved in /home/fy/.ssh/id_rsa.pub.
The key fingerprint is:
2d:b2:0d:4c:21:aa:72:6d:42:6a:92:e8:4d:a5:be:71 fy@localhost
[fy@localhost ~]$
The session looks like the above. This step almost never goes wrong; if the ssh-keygen command is missing, install the SSH server package first.
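The key pair alone does not give passwordless login; the public key also has to end up in authorized_keys. A minimal sketch of the usual companion step, assuming the default ~/.ssh paths:
cd ~/.ssh
cat id_rsa.pub >> authorized_keys     # authorize our own key so that "ssh localhost" works
chmod 600 authorized_keys             # sshd ignores the file if its permissions are too loose
ssh localhost                         # should now log in without prompting for a password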
cd /home
mkdir jdk
chown hadoop:hadoop jdk-6u13-linux-i586.bin     # sort out the ownership first
Then put jdk-6u13-linux-i586.bin into the jdk directory:
cd jdk && ./jdk-6u13-linux-i586.bin     # run the installer; this completes the JDK install
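If the installer is refused with "Permission denied", it is probably missing the execute bit (a guess at a common snag, not something from the original run):
chmod u+x jdk-6u13-linux-i586.bin     # make the self-extracting installer executable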
Next, configure the environment: open /etc/profile with vi and append the following at the end of the file:
export PATH=.:/home/jdk/jdk1.6.0_13/bin:/home/jdk/jdk1.6.0_13/jre/bin:/home/hadoop-0.22.0/bin:$PATH
export CLASSPATH=.:/home/jdk/jdk1.6.0_13/lib:/home/jdk/jdk1.6.0_13/jre/lib:$CLASSPATH
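Many tools also look for JAVA_HOME. Exporting it here too is optional (Hadoop itself uses the value set in hadoop-env.sh below), but it is a harmless, conventional addition:
export JAVA_HOME=/home/jdk/jdk1.6.0_13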
Then reboot, or simply run "source /etc/profile" to reload it in the current shell.
Log in as hadoop. A note here: I log in to the VM remotely, so the PC only has to be powered on with the system running; I never log in at its console. From my laptop I use SecureCRT and can log in to any of the accounts. Then test whether Java works; I won't spell out the details.
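A quick Java check looks like this (my addition; the version string should match the JDK installed above):
java -version      # should report java version "1.6.0_13"
javac -version     # confirms the compiler is on PATH as well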
Copy hadoop-0.22.0.tar.gz to /home:
tar -zxf hadoop-0.22.0.tar.gz
sudo chown -R hadoop:hadoop hadoop-0.22.0
vi /home/hadoop-0.22.0/conf/hadoop-env.sh
Find the lines:
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.6-sun
and change the second line to (or simply add):
export JAVA_HOME=/home/jdk/jdk1.6.0_13
Between <configuration> and </configuration> add the following properties (by convention fs.default.name and hadoop.tmp.dir go in conf/core-site.xml and dfs.replication in conf/hdfs-site.xml):
<property>
    <name>fs.default.name</name>
    <value>hdfs://127.0.0.1:9000</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
    <!-- temp directory; create it yourself under /home/hadoop, or point it anywhere you like -->
</property>
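Per the comment above, hadoop.tmp.dir must exist before the NameNode is formatted; create it as the hadoop user (path as configured above):
mkdir -p /home/hadoop/tmp     # -p creates parents and is harmless if the directory already exists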
Then in conf/mapred-site.xml, add the following between <configuration> and </configuration>:
<property>
    <name>mapred.job.tracker</name>
    <value>127.0.0.1:9001</value>
</property>
hadoop namenode -format     # format HDFS
[hadoop@localhost conf]$ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
12/11/20 20:56:53 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.22.0
STARTUP_MSG:   classpath = /home/hadoop-0.22.0/bin/../conf:/home/jdk/jdk1.6.0_13/lib/tools.jar:/home/hadoop-0.22.0/bin/..:/home/hadoop-0.22.0/bin/../hadoop-common-0.22.0.jar:/home/hadoop-0.2r
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.22/common -r 1207774; compiled by 'jenkins' on Sun Dec 4 00:57:22 UTC 2011
************************************************************/
12/11/20 20:56:58 INFO namenode.FSNamesystem: defaultReplication = 3
12/11/20 20:56:58 INFO namenode.FSNamesystem: maxReplication = 512
12/11/20 20:56:58 INFO namenode.FSNamesystem: minReplication = 1
12/11/20 20:56:58 INFO namenode.FSNamesystem: maxReplicationStreams = 2
12/11/20 20:56:58 INFO namenode.FSNamesystem: shouldCheckForEnoughRacks = false
12/11/20 20:56:58 INFO util.GSet: VM type = 32-bit
12/11/20 20:56:58 INFO util.GSet: 2% max memory = 19.84625 MB
12/11/20 20:56:58 INFO util.GSet: capacity = 2^22 = 4194304 entries
12/11/20 20:56:58 INFO util.GSet: recommended=4194304, actual=4194304
12/11/20 20:58:04 INFO namenode.FSNamesystem: fsOwner=hadoop
12/11/20 20:58:04 INFO namenode.FSNamesystem: supergroup=supergroup
12/11/20 20:58:04 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/11/20 20:58:04 INFO namenode.FSNamesystem: isBlockTokenEnabled=false blockKeyUpdateInterval=0 min(s), blockTokenLifetime=0 min(s)
12/11/20 20:58:05 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/11/20 20:58:09 INFO common.Storage: Saving image file /home/hadoop/tmp/dfs/name/current/fsimage using no compression
12/11/20 20:58:20 INFO common.Storage: Image file of size 113 saved in 11 seconds.
12/11/20 20:58:22 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.
12/11/20 20:58:22 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
Output like the above means the format succeeded. Alternatively, test with echo $?: 0 means success, non-zero means failure (basic shell knowledge, no explanation needed).
/home/hadoop-0.22.0/bin/start-all.sh     # start the HDFS and MapReduce daemons
[hadoop@localhost ~]$ jps
4624 Jps
[hadoop@localhost ~]$ /home/hadoop-0.22.0/bin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-mapred.sh
starting namenode, logging to /home/hadoop-0.22.0/bin/../logs/hadoop-hadoop-namenode-localhost.out
localhost: starting datanode, logging to /home/hadoop-0.22.0/bin/../logs/hadoop-hadoop-datanode-localhost.out
localhost: starting secondarynamenode, logging to /home/hadoop-0.22.0/bin/../logs/hadoop-hadoop-secondarynamenode-localhost.out
[hadoop@localhost ~]$ jps
5116 Jps
4895 DataNode
5073 SecondaryNameNode
4736 NameNode
[hadoop@localhost ~]$
As jps shows, the JobTracker and TaskTracker failed to start. They can be brought up with /home/hadoop-0.22.0/bin/start-mapred.sh; I hit exactly this during my install and used that command to start them.
At this point, Hadoop is installed and working.
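Besides jps, the daemons' built-in web UIs make a quick sanity check. These are the stock ports for this generation of Hadoop (an assumption, in case the config changed them):
http://127.0.0.1:50070     # NameNode / HDFS status page
http://127.0.0.1:50030     # JobTracker / MapReduce status page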
For comparison, after additionally running start-mapred.sh all six daemons show up:
[hadoop@localhost ~]$ /home/hadoop-0.22.0/bin/start-mapred.sh
starting jobtracker, logging to /home/hadoop-0.22.0/bin/../logs/hadoop-hadoop-jobtracker-localhost.out
localhost: starting tasktracker, logging to /home/hadoop-0.22.0/bin/../logs/hadoop-hadoop-tasktracker-localhost.out
[hadoop@localhost ~]$ jps
5231 JobTracker
5386 TaskTracker
5503 Jps
4895 DataNode
5073 SecondaryNameNode
4736 NameNode
[hadoop@localhost ~]$
wordcount is a small demo that ships with Hadoop; it counts how many times each word occurs in a set of text files, and it makes a good smoke test of the installation.
[hadoop@localhost hadoop-0.22.0]$ ll
total 10360
drwxr-xr-x  2 hadoop hadoop    4096 Dec  4  2011 bin
drwxr-xr-x  4 hadoop hadoop    4096 Dec  4  2011 c++
drwxr-xr-x  9 hadoop hadoop    4096 Dec  4  2011 common
drwxr-xr-x  2 hadoop hadoop    4096 Nov 20 20:55 conf
drwxr-xr-x 18 hadoop hadoop    4096 Dec  4  2011 contrib
-rw-r--r--  1 hadoop hadoop 1387999 Dec  4  2011 hadoop-common-0.22.0.jar
-rw-r--r--  1 hadoop hadoop  729864 Dec  4  2011 hadoop-common-test-0.22.0.jar
-rw-r--r--  1 hadoop hadoop 1050571 Dec  4  2011 hadoop-hdfs-0.22.0.jar
-rw-r--r--  1 hadoop hadoop  677611 Dec  4  2011 hadoop-hdfs-0.22.0-sources.jar
-rw-r--r--  1 hadoop hadoop    7388 Dec  4  2011 hadoop-hdfs-ant-0.22.0.jar
-rw-r--r--  1 hadoop hadoop  838290 Dec  4  2011 hadoop-hdfs-test-0.22.0.jar
-rw-r--r--  1 hadoop hadoop  505106 Dec  4  2011 hadoop-hdfs-test-0.22.0-sources.jar
-rw-r--r--  1 hadoop hadoop 1785424 Dec  4  2011 hadoop-mapred-0.22.0.jar
-rw-r--r--  1 hadoop hadoop 1203926 Dec  4  2011 hadoop-mapred-0.22.0-sources.jar
-rw-r--r--  1 hadoop hadoop  252068 Dec  4  2011 hadoop-mapred-examples-0.22.0.jar
-rw-r--r--  1 hadoop hadoop 1636106 Dec  4  2011 hadoop-mapred-test-0.22.0.jar
-rw-r--r--  1 hadoop hadoop  300534 Dec  4  2011 hadoop-mapred-tools-0.22.0.jar
drwxr-xr-x  9 hadoop hadoop    4096 Dec  4  2011 hdfs
drwxr-xr-x  4 hadoop hadoop    4096 Dec  4  2011 lib
-rw-r--r--  1 hadoop hadoop   13366 Dec  4  2011 LICENSE.txt
drwxrwxr-x  2 hadoop hadoop    4096 Nov 20 21:01 logs
drwxr-xr-x 10 hadoop hadoop    4096 Dec  4  2011 mapreduce
-rw-r--r--  1 hadoop hadoop     101 Dec  4  2011 NOTICE.txt
-rw-r--r--  1 hadoop hadoop    1366 Dec  4  2011 README.txt
drwxr-xr-x  8 hadoop hadoop    4096 Dec  4  2011 webapps
[hadoop@localhost hadoop-0.22.0]$ mkdir input
[hadoop@localhost hadoop-0.22.0]$ cp *.txt input/
[hadoop@localhost hadoop-0.22.0]$ pwd
/home/hadoop-0.22.0
[hadoop@localhost hadoop-0.22.0]$
I count the words in the *.txt files under hadoop-0.22.0 (the lazy approach, :-D).
Then upload the local input directory to the HDFS file system with the following commands:
[hadoop@localhost hadoop-0.22.0]$ hadoop fs -mkdir input
[hadoop@localhost hadoop-0.22.0]$ hadoop fs -put input input
[hadoop@localhost hadoop-0.22.0]$ echo $?
0
[hadoop@localhost hadoop-0.22.0]$
The operations above create the HDFS directory, then upload the local directory into it.
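To double-check what actually landed in HDFS, a quick listing helps (my addition, not part of the original transcript):
hadoop fs -ls input     # list what ended up under the HDFS input directory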
[hadoop@localhost hadoop-0.22.0]$ hadoop fs -put input/ input
[hadoop@localhost hadoop-0.22.0]$ hadoop jar /home/hadoop-0.22.0/hadoop-mapred-examples-0.22.0.jar wordcount input output
12/11/20 22:07:33 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/11/20 22:07:33 INFO input.FileInputFormat: Total input paths to process : 3
12/11/20 22:07:33 INFO mapreduce.JobSubmitter: number of splits:3
12/11/20 22:07:34 INFO mapreduce.Job: Running job: job_201211202206_0001
12/11/20 22:07:35 INFO mapreduce.Job: map 0% reduce 0%
12/11/20 22:07:49 INFO mapreduce.Job: map 33% reduce 0%
12/11/20 22:07:54 INFO mapreduce.Job: map 66% reduce 0%
12/11/20 22:07:57 INFO mapreduce.Job: map 100% reduce 0%
12/11/20 22:08:03 INFO mapreduce.Job: map 100% reduce 100%
12/11/20 22:08:05 INFO mapreduce.Job: Job complete: job_201211202206_0001
12/11/20 22:08:05 INFO mapreduce.Job: Counters: 36
    FileInputFormatCounters
        BYTES_READ=14833
    FileSystemCounters
        FILE_BYTES_READ=12203
        FILE_BYTES_WRITTEN=24514
        HDFS_BYTES_READ=15179
        HDFS_BYTES_WRITTEN=8355
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    Job Counters
        Data-local map tasks=3
        Total time spent by all maps waiting after reserving slots (ms)=0
        Total time spent by all reduces waiting after reserving slots (ms)=0
        SLOTS_MILLIS_MAPS=24802
        SLOTS_MILLIS_REDUCES=11710
        Launched map tasks=3
        Launched reduce tasks=1
    Map-Reduce Framework
        Combine input records=2077
        Combine output records=856
        CPU_MILLISECONDS=2270
        Failed Shuffles=0
        GC time elapsed (ms)=292
        Map input records=277
        Map output bytes=21899
        Map output records=2077
        Merged Map outputs=3
        PHYSICAL_MEMORY_BYTES=450105344
        Reduce input groups=799
        Reduce input records=856
        Reduce output records=799
        Reduce shuffle bytes=12215
        Shuffled Maps =3
        Spilled Records=1712
        SPLIT_RAW_BYTES=346
        VIRTUAL_MEMORY_BYTES=1487777792
[hadoop@localhost hadoop-0.22.0]$
The run looks like the above.
[hadoop@localhost hadoop-0.22.0]$ hadoop fs -cat output/* | less
"AS             3
"Contribution"  1
"Contributor"   1
"Derivative     1
"Legal          1
"License"       1
"License");     1
"Licensor"      1
"NOTICE"        1
"Not            1
"Object"        1
"Source"        1
"Work"          1
"You"           1
"Your")         1
"[]"            1
"control"       1
"printed        1
"submitted"     1
(50%)           1
(BIS),          1
(Don't          1
(ECCN)          1
(INCLUDING      1
More of the output:
BUSINESS        1
BUT             2
BY              1
Bureau          1
CAUSED          1
CONDITIONS      4
CONSEQUENTIAL   1
CONTRACT,       1
CONTRIBUTORS    2
COPYRIGHT       2
Catholique      1
Commerce,       1
Commission      1
Commodity       1
Contribution    3
Contribution(s) 3
Contribution."  1
Contributions)  1
Contributions.  2
Contributor     8
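To keep the full result locally instead of paging through it, the output directory can be pulled down with fs -get (optional; wordcount-output is just a local name I picked):
hadoop fs -get output ./wordcount-output     # copies the part-r-* result files to the local disk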
[hadoop@localhost hadoop-0.22.0]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             3.8G  3.1G  539M  86% /
/dev/sda5             5.9G  1.2G  4.5G  21% /home
/dev/sda1              46M   11M   33M  25% /boot
tmpfs                 454M     0  454M   0% /dev/shm
[hadoop@localhost hadoop-0.22.0]$