Now, let us begin installing Hadoop on your Red Hat Enterprise Linux 5 (RHEL 5) system.
First, start the sshd service on your Linux system:
[root@localhost conf]# service sshd start
Starting sshd: [ OK ]
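(Optional) If you want sshd to come up automatically on every boot, RHEL 5 lets you enable it with chkconfig:
[root@localhost conf]# chkconfig sshd on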
Next, create a folder under /usr:
[root@localhost usr]# mkdir hadoop
Then put the required source archive, hadoop-1.0.0.tar.gz, into the hadoop folder; you can download it from http://hadoop.apache.org/.
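If the machine has network access, one way to fetch the tarball is with wget; the URL below points at the Apache release archive and is meant as an illustration, so substitute whichever mirror the download page offers:
[root@localhost hadoop]# wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.0/hadoop-1.0.0.tar.gz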
Then change directory to /usr/hadoop and use tar to extract the archive:
[root@localhost hadoop]# tar -zxvf hadoop-1.0.0.tar.gz
After extraction you will see a new folder; change into its conf directory:
[root@localhost hadoop]# cd hadoop-1.0.0
[root@localhost hadoop-1.0.0]# cd conf
[root@localhost conf]# ls
capacity-scheduler.xml hadoop-policy.xml slaves
configuration.xsl hdfs-site.xml ssl-client.xml.example
core-site.xml log4j.properties ssl-server.xml.example
fair-scheduler.xml mapred-queue-acls.xml taskcontroller.cfg
hadoop-env.sh mapred-site.xml
hadoop-metrics2.properties masters
[root@localhost conf]# vi hadoop-env.sh
We will view and modify this file; pay close attention to the export lines below, which are the ones you must set:
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
export JAVA_HOME=/usr/java/jdk1.6.0_20
# Extra Java CLASSPATH elements. Optional.
export HADOOP_CLASSPATH=/usr/hadoop/hadoop-1.0.0
export PATH=$PATH:/usr/hadoop/hadoop-1.0.0/bin
# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000
# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
[root@localhost conf]# source hadoop-env.sh
[root@localhost conf]#
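Before going further, it is worth verifying that JAVA_HOME really points at a working JDK; a quick sanity check using the path set above:
[root@localhost conf]# $JAVA_HOME/bin/java -version
This should report a 1.6.x Java version; if it fails, fix JAVA_HOME before continuing.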
[root@localhost conf]# cd ..
[root@localhost hadoop-1.0.0]# bin/hadoop
This lets us check whether the file was modified correctly; if the environment is set up properly, the usage message is printed:
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
datanode run a DFS datanode
dfsadmin run a DFS admin client
mradmin run a Map-Reduce admin client
fsck run a DFS filesystem checking utility
fs run a generic filesystem user client
balancer run a cluster balancing utility
fetchdt fetch a delegation token from the NameNode
jobtracker run the MapReduce job Tracker node
pipes run a Pipes job
tasktracker run a MapReduce task Tracker node
historyserver run job history servers as a standalone daemon
job manipulate MapReduce jobs
queue get information regarding JobQueues
version print the version
jar <jar> run a jar file
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
daemonlog get/set the log level for each daemon
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
[root@localhost hadoop-1.0.0]#
[root@localhost hadoop-1.0.0]#
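You can also confirm which build you are running; bin/hadoop version should report Hadoop 1.0.0 and the build details:
[root@localhost hadoop-1.0.0]# bin/hadoop version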
Next, set up passphraseless ssh.
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
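If ssh still prompts for a password after this, the most common cause is over-permissive modes on the key files; tightening them usually fixes it:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys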
Then modify the configuration files:
[root@localhost conf]# vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
[root@localhost conf]# vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
[root@localhost conf]# vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
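A quick way to catch typos in these three files is to check that each is well-formed XML. xmllint (part of the libxml2 package on RHEL 5) prints nothing when the files parse cleanly and reports the offending line otherwise:
[root@localhost conf]# xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml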
Next, format the NameNode (this initializes a new HDFS filesystem):
[root@localhost hadoop-1.0.0]# bin/hadoop namenode -format
12/04/08 12:54:48 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost.localdomain/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1214675; compiled by 'hortonfo' on Thu Dec 15 16:36:35 UTC 2011
************************************************************/
12/04/08 12:54:49 INFO util.GSet: VM type = 32-bit
12/04/08 12:54:49 INFO util.GSet: 2% max memory = 19.33375 MB
12/04/08 12:54:49 INFO util.GSet: capacity = 2^22 = 4194304 entries
12/04/08 12:54:49 INFO util.GSet: recommended=4194304, actual=4194304
12/04/08 12:54:52 INFO namenode.FSNamesystem: fsOwner=root
12/04/08 12:54:53 INFO namenode.FSNamesystem: supergroup=supergroup
12/04/08 12:54:53 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/04/08 12:54:53 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/04/08 12:54:53 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/04/08 12:54:53 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/04/08 12:54:54 INFO common.Storage: Image file of size 110 saved in 0 seconds.
12/04/08 12:54:54 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
12/04/08 12:54:54 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
[root@localhost hadoop-1.0.0]#
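One step that is not shown in the transcript, but which the job IDs below imply, is starting the Hadoop daemons; nothing will respond on the web interfaces until they are running. start-all.sh launches the NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker, and the JDK's jps tool will list them if everything came up:
[root@localhost hadoop-1.0.0]# bin/start-all.sh
[root@localhost hadoop-1.0.0]# jps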
Open a web browser to check the NameNode and JobTracker web interfaces:
http://localhost:50070/
http://localhost:50030/
Copy the input files into the distributed filesystem:
[root@localhost hadoop-1.0.0]# bin/hadoop fs -put conf input
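(Optional) You can confirm the files landed in HDFS by listing the input directory:
[root@localhost hadoop-1.0.0]# bin/hadoop fs -ls input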
Run some of the examples provided. Note that the grep example actually submits two MapReduce jobs: the first searches the input files for matches of the given regular expression, and the second sorts the matches; that is why two job IDs appear in the log below:
[root@localhost hadoop-1.0.0]# bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
12/04/08 13:53:16 INFO mapred.FileInputFormat: Total input paths to process : 16
12/04/08 13:53:17 INFO mapred.JobClient: Running job: job_201204081256_0001
12/04/08 13:53:18 INFO mapred.JobClient: map 0% reduce 0%
12/04/08 13:53:58 INFO mapred.JobClient: map 6% reduce 0%
12/04/08 13:54:01 INFO mapred.JobClient: map 12% reduce 0%
12/04/08 13:54:17 INFO mapred.JobClient: map 25% reduce 0%
12/04/08 13:54:36 INFO mapred.JobClient: map 31% reduce 0%
12/04/08 13:54:41 INFO mapred.JobClient: map 37% reduce 8%
12/04/08 13:54:45 INFO mapred.JobClient: map 43% reduce 8%
12/04/08 13:54:49 INFO mapred.JobClient: map 50% reduce 12%
12/04/08 13:54:53 INFO mapred.JobClient: map 56% reduce 12%
12/04/08 13:54:56 INFO mapred.JobClient: map 62% reduce 12%
12/04/08 13:54:59 INFO mapred.JobClient: map 68% reduce 16%
12/04/08 13:55:03 INFO mapred.JobClient: map 75% reduce 16%
12/04/08 13:55:06 INFO mapred.JobClient: map 81% reduce 20%
12/04/08 13:55:09 INFO mapred.JobClient: map 87% reduce 20%
12/04/08 13:55:13 INFO mapred.JobClient: map 93% reduce 27%
12/04/08 13:55:16 INFO mapred.JobClient: map 100% reduce 27%
12/04/08 13:55:22 INFO mapred.JobClient: map 100% reduce 31%
12/04/08 13:55:28 INFO mapred.JobClient: map 100% reduce 100%
12/04/08 13:55:35 INFO mapred.JobClient: Job complete: job_201204081256_0001
12/04/08 13:55:35 INFO mapred.JobClient: Counters: 30
12/04/08 13:55:35 INFO mapred.JobClient: Job Counters
12/04/08 13:55:35 INFO mapred.JobClient: Launched reduce tasks=1
12/04/08 13:55:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=175469
12/04/08 13:55:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/04/08 13:55:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/04/08 13:55:35 INFO mapred.JobClient: Launched map tasks=16
12/04/08 13:55:35 INFO mapred.JobClient: Data-local map tasks=16
12/04/08 13:55:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=87301
12/04/08 13:55:35 INFO mapred.JobClient: File Input Format Counters
12/04/08 13:55:35 INFO mapred.JobClient: Bytes Read=26846
12/04/08 13:55:35 INFO mapred.JobClient: File Output Format Counters
12/04/08 13:55:35 INFO mapred.JobClient: Bytes Written=180
12/04/08 13:55:35 INFO mapred.JobClient: FileSystemCounters
12/04/08 13:55:35 INFO mapred.JobClient: FILE_BYTES_READ=82
12/04/08 13:55:35 INFO mapred.JobClient: HDFS_BYTES_READ=28568
12/04/08 13:55:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=367514
12/04/08 13:55:35 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=180
12/04/08 13:55:35 INFO mapred.JobClient: Map-Reduce Framework
12/04/08 13:55:35 INFO mapred.JobClient: Map output materialized bytes=172
12/04/08 13:55:35 INFO mapred.JobClient: Map input records=760
12/04/08 13:55:35 INFO mapred.JobClient: Reduce shuffle bytes=172
12/04/08 13:55:35 INFO mapred.JobClient: Spilled Records=6
12/04/08 13:55:35 INFO mapred.JobClient: Map output bytes=70
12/04/08 13:55:35 INFO mapred.JobClient: Total committed heap usage (bytes)=3252289536
12/04/08 13:55:35 INFO mapred.JobClient: CPU time spent (ms)=22930
12/04/08 13:55:35 INFO mapred.JobClient: Map input bytes=26846
12/04/08 13:55:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=1722
12/04/08 13:55:35 INFO mapred.JobClient: Combine input records=3
12/04/08 13:55:35 INFO mapred.JobClient: Reduce input records=3
12/04/08 13:55:35 INFO mapred.JobClient: Reduce input groups=3
12/04/08 13:55:35 INFO mapred.JobClient: Combine output records=3
12/04/08 13:55:35 INFO mapred.JobClient: Physical memory (bytes) snapshot=2292494336
12/04/08 13:55:35 INFO mapred.JobClient: Reduce output records=3
12/04/08 13:55:35 INFO mapred.JobClient: Virtual memory (bytes) snapshot=6338478080
12/04/08 13:55:35 INFO mapred.JobClient: Map output records=3
12/04/08 13:55:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/04/08 13:55:37 INFO mapred.FileInputFormat: Total input paths to process : 1
12/04/08 13:55:38 INFO mapred.JobClient: Running job: job_201204081256_0002
12/04/08 13:55:39 INFO mapred.JobClient: map 0% reduce 0%
12/04/08 13:55:55 INFO mapred.JobClient: map 100% reduce 0%
12/04/08 13:56:11 INFO mapred.JobClient: map 100% reduce 100%
12/04/08 13:56:16 INFO mapred.JobClient: Job complete: job_201204081256_0002
12/04/08 13:56:16 INFO mapred.JobClient: Counters: 30
12/04/08 13:56:16 INFO mapred.JobClient: Job Counters
12/04/08 13:56:16 INFO mapred.JobClient: Launched reduce tasks=1
12/04/08 13:56:16 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=13535
12/04/08 13:56:16 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/04/08 13:56:16 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/04/08 13:56:16 INFO mapred.JobClient: Launched map tasks=1
12/04/08 13:56:16 INFO mapred.JobClient: Data-local map tasks=1
12/04/08 13:56:16 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=14047
12/04/08 13:56:16 INFO mapred.JobClient: File Input Format Counters
12/04/08 13:56:16 INFO mapred.JobClient: Bytes Read=180
12/04/08 13:56:16 INFO mapred.JobClient: File Output Format Counters
12/04/08 13:56:16 INFO mapred.JobClient: Bytes Written=52
12/04/08 13:56:16 INFO mapred.JobClient: FileSystemCounters
12/04/08 13:56:16 INFO mapred.JobClient: FILE_BYTES_READ=82
12/04/08 13:56:16 INFO mapred.JobClient: HDFS_BYTES_READ=295
12/04/08 13:56:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=42409
12/04/08 13:56:16 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=52
12/04/08 13:56:16 INFO mapred.JobClient: Map-Reduce Framework
12/04/08 13:56:16 INFO mapred.JobClient: Map output materialized bytes=82
12/04/08 13:56:16 INFO mapred.JobClient: Map input records=3
12/04/08 13:56:16 INFO mapred.JobClient: Reduce shuffle bytes=82
12/04/08 13:56:16 INFO mapred.JobClient: Spilled Records=6
12/04/08 13:56:16 INFO mapred.JobClient: Map output bytes=70
12/04/08 13:56:16 INFO mapred.JobClient: Total committed heap usage (bytes)=210763776
12/04/08 13:56:16 INFO mapred.JobClient: CPU time spent (ms)=2270
12/04/08 13:56:16 INFO mapred.JobClient: Map input bytes=94
12/04/08 13:56:16 INFO mapred.JobClient: SPLIT_RAW_BYTES=115
12/04/08 13:56:16 INFO mapred.JobClient: Combine input records=0
12/04/08 13:56:16 INFO mapred.JobClient: Reduce input records=3
12/04/08 13:56:16 INFO mapred.JobClient: Reduce input groups=1
12/04/08 13:56:16 INFO mapred.JobClient: Combine output records=0
12/04/08 13:56:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=179437568
12/04/08 13:56:16 INFO mapred.JobClient: Reduce output records=3
12/04/08 13:56:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=749031424
12/04/08 13:56:16 INFO mapred.JobClient: Map output records=3
[root@localhost hadoop-1.0.0]#
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
[root@localhost hadoop-1.0.0]# bin/hadoop fs -get output output
[root@localhost hadoop-1.0.0]# cat output/*
cat: output/_logs: Is a directory
1 dfs.replication
1 dfs.server.namenode.
1 dfsadmin
[root@localhost hadoop-1.0.0]#
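Alternatively, you can view the output files directly on the distributed filesystem without copying them to the local machine first:
[root@localhost hadoop-1.0.0]# bin/hadoop fs -cat output/*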
The pseudo-distributed Hadoop system is now installed successfully.
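When you are finished experimenting, you can stop all of the daemons with:
[root@localhost hadoop-1.0.0]# bin/stop-all.sh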