Installing Hadoop on Linux 5 (RHEL5)

Now, let us begin installing Hadoop on Linux 5 (RHEL5).

 

First, start the sshd service on your Linux machine:

 

[root@localhost conf]# service sshd start

Starting sshd:                                             [  OK  ]

 

Create a folder under /usr:

[root@localhost usr]# mkdir hadoop

Then put the required tarball, hadoop-1.0.0.tar.gz, into the hadoop folder; you can download it from http://hadoop.apache.org/

Change the working directory to /usr/hadoop and extract the archive:

[root@localhost hadoop]# tar -zxvf hadoop-1.0.0.tar.gz
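If you are unsure what an archive will unpack into, a small helper like this (purely illustrative, not part of the Hadoop distribution) previews the top-level entry and then extracts in place:

```shell
# extract_tarball: extract a .tar.gz into the current directory and
# print the top-level entry it creates. Illustrative sketch only.
extract_tarball() {
  tar -tzf "$1" | head -1   # preview the first entry (usually the top dir)
  tar -xzf "$1"             # extract into the current directory
}
```

For example, `extract_tarball hadoop-1.0.0.tar.gz` prints `hadoop-1.0.0/` before extracting, so you know which directory to `cd` into next.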

Extraction creates a new folder, hadoop-1.0.0; change into its conf directory:

[root@localhost hadoop]# cd hadoop-1.0.0

[root@localhost hadoop-1.0.0]# cd conf

[root@localhost conf]# ls

capacity-scheduler.xml      hadoop-policy.xml      slaves

configuration.xsl           hdfs-site.xml          ssl-client.xml.example

core-site.xml               log4j.properties       ssl-server.xml.example

fair-scheduler.xml          mapred-queue-acls.xml  taskcontroller.cfg

hadoop-env.sh               mapred-site.xml

hadoop-metrics2.properties  masters

[root@localhost conf]# vi hadoop-env.sh

We will open and modify this file; pay attention to the lines that set JAVA_HOME, HADOOP_CLASSPATH, and PATH:

#Set Hadoop-specific environment variables here.

 

# The only required environment variable is JAVA_HOME.  All others are

# optional.  When running a distributed configuration it is best to

# set JAVA_HOME in this file, so that it is correctly defined on

# remote nodes.

 

# The java implementation to use.  Required.

  export JAVA_HOME=/usr/java/jdk1.6.0_20

 

# Extra Java CLASSPATH elements.  Optional.

  export HADOOP_CLASSPATH=/usr/hadoop/hadoop-1.0.0

  export PATH=$PATH:/usr/hadoop/hadoop-1.0.0/

# The maximum amount of heap to use, in MB. Default is 1000.

# export HADOOP_HEAPSIZE=2000

 

# Extra Java runtime options.  Empty by default.

# export HADOOP_OPTS=-server

 

# Command specific options appended to HADOOP_OPTS when specified

export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
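A quick way to confirm that the JAVA_HOME you put in hadoop-env.sh actually points at a JDK (the path /usr/java/jdk1.6.0_20 above is specific to this machine) is a check like this sketch:

```shell
# check_java_home: succeed only if the given directory contains an
# executable bin/java. Pass whatever you set as JAVA_HOME.
check_java_home() {
  [ -x "$1/bin/java" ]
}

if check_java_home /usr/java/jdk1.6.0_20; then
  echo "JDK found"
else
  echo "JDK missing: fix JAVA_HOME in hadoop-env.sh"
fi
```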

[root@localhost conf]# source hadoop-env.sh

[root@localhost conf]#

[root@localhost conf]# cd ..

[root@localhost hadoop-1.0.0]# bin/hadoop

Running bin/hadoop with no arguments checks whether the file was modified correctly; you should see the usage message:

Usage: hadoop [--config confdir] COMMAND

where COMMAND is one of:

  namenode -format     format the DFS filesystem

  secondarynamenode    run the DFS secondary namenode

  namenode             run the DFS namenode

  datanode             run a DFS datanode

  dfsadmin             run a DFS admin client

  mradmin              run a Map-Reduce admin client

  fsck                 run a DFS filesystem checking utility

  fs                   run a generic filesystem user client

  balancer             run a cluster balancing utility

  fetchdt              fetch a delegation token from the NameNode

  jobtracker           run the MapReduce job Tracker node

  pipes                run a Pipes job

  tasktracker          run a MapReduce task Tracker node

  historyserver        run job history servers as a standalone daemon

  job                  manipulate MapReduce jobs

  queue                get information regarding JobQueues

  version              print the version

  jar <jar>            run a jar file

  distcp <srcurl> <desturl> copy file or directories recursively

  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive

  classpath            prints the class path needed to get the

                       Hadoop jar and the required libraries

  daemonlog            get/set the log level for each daemon

 or

  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.

[root@localhost hadoop-1.0.0]#

[root@localhost hadoop-1.0.0]#

Next, set up passphraseless ssh.

Now check that you can ssh to the localhost without a passphrase:

$ ssh localhost

 

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
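If ssh still prompts for a password after the two commands above, the usual culprit is permissions: sshd refuses to use authorized_keys when the directory or file is group- or world-writable. A minimal fix:

```shell
# Tighten ssh permissions; sshd ignores authorized_keys otherwise.
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```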

 

Then modify the configuration files:

[root@localhost conf]# vi core-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://localhost:9000</value>

</property>

</configuration>

 

[root@localhost conf]# vi hdfs-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

 

[root@localhost conf]# vi mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>localhost:9001</value>

</property>

</configuration>
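After editing the three files, it is worth confirming the values were saved. This naive helper (hypothetical; it relies on each <name>/<value> pair sitting on its own line, exactly as in the snippets above) pulls a property's value line out of a config file:

```shell
# get_property: print the <value> line that follows a given <name>
# in a Hadoop-style configuration file. Line-oriented, not a real
# XML parser -- good enough for the hand-written files above.
get_property() {
  grep -A1 "<name>$1</name>" "$2" | grep '<value>'
}
```

For example, `get_property fs.default.name core-site.xml` should print the line containing hdfs://localhost:9000.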

 

 

Format the namenode:

[root@localhost hadoop-1.0.0]# bin/hadoop namenode -format

12/04/08 12:54:48 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = localhost.localdomain/127.0.0.1

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 1.0.0

STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1214675; compiled by 'hortonfo' on Thu Dec 15 16:36:35 UTC 2011

************************************************************/

12/04/08 12:54:49 INFO util.GSet: VM type       = 32-bit

12/04/08 12:54:49 INFO util.GSet: 2% max memory = 19.33375 MB

12/04/08 12:54:49 INFO util.GSet: capacity      = 2^22 = 4194304 entries

12/04/08 12:54:49 INFO util.GSet: recommended=4194304, actual=4194304

12/04/08 12:54:52 INFO namenode.FSNamesystem: fsOwner=root

12/04/08 12:54:53 INFO namenode.FSNamesystem: supergroup=supergroup

12/04/08 12:54:53 INFO namenode.FSNamesystem: isPermissionEnabled=true

12/04/08 12:54:53 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100

12/04/08 12:54:53 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)

12/04/08 12:54:53 INFO namenode.NameNode: Caching file names occuring more than 10 times

12/04/08 12:54:54 INFO common.Storage: Image file of size 110 saved in 0 seconds.

12/04/08 12:54:54 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.

12/04/08 12:54:54 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1

************************************************************/

[root@localhost hadoop-1.0.0]#
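One step the transcript glosses over: between formatting the namenode and the job runs further down, the daemons were started. A guarded sketch (HADOOP_HOME here is an assumed location; adjust it to wherever you extracted the tarball):

```shell
# start_daemons: launch the five pseudo-distributed daemons
# (namenode, datanode, secondarynamenode, jobtracker, tasktracker)
# via the start-all.sh shipped in the tarball, if it is present.
start_daemons() {
  if [ -x "$1/bin/start-all.sh" ]; then
    "$1/bin/start-all.sh"
  else
    echo "start-all.sh not found under $1"
    return 1
  fi
}

start_daemons "${HADOOP_HOME:-/usr/hadoop/hadoop-1.0.0}" || true
```

Once it succeeds, `jps` should list all five daemon processes; only then will the web pages below respond.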

Open a web browser to check the status pages:

http://localhost:50070/

 

(screenshot 1: the NameNode status page)

http://localhost:50030/

(screenshot 2: the JobTracker status page)

Copy the input files into the distributed filesystem:

[root@localhost hadoop-1.0.0]# bin/hadoop fs -put conf input

 

Run some of the examples provided:

[root@localhost hadoop-1.0.0]# bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

12/04/08 13:53:16 INFO mapred.FileInputFormat: Total input paths to process : 16

12/04/08 13:53:17 INFO mapred.JobClient: Running job: job_201204081256_0001

12/04/08 13:53:18 INFO mapred.JobClient:  map 0% reduce 0%

12/04/08 13:53:58 INFO mapred.JobClient:  map 6% reduce 0%

12/04/08 13:54:01 INFO mapred.JobClient:  map 12% reduce 0%

12/04/08 13:54:17 INFO mapred.JobClient:  map 25% reduce 0%

12/04/08 13:54:36 INFO mapred.JobClient:  map 31% reduce 0%

12/04/08 13:54:41 INFO mapred.JobClient:  map 37% reduce 8%

12/04/08 13:54:45 INFO mapred.JobClient:  map 43% reduce 8%

12/04/08 13:54:49 INFO mapred.JobClient:  map 50% reduce 12%

12/04/08 13:54:53 INFO mapred.JobClient:  map 56% reduce 12%

12/04/08 13:54:56 INFO mapred.JobClient:  map 62% reduce 12%

12/04/08 13:54:59 INFO mapred.JobClient:  map 68% reduce 16%

12/04/08 13:55:03 INFO mapred.JobClient:  map 75% reduce 16%

12/04/08 13:55:06 INFO mapred.JobClient:  map 81% reduce 20%

12/04/08 13:55:09 INFO mapred.JobClient:  map 87% reduce 20%

12/04/08 13:55:13 INFO mapred.JobClient:  map 93% reduce 27%

12/04/08 13:55:16 INFO mapred.JobClient:  map 100% reduce 27%

12/04/08 13:55:22 INFO mapred.JobClient:  map 100% reduce 31%

12/04/08 13:55:28 INFO mapred.JobClient:  map 100% reduce 100%

12/04/08 13:55:35 INFO mapred.JobClient: Job complete: job_201204081256_0001

12/04/08 13:55:35 INFO mapred.JobClient: Counters: 30

12/04/08 13:55:35 INFO mapred.JobClient:   Job Counters

12/04/08 13:55:35 INFO mapred.JobClient:     Launched reduce tasks=1

12/04/08 13:55:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=175469

12/04/08 13:55:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

12/04/08 13:55:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

12/04/08 13:55:35 INFO mapred.JobClient:     Launched map tasks=16

12/04/08 13:55:35 INFO mapred.JobClient:     Data-local map tasks=16

12/04/08 13:55:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=87301

12/04/08 13:55:35 INFO mapred.JobClient:   File Input Format Counters

12/04/08 13:55:35 INFO mapred.JobClient:     Bytes Read=26846

12/04/08 13:55:35 INFO mapred.JobClient:   File Output Format Counters

12/04/08 13:55:35 INFO mapred.JobClient:     Bytes Written=180

12/04/08 13:55:35 INFO mapred.JobClient:   FileSystemCounters

12/04/08 13:55:35 INFO mapred.JobClient:     FILE_BYTES_READ=82

12/04/08 13:55:35 INFO mapred.JobClient:     HDFS_BYTES_READ=28568

12/04/08 13:55:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=367514

12/04/08 13:55:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=180

12/04/08 13:55:35 INFO mapred.JobClient:   Map-Reduce Framework

12/04/08 13:55:35 INFO mapred.JobClient:     Map output materialized bytes=172

12/04/08 13:55:35 INFO mapred.JobClient:     Map input records=760

12/04/08 13:55:35 INFO mapred.JobClient:     Reduce shuffle bytes=172

12/04/08 13:55:35 INFO mapred.JobClient:     Spilled Records=6

12/04/08 13:55:35 INFO mapred.JobClient:     Map output bytes=70

12/04/08 13:55:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=3252289536

12/04/08 13:55:35 INFO mapred.JobClient:     CPU time spent (ms)=22930

12/04/08 13:55:35 INFO mapred.JobClient:     Map input bytes=26846

12/04/08 13:55:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1722

12/04/08 13:55:35 INFO mapred.JobClient:     Combine input records=3

12/04/08 13:55:35 INFO mapred.JobClient:     Reduce input records=3

12/04/08 13:55:35 INFO mapred.JobClient:     Reduce input groups=3

12/04/08 13:55:35 INFO mapred.JobClient:     Combine output records=3

12/04/08 13:55:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=2292494336

12/04/08 13:55:35 INFO mapred.JobClient:     Reduce output records=3

12/04/08 13:55:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=6338478080

12/04/08 13:55:35 INFO mapred.JobClient:     Map output records=3

12/04/08 13:55:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

12/04/08 13:55:37 INFO mapred.FileInputFormat: Total input paths to process : 1

12/04/08 13:55:38 INFO mapred.JobClient: Running job: job_201204081256_0002

12/04/08 13:55:39 INFO mapred.JobClient:  map 0% reduce 0%

12/04/08 13:55:55 INFO mapred.JobClient:  map 100% reduce 0%

12/04/08 13:56:11 INFO mapred.JobClient:  map 100% reduce 100%

12/04/08 13:56:16 INFO mapred.JobClient: Job complete: job_201204081256_0002

12/04/08 13:56:16 INFO mapred.JobClient: Counters: 30

12/04/08 13:56:16 INFO mapred.JobClient:   Job Counters

12/04/08 13:56:16 INFO mapred.JobClient:     Launched reduce tasks=1

12/04/08 13:56:16 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=13535

12/04/08 13:56:16 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

12/04/08 13:56:16 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

12/04/08 13:56:16 INFO mapred.JobClient:     Launched map tasks=1

12/04/08 13:56:16 INFO mapred.JobClient:     Data-local map tasks=1

12/04/08 13:56:16 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14047

12/04/08 13:56:16 INFO mapred.JobClient:   File Input Format Counters

12/04/08 13:56:16 INFO mapred.JobClient:     Bytes Read=180

12/04/08 13:56:16 INFO mapred.JobClient:   File Output Format Counters

12/04/08 13:56:16 INFO mapred.JobClient:     Bytes Written=52

12/04/08 13:56:16 INFO mapred.JobClient:   FileSystemCounters

12/04/08 13:56:16 INFO mapred.JobClient:     FILE_BYTES_READ=82

12/04/08 13:56:16 INFO mapred.JobClient:     HDFS_BYTES_READ=295

12/04/08 13:56:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=42409

12/04/08 13:56:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=52

12/04/08 13:56:16 INFO mapred.JobClient:   Map-Reduce Framework

12/04/08 13:56:16 INFO mapred.JobClient:     Map output materialized bytes=82

12/04/08 13:56:16 INFO mapred.JobClient:     Map input records=3

12/04/08 13:56:16 INFO mapred.JobClient:     Reduce shuffle bytes=82

12/04/08 13:56:16 INFO mapred.JobClient:     Spilled Records=6

12/04/08 13:56:16 INFO mapred.JobClient:     Map output bytes=70

12/04/08 13:56:16 INFO mapred.JobClient:     Total committed heap usage (bytes)=210763776

12/04/08 13:56:16 INFO mapred.JobClient:     CPU time spent (ms)=2270

12/04/08 13:56:16 INFO mapred.JobClient:     Map input bytes=94

12/04/08 13:56:16 INFO mapred.JobClient:     SPLIT_RAW_BYTES=115

12/04/08 13:56:16 INFO mapred.JobClient:     Combine input records=0

12/04/08 13:56:16 INFO mapred.JobClient:     Reduce input records=3

12/04/08 13:56:16 INFO mapred.JobClient:     Reduce input groups=1

12/04/08 13:56:16 INFO mapred.JobClient:     Combine output records=0

12/04/08 13:56:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=179437568

12/04/08 13:56:16 INFO mapred.JobClient:     Reduce output records=3

12/04/08 13:56:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=749031424

12/04/08 13:56:16 INFO mapred.JobClient:     Map output records=3

[root@localhost hadoop-1.0.0]#

Examine the output files:

 

Copy the output files from the distributed filesystem to the local filesytem and examine them:

 

[root@localhost hadoop-1.0.0]# bin/hadoop fs -get output output

[root@localhost hadoop-1.0.0]# cat output/*

cat: output/_logs: Is a directory

1       dfs.replication

1       dfs.server.namenode.

1       dfsadmin

[root@localhost hadoop-1.0.0]#
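The `cat: output/_logs: Is a directory` line is harmless: the job writes its actual results to part-* files, and _logs is just a side directory. A small helper (hypothetical) that reads only the result files avoids the noise:

```shell
# show_results: print only the part-* result files in a job output
# directory, skipping the _logs directory and any marker files.
show_results() {
  cat "$1"/part-* 2>/dev/null || echo "no part files under $1"
}
```

Running `show_results output` prints the three dfs* counts without the warning.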

OK, the pseudo-distributed Hadoop system is installed successfully.

 

 

                

                  
