This document describes how to install and configure a single-node Hadoop setup, and how to perform simple operations with Hadoop MapReduce and HDFS.
Required software for both Linux and Windows:
Java 1.6.x must be installed, and ssh must be installed with sshd running, so that the Hadoop scripts can manage the Hadoop daemons.
Additional requirement for Windows:
Cygwin, for shell support of the Hadoop scripts.
Unpack the downloaded Hadoop distribution and edit conf/hadoop-env.sh to define JAVA_HOME as the root of your Java installation.
Then run bin/hadoop, which displays the usage documentation for the hadoop script.
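As a concrete sketch, the sequence might look like this (the tarball name and JDK path are assumptions; adjust them to your download and system):
$ tar -xzf hadoop-1.2.1.tar.gz
$ cd hadoop-1.2.1
$ # in conf/hadoop-env.sh, set something like:
$ # export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
$ bin/hadoop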
You are now ready to start your Hadoop cluster. A Hadoop cluster can run in one of three supported modes: standalone (local) mode, pseudo-distributed mode, and fully-distributed mode.
Standalone (local) mode is the default: Hadoop runs as a single Java process in a non-distributed fashion, which is useful for debugging.
The following example demonstrates standalone mode. It copies the unpacked conf directory to use as input, then finds and displays every match of the given regular expression in the output directory:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
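Note that MapReduce refuses to overwrite an existing output directory, so if you want to re-run the job, remove it first:
$ rm -rf output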
Hadoop can also run on a single node in pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process. Use the following configuration:
conf/core-site.xml:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

conf/mapred-site.xml:
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
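For a quick sanity check that the edited files are well-formed XML, one option is xmllint (not part of Hadoop; it must be installed separately):
$ xmllint --noout conf/core-site.xml conf/hdfs-site.xml conf/mapred-site.xml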
Check that you can ssh to localhost without a passphrase:
$ ssh localhost
On a first connection you may see a host-key prompt such as:
The authenticity of host '[localhost]:11201 ([::1]:11201)' can't be established.
RSA key fingerprint is 01:05:83:c6:d3:a7:7a:92:c6:c0:0c:3e:55:60:85:b1.
Are you sure you want to continue connecting (yes/no)?
If you cannot ssh to localhost without a passphrase, execute the following commands to set up local ssh keys:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
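Depending on your sshd configuration, you may also need to tighten the file's permissions, since sshd's StrictModes setting rejects keys in files with loose permissions:
$ chmod 0600 ~/.ssh/authorized_keys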
Now ssh localhost should log you in without prompting for a passphrase:
xxx@xxx:~/programs/hadoop-1.2.1$ ssh localhost
Linux xxx 2.6.32-5-amd64 #1 SMP Fri Feb 15 15:39:52 UTC 2013 x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
You have new mail.
Last login: Mon Apr 21 14:10:46 2014 from localhost
Format a new distributed filesystem:
$ bin/hadoop namenode -format
Start the Hadoop daemons:
$ bin/start-all.sh
At this point your pseudo-distributed Hadoop setup is running.
The Hadoop daemon logs are written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
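To verify that the daemons actually started, you can use the JDK's jps tool (assuming the JDK's bin directory is on your PATH); for Hadoop 1.x you would expect to see NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker listed:
$ jps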
Hadoop also provides web interfaces, available by default at:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
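A quick way to confirm from the shell that the interfaces are up is to check that they return an HTTP 200 (curl is assumed to be installed; any HTTP client works):
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50030/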
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
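If you re-run the job later, first remove the previous output directory, this time on HDFS, just as in the standalone example:
$ bin/hadoop fs -rmr output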
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
or
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ bin/stop-all.sh