JDK: jdk-6u18-linux-i586.bin
Hadoop: hadoop-0.21.0
Note: the Hadoop version must match this document exactly; newer releases have changed the directory layout.
Step One: Configure the environment
This step follows: Hadoop Environment Setup - Single Node Edition
1. Install jdk1.6.0_18
1) Create a folder named Java under /usr, then copy the JDK package into it.
sudo mkdir /usr/Java
sudo cp <path-to-jdk> /usr/Java
2) Enter the Java directory and make the installer executable
cd /usr/Java
sudo chmod u+x jdk-6u18-linux-i586.bin
3) Run the installer
(you should see "Unpacking..." followed by a stream of extraction messages)
sudo ./jdk-6u18-linux-i586.bin
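If the unpacking succeeded, the JDK directory should now be in place; a quick check (not part of the original steps):
ls /usr/Java
(a jdk1.6.0_18 directory should be listed)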
2. Install hadoop-0.21.0
1) Copy hadoop-0.21.0.tar.gz into the local folder under /usr
sudo cp <path-to-hadoop> /usr/local
2) Enter the local directory and unpack hadoop-0.21.0.tar.gz
cd /usr/local
sudo tar -xzf hadoop-0.21.0.tar.gz
3) For easier management, rename the unpacked folder to hadoop
sudo mv hadoop-0.21.0 hadoop
3. Create a user and a group, both named hadoop
1) Create a group named hadoop
sudo addgroup hadoop
2) Create a user named hadoop and place it in the hadoop group
sudo adduser --ingroup hadoop hadoop
3) Open the sudoers file under /etc with gedit
sudo gedit /etc/sudoers
4) Below the line root ALL=(ALL) ALL, add the following line, then save and close gedit
hadoop ALL=(ALL) ALL
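The later steps run as the hadoop user, so transferring ownership of the installation may save some sudo friction; this is an addition not in the original steps, adjust to taste:
sudo chown -R hadoop:hadoop /usr/local/hadoop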
4. Configure environment variables
1) Open the profile file under /etc with gedit
sudo gedit /etc/profile
2) Append the following lines to the end of the file
export CLASSPATH=.:/usr/Java/jdk1.6.0_18/lib:/usr/Java/jdk1.6.0_18/jre/lib:$CLASSPATH
export PATH=.:/usr/Java/jdk1.6.0_18/bin:/usr/Java/jdk1.6.0_18/jre/bin:/usr/local/hadoop/bin:$PATH
3) Save and close gedit, then reboot the machine
sudo reboot
4) After rebooting, log in as the hadoop user and verify the configuration
java -version (verifies that Java is configured correctly)
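To confirm that Hadoop landed on the PATH as well, a simple extra check (not in the original):
which hadoop
(should print /usr/local/hadoop/bin/hadoop)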
5. Create an ssh-key
1) Make sure the network is up, then install the ssh service
sudo apt-get install openssh-server
2) Create an rsa ssh-key with an empty passphrase
ssh-keygen -t rsa -P ""
3) Add this ssh-key to the list of authorized keys and reload the ssh service
cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
sudo /etc/init.d/ssh reload
4) Reboot the system
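Before moving on, passwordless login can be verified with a quick test (an addition, not in the original steps); the first connection asks to confirm the host key:
ssh localhost
exit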
6. Configure hadoop
1) Enter the hadoop directory and set JAVA_HOME in conf/hadoop-env.sh
cd /usr/local/hadoop
sudo gedit conf/hadoop-env.sh
(Near the top of the file, find the line reading "#export JAVA_HOME=...", remove the "#", and fill in your JDK path after the equals sign. If you followed this document exactly, the line should read "export JAVA_HOME=/usr/Java/jdk1.6.0_18")
2) Configure core-site.xml under the conf directory
sudo gedit conf/core-site.xml
Contents of core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
3) Configure mapred-site.xml under the conf directory
sudo gedit conf/mapred-site.xml
Contents of mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
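Before checking the running daemons, HDFS has to be formatted and the daemons started; a minimal sketch of the usual Hadoop 0.21 sequence, run from /usr/local/hadoop as the hadoop user (an assumption based on the steps above):
bin/hadoop namenode -format
bin/start-dfs.sh
bin/start-mapred.sh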
4) Run jps to check that all daemons are up; the output should look something like:
2776 SecondaryNameNode
10848 Jps
2322 NameNode
2886 JobTracker
2539 DataNode
3102 TaskTracker
Step Two: Run the C++ WordCount example (Hadoop Pipes)
Reference for this step: an introduction to Hadoop's C++ extension.
1. Create wordcount.cpp
cd /home/hadoop/tmp
mkdir wordcount
cd wordcount
sudo gedit wordcount.cpp
#include <algorithm>
#include <limits>
#include <string>
#include <vector>
#include "stdint.h"
#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

using namespace std;

class WordCountMapper : public HadoopPipes::Mapper {
public:
  WordCountMapper(HadoopPipes::TaskContext& context) {}
  // Split each input line on spaces and emit (word, 1) for every word.
  void map(HadoopPipes::MapContext& context) {
    string line = context.getInputValue();
    vector<string> word = HadoopUtils::splitString(line, " ");
    for (unsigned int i = 0; i < word.size(); i++) {
      context.emit(word[i], HadoopUtils::toString(1));
    }
  }
};

class WordCountReducer : public HadoopPipes::Reducer {
public:
  WordCountReducer(HadoopPipes::TaskContext& context) {}
  // Sum all the counts emitted for each word.
  void reduce(HadoopPipes::ReduceContext& context) {
    int count = 0;
    while (context.nextValue()) {
      count += HadoopUtils::toInt(context.getInputValue());
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(count));
  }
};

int main(int argc, char** argv) {
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<WordCountMapper, WordCountReducer>());
}
2. Create the Makefile in the same directory
sudo gedit Makefile
CC = g++
HADOOP_INSTALL = /usr/local/hadoop
PLATFORM = Linux-i386-32
CPPFLAGS = -m32 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include
LIBS = -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes -lhadooputils -lpthread

wordcount: wordcount.cpp
	$(CC) $(CPPFLAGS) $< -Wall $(LIBS) -g -O2 -o $@
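Running make in the wordcount directory should then build the binary (note that the command line under the wordcount: rule must be indented with a tab, not spaces):
make
(produces an executable named wordcount)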
3. Create the job configuration file (here assumed to be saved as wordcount_job.xml)
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.name</name>
    <value>WordCount</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <value>180000</value>
  </property>
  <property>
    <name>hadoop.pipes.executable</name>
    <value>/user/hadoop/bin/wordcount</value>
    <description>
      Executable path is given as "path#executable-name" so that the
      executable will have a symlink in the working directory. This can
      be used for gdb debugging etc.
    </description>
  </property>
  <property>
    <name>mapred.create.symlink</name>
    <value>yes</value>
  </property>
  <property>
    <name>hadoop.pipes.java.recordreader</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.pipes.java.recordwriter</name>
    <value>true</value>
  </property>
</configuration>
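A minimal sketch of how such a Pipes job is typically submitted, assuming the config above is saved as wordcount_job.xml and a sample input file input.txt exists (both names are assumptions), run from /usr/local/hadoop as the hadoop user:
bin/hadoop fs -mkdir bin
bin/hadoop fs -put /home/hadoop/tmp/wordcount/wordcount bin/wordcount
bin/hadoop fs -mkdir input
bin/hadoop fs -put input.txt input/
bin/hadoop pipes -conf /home/hadoop/tmp/wordcount/wordcount_job.xml -input input -output output
bin/hadoop fs -cat output/part-00000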
The output lists each distinct word with its count, one pair per line, for example:
1	1
12	1
13	1
2	1
3	1
4	1
5	1