Single-node pseudo-distributed Hadoop setup on Linux
Test environment: RHEL 6.3, with iptables and SELinux disabled; JDK: jdk-6u26-linux-x64.bin
Hadoop version: hadoop-1.2.1.tar.gz
Download and install the JDK
http://www.oracle.com/technetwork/java/javaee/downloads/java-ee-sdk-6u3-jdk-6u29-downloads-523388.html
#sh jdk-6u26-linux-x64.bin
#mv jdk1.6.0_26/ /usr/local/jdk
Download the Hadoop tarball
http://hadoop.apache.org/
Extract it to the target directory and shorten the directory name
#tar zxf hadoop-1.2.1.tar.gz -C /usr/local
#cd /usr/local
#mv hadoop-1.2.1/ hadoop
Set JAVA_HOME for Hadoop
#cd /usr/local/hadoop/
#vim conf/hadoop-env.sh
Point JAVA_HOME at the JDK installed above:
export JAVA_HOME=/usr/local/jdk
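If you would rather not edit hadoop-env.sh by hand, a minimal sketch like the one below sets JAVA_HOME non-interactively. It assumes GNU sed (as shipped with RHEL 6) and the /usr/local/jdk path used above; `set_java_home` is a hypothetical helper, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make hadoop-env.sh export exactly one JAVA_HOME.
# Assumes GNU sed (-i edits the file in place).
set_java_home() {
  local env_file=$1 java_home=$2
  # Remove any active (uncommented) JAVA_HOME export; commented examples stay.
  sed -i '/^[[:space:]]*export JAVA_HOME=/d' "$env_file"
  # Append the desired value; Hadoop's scripts source this file at startup.
  echo "export JAVA_HOME=$java_home" >> "$env_file"
  # Warn (but do not fail) if the JDK does not look installed there.
  if [ ! -x "$java_home/bin/java" ]; then
    echo "warning: $java_home/bin/java not found" >&2
  fi
}
```

Usage: `set_java_home /usr/local/hadoop/conf/hadoop-env.sh /usr/local/jdk`.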
Edit the configuration files
http://hadoop.apache.org/docs/r1.2.1/single_node_setup.html
#vim conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

#vim conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

#vim conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
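The three files differ only in one property each, so they can also be generated from a script. This is a sketch under the assumption that you run it from /usr/local/hadoop; it overwrites the existing files, so back up any other edits first. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a <configuration> file holding one property.
write_single_property() {
  local file=$1 name=$2 value=$3
  cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}

# Write the three pseudo-distributed config files into a conf directory
# (default ./conf, i.e. run from /usr/local/hadoop). Existing files are overwritten.
write_pseudo_distributed_conf() {
  local conf_dir=${1:-conf}
  mkdir -p "$conf_dir"
  write_single_property "$conf_dir/core-site.xml"   fs.default.name    hdfs://localhost:9000
  write_single_property "$conf_dir/hdfs-site.xml"   dfs.replication    1
  write_single_property "$conf_dir/mapred-site.xml" mapred.job.tracker localhost:9001
}
```

Usage: `cd /usr/local/hadoop && write_pseudo_distributed_conf conf`.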
Check that you can ssh to localhost without a passphrase
#ssh-keygen
(press Enter at each prompt to accept the default key file and an empty passphrase)
#ssh-copy-id localhost
#ssh localhost
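For scripted installs, the interactive `ssh-keygen` plus `ssh-copy-id localhost` can be replaced by something like the sketch below. It assumes OpenSSH and an RSA key in the default location; the function name and its optional directory argument are hypothetical, added so it can be tried on a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: non-interactive equivalent of `ssh-keygen` followed by
# `ssh-copy-id localhost` for the current user. Safe to run more than once.
setup_passphraseless_ssh() {
  local ssh_dir=${1:-$HOME/.ssh}
  local key="$ssh_dir/id_rsa"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  # Generate an RSA key with an empty passphrase (-N '') only if none exists.
  if [ ! -f "$key" ]; then
    ssh-keygen -q -t rsa -N '' -f "$key" || return 1
  fi
  # Authorize the public key once; sshd rejects group/world-writable files.
  touch "$ssh_dir/authorized_keys"
  if ! grep -qxF -- "$(cat "$key.pub")" "$ssh_dir/authorized_keys"; then
    cat "$key.pub" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}
```

After `setup_passphraseless_ssh`, `ssh localhost` should log in without asking for a passphrase.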
Format the filesystem and start all daemons
#cd /usr/local/hadoop/bin/
#./hadoop namenode -format
#./start-all.sh
List all running daemons and their PIDs
#/usr/local/jdk/bin/jps
5147 Jps
2460 TaskTracker
2176 DataNode
2276 SecondaryNameNode
2077 NameNode
2350 JobTracker
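Checking the jps listing by eye is easy to get wrong, so here is a small sketch that reads jps output on stdin and reports which of the five daemons are missing. The function name is hypothetical, and it assumes jps prints one `<pid> <main class>` pair per line, as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and confirm that the five
# Hadoop 1.x daemons of a pseudo-distributed node are running.
check_hadoop_daemons() {
  local running d missing=0
  # jps prints "<pid> <main class>"; keep only the class names.
  running=$(awk '{print $2}')
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Exact whole-line match, so NameNode does not match SecondaryNameNode.
    if printf '%s\n' "$running" | grep -qx "$d"; then
      echo "OK       $d"
    else
      echo "MISSING  $d"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `/usr/local/jdk/bin/jps | check_hadoop_daemons`. If a daemon is reported missing, its log under /usr/local/hadoop/logs/ is the first place to look.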
Testing
Copy the local /usr/local/hadoop/conf/ directory into HDFS as input/
#cd /usr/local/hadoop
#bin/hadoop fs -put conf input
#bin/hadoop fs -ls
drwxr-xr-x - root supergroup 0 2014-03-08 03:22 /user/root/input
Run the bundled grep example and check the output/ directory
#bin/hadoop jar hadoop-examples-1.2.1.jar grep input output 'dfs[a-z.]+'
#bin/hadoop fs -ls
View the result stored in output/:
#bin/hadoop fs -cat output/*
1 dfs.replication
1 dfs.server.namenode.
1 dfsadmin
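The grep example counts every match of the regular expression across the input files and lists the counts, most frequent first. As a rough cross-check, the same counts can be approximated on the local copy of conf/ with standard grep, sort and uniq. This is a sketch, not what Hadoop runs internally: the function name is hypothetical, and ties may be ordered differently from the MapReduce output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: local approximation of the MapReduce "grep" example.
# Prints "<count>\t<match>" for every match of an extended regex, most frequent first.
local_grep_count() {
  local pattern=$1; shift
  # -o: print only the matched text, -h: omit file names, -E: extended regex.
  grep -ohE -- "$pattern" "$@" \
    | sort \
    | uniq -c \
    | sort -k1,1nr -k2,2 \
    | awk '{print $1 "\t" $2}'
}
```

Usage: `local_grep_count 'dfs[a-z.]+' /usr/local/hadoop/conf/*`, whose counts should normally agree with `bin/hadoop fs -cat output/*`.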
Important Hadoop ports:
1. JobTracker web UI: 50030
2. HDFS (NameNode) web UI: 50070
3. HDFS RPC port (fs.default.name): 9000
4. MapReduce RPC port (mapred.job.tracker): 9001
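Once start-all.sh has finished, a quick way to confirm that these four ports are actually listening is sketched below. It assumes bash (for the built-in /dev/tcp redirection) and GNU coreutils `timeout`; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if something accepts TCP connections on HOST:PORT.
# Uses bash's built-in /dev/tcp redirection; `timeout` bounds the wait to 2 seconds.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report the four ports listed above; returns non-zero if any is closed.
check_hadoop_ports() {
  local host=${1:-localhost} port status=0
  for port in 50030 50070 9000 9001; do
    if port_open "$host" "$port"; then
      echo "open    $host:$port"
    else
      echo "closed  $host:$port"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_hadoop_ports localhost`. A closed 9000 or 9001 usually means the NameNode or JobTracker failed to start.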
1. HDFS web UI
http://localhost:50070
2. MapReduce web UI
http://localhost:50030
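HDFS:
NameNode: master node; manages the filesystem namespace and block metadata
DataNode: data node; stores the actual data blocks
SecondaryNameNode: metadata checkpoint node; periodically merges the NameNode's fsimage and edit log (a backup of the metadata, not a hot standby)
MapReduce:
JobTracker: job management node; schedules jobs and their tasks
TaskTracker: task execution node; runs the map and reduce tasks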