Official documentation: http://hadoop.apache.org/common/docs/r1.0.3/
http://www.tbdata.org/
Downloads: jdk-6u26-linux-x64.bin and hadoop-1.0.3.tar.gz
Hadoop has three modes:
Local (Standalone) Mode
Pseudo-Distributed Mode
Fully-Distributed Mode
We will first set up a pseudo-distributed deployment on a single node.
- chmod +x jdk-6u26-linux-x64.bin
- ./jdk-6u26-linux-x64.bin
- mv jdk1.6.0_26/ /usr/local/jdk
- vim .bash_profile
- PATH=$PATH:$HOME/bin:/usr/local/jdk/bin
- source .bash_profile
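To confirm the shell now picks up the JDK from /usr/local/jdk, a quick check can be run (this verification step is my addition, not part of the original notes):
- which java # should point into /usr/local/jdk/bin
- java -version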
- useradd yejk
- passwd yejk
- cd /home/yejk
- vim .bash_profile
- PATH=$PATH:$HOME/bin:/usr/local/jdk/bin
- source .bash_profile
- cp hadoop-1.0.3.tar.gz /home/yejk/
- su - yejk
- tar zxf hadoop-1.0.3.tar.gz
- cd hadoop-1.0.3
- Edit the configuration files:
- vim conf/hadoop-env.sh
- # The java implementation to use. Required.
- export JAVA_HOME=/usr/local/jdk
- vim conf/core-site.xml:
- <configuration>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://localhost:9000</value>
- </property>
- </configuration>
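fs.default.name makes hdfs://localhost:9000 the default filesystem, so once the daemons are up, paths given to the fs shell resolve against it; for example, the two commands below list the same root directory (an illustrative equivalence, not from the original session):
- bin/hadoop fs -ls / # same as the fully qualified form below
- bin/hadoop fs -ls hdfs://localhost:9000/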
- vim conf/hdfs-site.xml:
- <configuration>
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- </configuration>
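With only dfs.replication set, NameNode and DataNode storage defaults to /tmp/hadoop-${user.name} (visible in the format output later), which may be wiped on reboot. A hedged sketch of pinning the storage to persistent directories by adding two more properties alongside dfs.replication; the /home/yejk/hdfs/* paths are assumptions for illustration, not part of the original setup:
- <property>
- <name>dfs.name.dir</name>
- <value>/home/yejk/hdfs/name</value>
- </property>
- <property>
- <name>dfs.data.dir</name>
- <value>/home/yejk/hdfs/data</value>
- </property>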
- vim conf/mapred-site.xml:
- <configuration>
- <property>
- <name>mapred.job.tracker</name>
- <value>localhost:9001</value>
- </property>
- </configuration>
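At this point a quick sanity check confirms that the hadoop wrapper script picks up the JDK configured in hadoop-env.sh (this step is my addition; the version banner is not reproduced here):
- bin/hadoop version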
Set up passwordless SSH access:
- ssh-keygen # just press Enter at every prompt
- ssh-copy-id -i ~/.ssh/id_rsa.pub localhost
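To confirm passwordless login works, ssh into localhost once; it should drop straight to a shell without asking for a password (a verification step not in the original notes):
- ssh localhost
- exit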
Format a new HDFS filesystem:
- bin/hadoop namenode -format
- ***************
- 2/06/03 07:04:49 INFO common.Storage: Storage directory /tmp/hadoop-yejk/dfs/name has been successfully formatted.
- *****************
Start Hadoop:
- bin/start-all.sh
NameNode : http://localhost:50070/
JobTracker : http://localhost:50030/
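Besides the web UIs, jps (shipped with the JDK) can confirm the daemons are running; in a pseudo-distributed setup one would expect NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (expected list, not output captured from this session):
- jps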
Create a new directory in the filesystem:
- bin/hadoop fs -mkdir test
Upload the contents of the conf directory into the directory just created:
- bin/hadoop fs -put conf test
- [yejk@server95 hadoop-1.0.3]$ bin/hadoop fs -du
- Found 1 items
- 54816 hdfs://localhost:9000/user/yejk/test
- [yejk@server95 hadoop-1.0.3]$ bin/hadoop fs -ls
- Found 1 items
- drwxr-xr-x - yejk supergroup 0 2012-06-03 07:19 /user/yejk/test
Run a test with one of the bundled example programs:
- bin/hadoop jar hadoop-examples-1.0.3.jar grep test/* output 'dfs[a-z.]+'
This runs the example JAR's grep job: it searches all the data uploaded to the test directory in HDFS for strings matching 'dfs[a-z.]+' (keywords beginning with dfs), counts and sorts the matches, and saves the result in output.
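Note that MapReduce refuses to write into an existing output directory, so re-running the job later requires removing output first (a usage note added here, using the Hadoop 1.x fs shell):
- bin/hadoop fs -rmr output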
View the results:
- [yejk@server95 hadoop-1.0.3]$ bin/hadoop fs -cat output/*
- 2 dfs.replication
- 2 dfs.server.namenode.
- 2 dfsadmin
- cat: File does not exist: /user/yejk/output/_logs
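The _logs message is harmless: the job leaves a _logs directory next to its part files and cat cannot print a directory; narrowing the pattern to the part files avoids it (a small assumption that the results live in part-* files, which the local copy below confirms):
- bin/hadoop fs -cat output/part-*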
Or, alternatively:
- bin/hadoop fs -get output output
- [yejk@server95 output]$ cat part-00000
- 2 dfs.replication
- 2 dfs.server.namenode.
- 2 dfsadmin
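When finished, the daemons can be shut down with the matching stop script (not part of the original walkthrough):
- bin/stop-all.sh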