A quick note on the directories I use on Linux for software tarballs and installed software:
/opt/software/ -- software tarballs
/opt/modules/ -- installed software
/opt/tools/ -- tools
/opt/data/ -- test data
Those are just my conventions; you can pick your own locations for tarballs and installations.
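If these directories don't exist yet, you can create them up front (a convenience step I'm adding; adjust the paths if you chose different locations):
sudo mkdir -p /opt/software /opt/modules /opt/tools /opt/data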
1. Extract hadoop-1.2.1
tar -zxvf /opt/software/hadoop-1.2.1-bin.tar.gz
sudo cp -r hadoop-1.2.1 /opt/modules/
I. First, configure standalone mode
1. Point hadoop-env.sh at the JDK
vim /opt/modules/hadoop-1.2.1/conf/hadoop-env.sh
export JAVA_HOME=/opt/modules/jdk1.7.0_79
(use the path of whatever JDK you installed)
2. Reload the file so the change takes effect
source /opt/modules/hadoop-1.2.1/conf/hadoop-env.sh
3. Add Hadoop's install path to /etc/profile
vim /etc/profile
Append Hadoop's environment variables at the end of the file.
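The exact lines were dropped from the original post; for the layout above they would typically be the following two exports (HADOOP_HOME is also what triggers the harmless deprecation warning shown below):
export HADOOP_HOME=/opt/modules/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin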
Make the configuration take effect:
source /etc/profile
4. Run hadoop with no arguments; the usage message below confirms the PATH is set correctly:
hadoop
Warning: $HADOOP_HOME is deprecated.
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
datanode run a DFS datanode
dfsadmin run a DFS admin client
mradmin run a Map-Reduce admin client
fsck run a DFS filesystem checking utility
fs run a generic filesystem user client
balancer run a cluster balancing utility
oiv apply the offline fsimage viewer to an fsimage
fetchdt fetch a delegation token from the NameNode
jobtracker run the MapReduce job Tracker node
pipes run a Pipes job
tasktracker run a MapReduce task Tracker node
historyserver run job history servers as a standalone daemon
job manipulate MapReduce jobs
queue get information regarding JobQueues
version print the version
jar <jar> run a jar file
distcp <srcurl> <desturl> copy file or directories recursively
distcp2 <srcurl> <desturl> DistCp version 2
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
daemonlog get/set the log level for each daemon
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
5. Test that MapReduce works
Change to the data directory:
cd /opt/data/
6. Create an input folder
sudo mkdir input
7. Copy some files into it
sudo cp /opt/modules/hadoop-1.2.1/conf/*.xml input
8. Run one of the example jobs that ships with Hadoop (this grep example searches the input files for strings matching the regex dfs[a-z.]+ and counts them)
Change to the hadoop directory:
cd /opt/modules/hadoop-1.2.1/
bin/hadoop jar hadoop-examples-1.2.1.jar grep /opt/data/input/ /opt/data/output 'dfs[a-z.]+'
9. If the output directory contains the two files part-00000 and _SUCCESS, the job ran successfully.
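A quick way to check (an assumed verification command, not in the original):
ls /opt/data/output/
part-00000  _SUCCESS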
10. View the output file
more /opt/data/output/part-00000
If it shows the counts of the dfs[a-z.]+ matches (the original illustrated this with a screenshot), the run completed successfully.
II. Next, configure pseudo-distributed mode
1. Configure the core file, core-site.xml
vim conf/core-site.xml
Add the following inside the <configuration> tags (shown as a screenshot in the original):
<property><!-- HDFS address and port -->
    <name>fs.default.name</name>
    <value>hdfs://master.dragon.org:9000</value>
</property>
<property><!-- directory where HDFS stores its data -->
    <name>hadoop.tmp.dir</name>
    <value>/opt/data/tmp</value>
</property>
2. Configure hdfs-site.xml
vim conf/hdfs-site.xml
Add the following inside the <configuration> tags:
<property><!-- replication factor; the default is 3, use 1 on a single machine -->
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property><!-- disable HDFS permission checking -->
    <name>dfs.permissions</name>
    <value>false</value>
</property>
3. Configure mapred-site.xml
vim conf/mapred-site.xml
Add the following inside the <configuration> tags:
<property><!-- address of the MapReduce JobTracker -->
    <name>mapred.job.tracker</name>
    <value>master.dragon.org:9001</value>
</property>
4. Configure the masters file under conf
vim conf/masters
5. Configure the slaves file under conf
vim conf/slaves
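The post doesn't show the contents of these two files. For a pseudo-distributed setup, both typically hold just the single hostname used above (masters names the SecondaryNameNode host, slaves the DataNode/TaskTracker hosts):
# conf/masters and conf/slaves each contain one line:
master.dragon.org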
6. Format the namenode
bin/hadoop namenode -format
7. Start all the daemons (start-all.sh assumes passwordless SSH to the local host is already set up)
bin/start-all.sh
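To confirm the daemons came up (a verification step I'm adding, using the JDK's jps tool), check the Java process list:
jps
On a pseudo-distributed node you would expect to see NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker (plus Jps itself).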
8. Then open master.dragon.org:50070 in a browser; if it loads, the NameNode web UI appears (the original screenshot is not reproduced here).
9. To reach Hadoop from Windows, add the Linux host's IP address and hostname to the Windows hosts file.
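For example, assuming the Linux host's IP is 192.168.1.100 (substitute your own), append one line to C:\Windows\System32\drivers\etc\hosts:
192.168.1.100    master.dragon.org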
10. Test it
Upload files to the HDFS file system.
First create a directory in HDFS to hold the files:
hadoop fs -mkdir hdfs://master.dragon.org:9000/wc/input/
Check that it was created:
hadoop fs -lsr hdfs://master.dragon.org:9000/
drwxr-xr-x - root supergroup 0 2016-01-05 17:18 /wc
drwxr-xr-x - root supergroup 0 2016-01-05 17:18 /wc/input
Then upload (run from the hadoop-1.2.1 directory so conf/*.xml resolves):
hadoop fs -put conf/*.xml hdfs://master.dragon.org:9000/wc/input/
Listing again with hadoop fs -lsr shows the uploaded files:
drwxr-xr-x - root supergroup 0 2016-01-05 17:18 /wc
drwxr-xr-x - root supergroup 0 2016-01-05 17:23 /wc/input
-rw-r--r-- 1 root supergroup 7457 2016-01-05 17:23 /wc/input/capacity-scheduler.xml
-rw-r--r-- 1 root supergroup 444 2016-01-05 17:23 /wc/input/core-site.xml
-rw-r--r-- 1 root supergroup 327 2016-01-05 17:23 /wc/input/fair-scheduler.xml
-rw-r--r-- 1 root supergroup 4644 2016-01-05 17:23 /wc/input/hadoop-policy.xml
-rw-r--r-- 1 root supergroup 331 2016-01-05 17:23 /wc/input/hdfs-site.xml
-rw-r--r-- 1 root supergroup 2033 2016-01-05 17:23 /wc/input/mapred-queue-acls.xml
-rw-r--r-- 1 root supergroup 276 2016-01-05 17:23 /wc/input/mapred-site.xml
To download, use the get command the same way.
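A minimal sketch, picking one of the files uploaded above:
hadoop fs -get hdfs://master.dragon.org:9000/wc/input/core-site.xml /opt/data/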
11. Run an example (again from the hadoop-1.2.1 directory):
hadoop jar hadoop-examples-1.2.1.jar wordcount hdfs://master.dragon.org:9000/wc/input/ hdfs://master.dragon.org:9000/wc/output/
16/01/05 17:32:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/01/05 17:32:11 INFO input.FileInputFormat: Total input paths to process : 7
16/01/05 17:32:11 WARN snappy.LoadSnappy: Snappy native library not loaded
16/01/05 17:32:12 INFO mapred.JobClient: Running job: job_201601051530_0001
16/01/05 17:32:13 INFO mapred.JobClient: map 0% reduce 0%
16/01/05 17:33:31 INFO mapred.JobClient: map 28% reduce 0%
16/01/05 17:33:54 INFO mapred.JobClient: map 57% reduce 0%
16/01/05 17:34:05 INFO mapred.JobClient: map 57% reduce 19%
16/01/05 17:34:23 INFO mapred.JobClient: map 85% reduce 19%
16/01/05 17:34:33 INFO mapred.JobClient: map 100% reduce 19%
16/01/05 17:34:35 INFO mapred.JobClient: map 100% reduce 28%
16/01/05 17:34:37 INFO mapred.JobClient: map 100% reduce 100%
16/01/05 17:34:39 INFO mapred.JobClient: Job complete: job_201601051530_0001
16/01/05 17:34:39 INFO mapred.JobClient: Counters: 29
16/01/05 17:34:39 INFO mapred.JobClient: Job Counters
16/01/05 17:34:39 INFO mapred.JobClient: Launched reduce tasks=1
16/01/05 17:34:39 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=229608
16/01/05 17:34:39 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
16/01/05 17:34:39 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
16/01/05 17:34:39 INFO mapred.JobClient: Launched map tasks=7
16/01/05 17:34:39 INFO mapred.JobClient: Data-local map tasks=7
16/01/05 17:34:39 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=65560
16/01/05 17:34:39 INFO mapred.JobClient: File Output Format Counters
16/01/05 17:34:39 INFO mapred.JobClient: Bytes Written=6564
16/01/05 17:34:39 INFO mapred.JobClient: FileSystemCounters
16/01/05 17:34:39 INFO mapred.JobClient: FILE_BYTES_READ=15681
16/01/05 17:34:39 INFO mapred.JobClient: HDFS_BYTES_READ=15512
16/01/05 17:34:39 INFO mapred.JobClient: FILE_BYTES_WRITTEN=451968
16/01/05 17:34:39 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=6564
16/01/05 17:34:39 INFO mapred.JobClient: File Input Format Counters
16/01/05 17:34:39 INFO mapred.JobClient: Bytes Read=15512
16/01/05 17:34:39 INFO mapred.JobClient: Map-Reduce Framework
16/01/05 17:34:39 INFO mapred.JobClient: Map output materialized bytes=10651
16/01/05 17:34:39 INFO mapred.JobClient: Map input records=386
16/01/05 17:34:39 INFO mapred.JobClient: Reduce shuffle bytes=10651
16/01/05 17:34:39 INFO mapred.JobClient: Spilled Records=1196
16/01/05 17:34:39 INFO mapred.JobClient: Map output bytes=21309
16/01/05 17:34:39 INFO mapred.JobClient: Total committed heap usage (bytes)=867921920
16/01/05 17:34:39 INFO mapred.JobClient: CPU time spent (ms)=16610
16/01/05 17:34:39 INFO mapred.JobClient: Combine input records=1761
16/01/05 17:34:39 INFO mapred.JobClient: SPLIT_RAW_BYTES=847
16/01/05 17:34:39 INFO mapred.JobClient: Reduce input records=598
16/01/05 17:34:39 INFO mapred.JobClient: Reduce input groups=427
16/01/05 17:34:39 INFO mapred.JobClient: Combine output records=598
16/01/05 17:34:39 INFO mapred.JobClient: Physical memory (bytes) snapshot=1215692800
16/01/05 17:34:39 INFO mapred.JobClient: Reduce output records=427
16/01/05 17:34:39 INFO mapred.JobClient: Virtual memory (bytes) snapshot=5859864576
16/01/05 17:34:39 INFO mapred.JobClient: Map output records=1761
After the job finishes, view the part-r-00000 file:
hadoop fs -cat hdfs://master.dragon.org:9000/wc/output/part-r-00000
OK!!!