Installing Hadoop 3.0.3 and JDK 1.8 in pseudo-distributed mode on CentOS 7

Create a regular user named hadoop

useradd hadoop
passwd hadoop

Give the hadoop user sudo privileges

chmod u+w /etc/sudoers
vi /etc/sudoers

Add one of the following lines:
hadoop ALL=(ALL) ALL
or
hadoop ALL=(root) NOPASSWD:ALL

Then restore the file's read-only permissions: chmod u-w /etc/sudoers

Switch to the hadoop user

su - hadoop

Install Hadoop under /home/hadoop/hadoop3.03

cd /home/hadoop
tar -zxvf hadoop-3.0.3.tar.gz
mv hadoop-3.0.3 hadoop3.03

Install the JDK under /home/hadoop/java/jdk1.8

mkdir -p /home/hadoop/java
cd /home/hadoop/java
tar -zxvf jdk-8u172-linux-x64.gz
mv jdk1.8.0_172 jdk1.8

Configure environment variables

sudo vi /etc/profile

##java
export JAVA_HOME=/home/hadoop/java/jdk1.8
export PATH=$PATH:$JAVA_HOME/bin

##hadoop
export HADOOP_HOME=/home/hadoop/hadoop3.03
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Reload the profile and verify:

source /etc/profile
echo $JAVA_HOME
echo $HADOOP_HOME

Set the JAVA_HOME parameter in $HADOOP_HOME/etc/hadoop/hadoop-env.sh, mapred-env.sh, and yarn-env.sh:

export JAVA_HOME=/home/hadoop/java/jdk1.8

Configure core-site.xml

hadoop-localhost is the machine's hostname, and the /opt/data/tmp directory must be created beforehand.
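A quick way to check the hostname mapping before starting anything (a hedged sketch; 192.168.145.129 is the example IP used later in this guide, substitute your own):

```shell
# fs.defaultFS points at hadoop-localhost, so that name must resolve.
# If the lookup fails, add a line such as
#   192.168.145.129 hadoop-localhost
# to /etc/hosts (use your machine's actual address).
if getent hosts hadoop-localhost > /dev/null; then
    echo "hadoop-localhost resolves"
else
    echo "hadoop-localhost does NOT resolve; update /etc/hosts"
fi
```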


<configuration>
  <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop-localhost:8020</value>
      <description>HDFS URI: filesystem://namenode-host:port</description>
  </property>
  <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/data/tmp</value>
      <description>Local Hadoop temporary directory on the namenode</description>
  </property>
</configuration>

hadoop.tmp.dir sets Hadoop's temporary directory; for example, the HDFS NameNode stores its data there by default. If you look through the *-default.xml default configuration files, you will find many settings that reference ${hadoop.tmp.dir}.

The default hadoop.tmp.dir is /tmp/hadoop-${user.name}, which means the NameNode would store the HDFS metadata under /tmp. If the operating system reboots, /tmp is wiped and the NameNode metadata is lost, which is catastrophic, so we should change this path.
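After editing core-site.xml you can confirm the value Hadoop actually picks up with hdfs getconf, which reads the merged default + site configuration (guarded here so the snippet is a no-op on machines without Hadoop on the PATH):

```shell
# Print the effective value of hadoop.tmp.dir and one of the many
# settings that inherit from it (dfs.namenode.name.dir defaults to
# file://${hadoop.tmp.dir}/dfs/name).
if command -v hdfs > /dev/null; then
    hdfs getconf -confKey hadoop.tmp.dir
    hdfs getconf -confKey dfs.namenode.name.dir
else
    echo "hdfs not found on PATH"
fi
```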

sudo mkdir -p /opt/data/tmp

Change the owner of the temporary directory to hadoop:
sudo chown -R hadoop:hadoop /opt/data/tmp

Configure hdfs-site.xml


<configuration>
  <property>
      <name>dfs.name.dir</name>
      <value>/opt/data/tmp/dfs/name</value>
      <description>Where the namenode stores the HDFS namespace metadata</description>
  </property>
  <property>
      <name>dfs.data.dir</name>
      <value>/opt/data/tmp/dfs/data</value>
      <description>Physical storage location of data blocks on the datanode</description>
  </property>
  <property>
      <name>dfs.replication</name>
      <value>1</value>
  </property>
</configuration>

(In Hadoop 3.x, dfs.name.dir and dfs.data.dir are deprecated aliases of dfs.namenode.name.dir and dfs.datanode.data.dir; the old names still work but log a deprecation warning.)

Format HDFS

sudo chown -R hadoop:hadoop /opt/data
hdfs namenode -format

Inspect the NameNode directory created by formatting:
$ ll /opt/data/tmp/dfs/name/current

Start the NameNode:
sbin/hadoop-daemon.sh start namenode

Start the DataNode:
sbin/hadoop-daemon.sh start datanode

Start the SecondaryNameNode:
sbin/hadoop-daemon.sh start secondarynamenode

(These scripts still work in Hadoop 3.x but are deprecated in favor of hdfs --daemon start namenode, and so on.)

Use the jps command to check whether the daemons started; if NameNode, DataNode, and SecondaryNameNode all appear in the output, startup succeeded.
$ jps

Test creating a directory and uploading and downloading files on HDFS

[hadoop@hadoop-localhost hadoop3.03]$

Create a directory:
bin/hdfs dfs -mkdir /demo1

Upload a file:
bin/hdfs dfs -put etc/hadoop/core-site.xml /demo1

Read a file on HDFS:
bin/hdfs dfs -cat /demo1/core-site.xml

Download a file from HDFS to the local filesystem:
bin/hdfs dfs -get /demo1/core-site.xml

View the HDFS web UI

In HDFS 2.x the web UI listens on port 50070:
http://192.168.145.129:50070

In HDFS 3.x the web UI listens on port 9870:
http://192.168.145.129:9870/dfshealth.html#tab-overview

Configure and start YARN

Configure mapred-site.xml


<configuration>
  <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
  </property>
  <property>
      <name>yarn.app.mapreduce.am.env</name>
      <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop3.03</value>
  </property>
  <property>
      <name>mapreduce.map.env</name>
      <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop3.03</value>
  </property>
  <property>
      <name>mapreduce.reduce.env</name>
      <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop3.03</value>
  </property>
</configuration>

Configure yarn-site.xml

yarn.nodemanager.aux-services configures YARN's auxiliary shuffle service; mapreduce_shuffle selects MapReduce's default shuffle implementation.

yarn.resourcemanager.hostname specifies which node the ResourceManager runs on.


<configuration>
  <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
  </property>
  <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>hadoop-localhost</value>
  </property>
</configuration>

Start the ResourceManager:

sbin/yarn-daemon.sh start resourcemanager

Start the NodeManager:

sbin/yarn-daemon.sh start nodemanager

You can also start the services with the batch scripts.

Start HDFS and YARN:
sbin/start-dfs.sh
sbin/start-yarn.sh

Or start everything at once:
sbin/start-all.sh
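The start-*.sh scripts launch the daemons over ssh (to localhost in pseudo-distributed mode), so the hadoop user needs passwordless ssh to itself. A minimal setup sketch, run as the hadoop user:

```shell
# Generate a key pair (if one doesn't exist) and authorize it for
# logins to this machine; start-dfs.sh/start-yarn.sh ssh to localhost.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify with: ssh localhost exit   (should not prompt for a password)
```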

YARN web UI

The YARN web UI listens on port 8088; open http://192.168.145.129:8088/ to view it.

Run a MapReduce job

Create a test input directory:
bin/hdfs dfs -mkdir -p /wordcountdemo/input

The wc.input file contains:

hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
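If wc.input does not exist yet, it can be created like this (a sketch that writes to the current directory; the upload command below reads /opt/data/wc.input, so move the file there or adjust the path):

```shell
# Create the sample WordCount input file.
cat > wc.input <<'EOF'
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
EOF
wc -l wc.input
```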

Upload wc.input to the /wordcountdemo/input directory on HDFS:
bin/hdfs dfs -put /opt/data/wc.input /wordcountdemo/input

Run the WordCount MapReduce job

[hadoop@hadoop-localhost hadoop3.03]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar wordcount /wordcountdemo/input /wordcountdemo/output

2018-07-03 19:38:23,956 INFO client.RMProxy: Connecting to ResourceManager at hadoop-localhost/192.168.145.129:8032
2018-07-03 19:38:24,565 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1530615244194_0002
2018-07-03 19:38:24,879 INFO input.FileInputFormat: Total input files to process : 1
2018-07-03 19:38:25,784 INFO mapreduce.JobSubmitter: number of splits:1
2018-07-03 19:38:25,841 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-07-03 19:38:26,314 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1530615244194_0002
2018-07-03 19:38:26,315 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-07-03 19:38:26,466 INFO conf.Configuration: resource-types.xml not found
2018-07-03 19:38:26,466 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-07-03 19:38:26,547 INFO impl.YarnClientImpl: Submitted application application_1530615244194_0002
2018-07-03 19:38:26,590 INFO mapreduce.Job: The url to track the job: http://hadoop-localhost:8088/proxy/application_1530615244194_0002/
2018-07-03 19:38:26,590 INFO mapreduce.Job: Running job: job_1530615244194_0002
2018-07-03 19:38:35,985 INFO mapreduce.Job: Job job_1530615244194_0002 running in uber mode : false
2018-07-03 19:38:35,988 INFO mapreduce.Job:  map 0% reduce 0%
2018-07-03 19:38:42,310 INFO mapreduce.Job:  map 100% reduce 0%
2018-07-03 19:38:47,402 INFO mapreduce.Job:  map 100% reduce 100%
2018-07-03 19:38:49,469 INFO mapreduce.Job: Job job_1530615244194_0002 completed successfully
2018-07-03 19:38:49,579 INFO mapreduce.Job: Counters: 53
    File System Counters
        FILE: Number of bytes read=94
        FILE: Number of bytes written=403931
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=195
        HDFS: Number of bytes written=60
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4573
        Total time spent by all reduces in occupied slots (ms)=2981
        Total time spent by all map tasks (ms)=4573
        Total time spent by all reduce tasks (ms)=2981
        Total vcore-milliseconds taken by all map tasks=4573
        Total vcore-milliseconds taken by all reduce tasks=2981
        Total megabyte-milliseconds taken by all map tasks=4682752
        Total megabyte-milliseconds taken by all reduce tasks=3052544
    Map-Reduce Framework
        Map input records=4
        Map output records=11
        Map output bytes=115
        Map output materialized bytes=94
        Input split bytes=122
        Combine input records=11
        Combine output records=7
        Reduce input groups=7
        Reduce shuffle bytes=94
        Reduce input records=7
        Reduce output records=7
        Spilled Records=14
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=171
        CPU time spent (ms)=1630
        Physical memory (bytes) snapshot=332750848
        Virtual memory (bytes) snapshot=5473169408
        Total committed heap usage (bytes)=165810176
        Peak Map Physical memory (bytes)=214093824
        Peak Map Virtual memory (bytes)=2733207552
        Peak Reduce Physical memory (bytes)=118657024
        Peak Reduce Virtual memory (bytes)=2739961856
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=73
    File Output Format Counters 
        Bytes Written=60
[hadoop@hadoop-localhost hadoop3.03]$

The word-count output is:

[hadoop@hadoop-localhost hadoop3.03]$ bin/hdfs dfs -cat /wordcountdemo/output/part-r-00000
hadoop  3
hbase   1
hive    2
mapreduce   1
spark   2
sqoop   1
storm   1
[hadoop@hadoop-localhost hadoop3.03]$ 

The results are sorted by key.
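The shuffle phase delivers keys to each reducer in sorted order, which is why part-r-00000 is alphabetical. You can verify that mechanically with sort -c, demonstrated here on the output shown above (on the cluster you would pipe bin/hdfs dfs -cat /wordcountdemo/output/part-r-00000 into the same check):

```shell
# sort -c -k1,1 exits 0 only if its input is already sorted on the
# first field; feed it the WordCount results shown above.
printf 'hadoop\t3\nhbase\t1\nhive\t2\nmapreduce\t1\nspark\t2\nsqoop\t1\nstorm\t1\n' \
  | sort -c -k1,1 && echo "sorted by key"
```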

Stop Hadoop

sbin/hadoop-daemon.sh stop namenode
sbin/hadoop-daemon.sh stop datanode
sbin/yarn-daemon.sh stop resourcemanager
sbin/yarn-daemon.sh stop nodemanager

To stop everything with the batch scripts:
sbin/stop-yarn.sh
sbin/stop-dfs.sh

sbin/stop-all.sh

HDFS overview

HDFS handles big-data storage. By splitting large files into blocks and distributing them across machines, it overcomes the disk-size limit of a single server and solves the problem of one machine being unable to store a very large file. HDFS is a relatively independent module: it can serve YARN, and it can also serve other components such as HBase.

YARN overview

YARN is a general-purpose resource-coordination and task-scheduling framework, created to address, among other issues, the overloaded JobTracker in Hadoop 1.x MapReduce.

YARN is a general framework: besides MapReduce, it can also run other compute frameworks such as Spark and Storm.

MapReduce overview

MapReduce is a computation framework. It defines a way of processing data: a Map phase followed by a Reduce phase, executed in a distributed fashion. It is suited only to offline batch processing of big data, not to applications with strict real-time requirements.
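The Map → shuffle/sort → Reduce flow can be sketched with ordinary shell tools, which makes a handy mental model (plain coreutils stand-ins, not Hadoop code):

```shell
# WordCount on one machine, stage by stage:
#   map:     tr splits each line into one word per line
#   shuffle: sort groups identical words together
#   reduce:  uniq -c counts each group
printf 'hadoop mapreduce hive\nhbase spark storm\nsqoop hadoop hive\nspark hadoop\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c
```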

