Installing Hadoop on Ubuntu (Single Node)

      • Make sure Java is installed
      • Set up passwordless SSH to localhost
      • Install Hadoop
      • Run Hadoop (pseudo-distributed mode)
      • Run a MapReduce job using Hadoop's bundled example program
      • Stop HDFS
      • Run YARN
      • Stop YARN

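Make sure Java is installed

Hadoop is written in Java, so a Java environment must be installed first; either Oracle JDK or OpenJDK works. For the supported versions, see the official wiki: https://wiki.apache.org/hadoop/HadoopJavaVersions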
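A quick way to confirm that a JDK is present before going further is to parse the output of java -version. The sketch below is only an illustration: the helper name java_major is made up for this example, and the apt package named in the hint is one common choice on Ubuntu, not a requirement.

```shell
# Extract the major Java version from a version string.
# Handles the legacy "1.8.0_181" scheme and the newer "11.0.2" scheme.
# (java_major is a hypothetical helper name, not part of any tool.)
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;   # 1.8.0_181 -> 8
        *)   echo "$1" | cut -d. -f1 ;;   # 11.0.2    -> 11
    esac
}

if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr; the version is the quoted token on line 1.
    ver=$(java -version 2>&1 | head -n 1 | sed 's/.*"\(.*\)".*/\1/')
    echo "Found Java $ver (major version $(java_major "$ver"))"
else
    echo "java not found on PATH; install a JDK first, e.g. sudo apt install openjdk-8-jdk"
fi
```

Compare the reported major version against the wiki page above before installing Hadoop.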

Set up passwordless SSH to localhost

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

Once this is set up, ssh localhost should log in without asking for a password:

$ ssh localhost

Exit the SSH shell:

$ exit
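The same check can be scripted so that it never stops to prompt. This is a minimal sketch, assuming the host key for localhost has already been accepted by the interactive login above; the function name check_ssh_localhost is invented for the example.

```shell
# Non-interactive check that key-based login to localhost works.
# (check_ssh_localhost is a hypothetical helper name.)
check_ssh_localhost() {
    # -n keeps ssh from reading stdin; BatchMode=yes makes ssh fail instead of
    # prompting for a password; ConnectTimeout avoids hanging if sshd is down.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 localhost true >/dev/null 2>&1; then
        echo "passwordless ssh to localhost: OK"
    else
        echo "passwordless ssh to localhost: FAILED"
        return 1
    fi
}

check_ssh_localhost || echo "check that sshd is running and that ~/.ssh/authorized_keys has mode 0600"
```

The Hadoop start scripts use ssh to launch the daemons, so a failure here usually shows up later as a password prompt or a connection error when running start-dfs.sh.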

Install Hadoop

  1. Download the release tarball
    $ wget https://www-us.apache.org/dist/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz
    
  2. Move it to /usr/local and extract it there
    $ sudo mv hadoop-2.9.1.tar.gz /usr/local/
    $ cd /usr/local
    $ sudo tar -zxvf hadoop-2.9.1.tar.gz
    
  3. Running bin/hadoop from the Hadoop home directory prints the usage message:
    Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
      CLASSNAME            run the class named CLASSNAME
     or
      where COMMAND is one of:
      fs                   run a generic filesystem user client
      version              print the version
      jar <jar>            run a jar file
                           note: please use "yarn jar" to launch
                                 YARN applications, not this command.
      checknative [-a|-h]  check native hadoop and compression libraries availability
      distcp <srcurl> <desturl> copy file or directories recursively
      archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
      classpath            prints the class path needed to get the
                           Hadoop jar and the required libraries
      credential           interact with credential providers
      daemonlog            get/set the log level for each daemon
      trace                view and modify Hadoop tracing settings
    
    Most commands print help when invoked w/o parameters.
    
    
  4. Edit ~/.bashrc to add Hadoop to PATH, so that the hadoop command can be run directly from a terminal
    $ gedit ~/.bashrc
    
    Append the following to the end of the file (set JAVA_HOME to match your actual installation):
    export JAVA_HOME=/usr/local/jdk1.8
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export HADOOP_HOME=/usr/local/hadoop-2.9.1
    export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
    
    Apply the changes:
    $ source ~/.bashrc
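After reloading ~/.bashrc it is worth confirming that the variables resolve to a real Hadoop tree. The following is a rough sketch under the assumption that HADOOP_HOME is exported as shown above; check_hadoop_env is a made-up helper name.

```shell
# Check that HADOOP_HOME points at an unpacked Hadoop tree.
# Prints a diagnostic and returns non-zero on the first problem found.
# (check_hadoop_env is a hypothetical helper name.)
check_hadoop_env() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set"
        return 1
    fi
    if [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
        echo "no executable at $HADOOP_HOME/bin/hadoop"
        return 1
    fi
    echo "HADOOP_HOME looks valid: $HADOOP_HOME"
}

if check_hadoop_env; then
    # `version` is one of the subcommands listed in the usage message
    "$HADOOP_HOME/bin/hadoop" version | head -n 1
fi
```

If the first message appears, open a new terminal or run source ~/.bashrc again before continuing.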
    

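Run Hadoop (pseudo-distributed mode)

  1. Change the owner of the Hadoop home directory to the current user ($USER is the current user; if a different user or group should own it, name it explicitly):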
    $ sudo chown -R $USER:$USER /usr/local/hadoop-2.9.1
    
  2. Edit /usr/local/hadoop-2.9.1/etc/hadoop/core-site.xml so that it contains the following:
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/usr/local/hadoop-2.9.1/mydata/hadoop-${user.name}</value>
            <description>A base for other temporary directories.</description>
        </property>
    </configuration>
    
  3. Edit /usr/local/hadoop-2.9.1/etc/hadoop/hdfs-site.xml so that it contains the following:
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
    
  4. Format the HDFS filesystem:
    $ hdfs namenode -format
    
  5. Edit hadoop-env.sh
    $ sudo gedit /usr/local/hadoop-2.9.1/etc/hadoop/hadoop-env.sh
    
    Change the JAVA_HOME setting as follows:
    #export JAVA_HOME=${JAVA_HOME}
    export JAVA_HOME=/usr/local/jdk1.8
    
  6. Start the NameNode and DataNode daemons:
    $ start-dfs.sh
    
  7. Open http://localhost:50070/ to see the NameNode information in the web UI
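Besides the web UI, the JDK's jps tool lists running Java processes, which gives a quick command-line confirmation that start-dfs.sh worked. The sketch below assumes a single-node setup where start-dfs.sh launches a NameNode, a DataNode, and a SecondaryNameNode; missing_daemons is a hypothetical helper name.

```shell
# Read `jps` output on stdin and print each expected daemon that is absent.
# Returns 0 only when every daemon name given as an argument was found.
# (missing_daemons is a hypothetical helper name.)
missing_daemons() {
    running=$(awk '{print $2}')      # jps prints "<pid> <main class>"
    rc=0
    for name in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$running" | grep -qx "$name"; then
            echo "not running: $name"
            rc=1
        fi
    done
    return $rc
}

if command -v jps >/dev/null 2>&1; then
    if jps | missing_daemons NameNode DataNode SecondaryNameNode; then
        echo "HDFS daemons are up"
    fi
else
    echo "jps not found; it ships with the JDK under \$JAVA_HOME/bin"
fi
```

If a daemon is reported as not running, the log files under the logs directory of the Hadoop home are the usual place to look next.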

Run a MapReduce job using Hadoop's bundled example program

  1. Change to the Hadoop home directory
    $ cd /usr/local/hadoop-2.9.1/
    
  2. Create the HDFS directories
    $ hdfs dfs -mkdir /user
    $ hdfs dfs -mkdir /user/$USER
    
  3. Copy the files into HDFS
    $ hdfs dfs -put etc/hadoop input
    
  4. Run the example program
    $ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep input output 'dfs[a-z.]+'
    
  5. The command above creates an output directory next to input in HDFS; view the generated files with:
    $ hdfs dfs -cat output/*
    
    The output looks like this:
    6	dfs.audit.logger
    4	dfs.class
    3	dfs.logger
    3	dfs.server.namenode.
    2	dfs.audit.log.maxbackupindex
    2	dfs.period
    2	dfs.audit.log.maxfilesize
    1	dfs.log
    1	dfs.file
    1	dfs.servers
    1	dfsadmin
    1	dfsmetrics.log
    1	dfs.replication
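To see what the grep example computes, it can help to reproduce it locally with ordinary Unix tools: extract every match of the regular expression, count identical matches, and sort by count. The pipeline below is only an approximation of the job (the order of entries with equal counts may differ from Hadoop's output), and local_grep_count is a name invented for this sketch.

```shell
# Count regex matches across all files under a directory, most frequent first.
# Output format mimics the job output: "<count><TAB><match>".
# (local_grep_count is a hypothetical helper name.)
local_grep_count() {
    dir="$1"
    pattern="$2"
    # -r recurse, -h hide file names, -o print only the matched text, -E extended regex
    grep -rhoE "$pattern" "$dir" \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2,2 \
        | awk '{printf "%s\t%s\n", $1, $2}'
}

conf_dir=/usr/local/hadoop-2.9.1/etc/hadoop
if [ -d "$conf_dir" ]; then
    local_grep_count "$conf_dir" 'dfs[a-z.]+'
else
    echo "directory not found: $conf_dir"
fi
```

Running it against the same etc/hadoop directory that was uploaded as input should give counts comparable to the listing above.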
    

Stop HDFS

To stop HDFS, run the following command:

$ stop-dfs.sh

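Run YARN

  1. Edit etc/hadoop/mapred-site.xml (in Hadoop 2.x, if the file does not exist yet, create it by copying etc/hadoop/mapred-site.xml.template):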
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
    
  2. Edit etc/hadoop/yarn-site.xml:
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>
    
  3. Start YARN (make sure HDFS is already running)
    $ start-yarn.sh
    
    Open http://localhost:8088/ in a browser to reach the ResourceManager web UI
  4. Run the MapReduce job again, using the same example as before; delete the output directory first:
    $ hadoop fs -rm -r output
    $ cd /usr/local/hadoop-2.9.1/
    $ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep input output 'dfs[a-z.]+'
    
    The console output now differs from the earlier run: the client connects to the ResourceManager on local port 8032:
    18/11/22 23:12:50 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    18/11/22 23:12:51 INFO input.FileInputFormat: Total input files to process : 29
    18/11/22 23:12:51 INFO mapreduce.JobSubmitter: number of splits:29
    18/11/22 23:12:52 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
    
    Refresh http://localhost:8088/ to see the job status and result
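The ResourceManager can also be checked from a script instead of a browser, for example to wait until it is ready after start-yarn.sh. This is a minimal sketch, assuming curl is installed and the ResourceManager web port is the 8088 used above; wait_for_url is a hypothetical helper, and /ws/v1/cluster/info is the cluster information endpoint of the ResourceManager REST API.

```shell
# Poll a URL until it answers successfully or the timeout (seconds) elapses.
# (wait_for_url is a hypothetical helper name.)
wait_for_url() {
    url="$1"
    timeout="${2:-30}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # -s silent, -f treat HTTP errors as failure, --max-time bounds each attempt
        if curl -sf -o /dev/null --max-time 2 "$url"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

if wait_for_url "http://localhost:8088/ws/v1/cluster/info" 3; then
    echo "ResourceManager is answering"
else
    echo "ResourceManager not reachable"
fi
```

A longer timeout, such as the default of 30 seconds, is more realistic right after starting the daemons.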

Stop YARN

To stop YARN, run the following command:

$ stop-yarn.sh
