How to Set Up a Hadoop Single-Node Cluster in a VirtualBox VM

System Environment Preparation

  • 1. Install VirtualBox
    • Download VirtualBox
  • 2. Install Ubuntu
    • Attach the installation ISO (ubuntu-16.04.5-desktop-amd64.iso)
      • Tsinghua University Open Source Mirror
    • Tune the configuration
      • Install Guest Additions
      • Enable the shared clipboard
      • Set the fastest download mirror (see the sketch after this list)
      • Terminal settings
      • Default input method settings
      • Install zsh / syntax highlighting / autosuggestions
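
    • One possible way to switch APT to the Tsinghua mirror mentioned above (the sed pattern and mirror URL are assumptions, not from the original article):
    # rewrite sources.list to use the Tsinghua mirror, then refresh the package index
    sudo sed -i 's|http://.*archive.ubuntu.com|https://mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list
    sudo apt-get update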

Hadoop Installation and Configuration

  • 1. Install the JDK

    • Install from the command line
    java -version
    sudo apt-get update
    sudo apt-get install default-jdk
    java -version
    # check the Java installation path
    update-alternatives --display java
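    # optional: list the installed JDKs to confirm the JAVA_HOME path used in later steps
    # (later steps assume /usr/lib/jvm/java-7-openjdk-amd64; default-jdk on Ubuntu 16.04
    # usually installs java-8, so adjust JAVA_HOME if the directory name here differs)
    ls /usr/lib/jvm/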
    
  • 2. Install SSH and set up passwordless login (a verification sketch follows the list below)

    • sudo apt-get install openssh-server
    • ssh-keygen -t rsa -C "[email protected]"
    • cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    • ssh localhost may fail; possible fixes:
      • Option 1: System Preferences -> Sharing -> enable Remote Login (this path applies to a macOS host)
      • Option 2: sudo apt-get install openssh-server
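
    • A minimal verification sketch (the permission fixes are a common extra step, not in the original article):
    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys
    ssh localhost   # should log in without asking for a password
    exit
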
  • 3. Install Hadoop

    • hadoop-2.7.0
    • Download with wget
    wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz
    # extract under /usr/local, then move it so that /usr/local/hadoop is the Hadoop root (matching HADOOP_HOME below)
    sudo tar -zxvf hadoop-2.7.0.tar.gz -C /usr/local
    sudo mv /usr/local/hadoop-2.7.0 /usr/local/hadoop
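    # quick sanity check (assumes the standard 2.7.0 tarball layout)
    ls /usr/local/hadoop   # expect bin, etc, sbin, share, ...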
    
    
  • 4. vim ~/.bashrc and append the following environment variables:

    ### environment variables, added by sandy
    
    # java,hadoop depends on this
    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    
    # HADOOP_HOME
    export HADOOP_HOME=/usr/local/hadoop
    
    # path
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin
    
    # other hadoop params
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    
    # link hadoop lib
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
    export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
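
    # apply the new variables and run a simple check (hadoop version only works once JAVA_HOME above is correct)
    source ~/.bashrc
    echo $HADOOP_HOME
    hadoop version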
    
  • 5. Edit the configuration files under the $HADOOP_HOME/etc/hadoop directory

    • a. Configure hadoop-env.sh
    # set JAVA_HOME explicitly; it must match the JDK path confirmed in step 1
    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    
    • b. Configure core-site.xml
    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
    
    • c. Configure yarn-site.xml
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
    </configuration>
    
    • d. Configure mapred-site.xml
      • First copy the template: sudo cp mapred-site.xml.template mapred-site.xml
      • Then edit it:
      <configuration>
          <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
          </property>
      </configuration>
      
      
      
    • e. Configure hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.replication</name>
            <!-- note: on a single-node cluster a replication factor of 1 is more typical -->
            <value>3</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
        </property>
    </configuration>
    
    
  • 6. Create and format the HDFS directories

    • Creating (and deleting) users on Ubuntu
    sudo useradd sandy
    sudo passwd sandy
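    # create the NameNode/DataNode directories referenced in hdfs-site.xml, then hand them to the new user
    sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
    sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode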
    sudo chown sandy:sandy -R /usr/local/hadoop/
    
    # start the services first, then format the NameNode, then restart; otherwise the NameNode process will not show up
    start-all.sh
    hadoop namenode -format
    stop-all.sh
    start-all.sh
    jps
    
    
  • 7. Check the running processes with jps

    6570 NameNode
    5606 DataNode
    7053 Jps
    5779 SecondaryNameNode
    6046 NodeManager
    5921 ResourceManager
    
  • 8. Test

    • Open localhost:8088 (the YARN ResourceManager web UI) in a browser
    • Or query it from the command line
    curl -XGET localhost:8088
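    # additional checks (not in the original article): NameNode web UI and HDFS report
    curl -XGET localhost:50070          # NameNode web UI in Hadoop 2.x
    hdfs dfsadmin -report               # should list one live DataNode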
    

Common Hadoop HDFS Commands

  • 1. Hadoop commands

    start-all.sh <==> start-dfs.sh & start-yarn.sh
    
    hadoop namenode -format
    
    # check status
    jps
    
    hadoop fs -mkdir hdfs://localhost:9000/test
    
    hadoop fs -ls hdfs://localhost:9000/test
    
    hadoop fs -put test.py hdfs://localhost:9000/test
    
    hadoop fs -cat hdfs://localhost:9000/test/test.py
    
    hadoop fs -get hdfs://localhost:9000/test
    
    hadoop fs -rm -r hdfs://localhost:9000/test
    
    stop-all.sh <==> stop-dfs.sh & stop-yarn.sh
    
    
  • 2. Practice

    # mkdir
    hadoop fs -mkdir /user
    
    hadoop fs -mkdir /user/test
    
    hadoop fs -mkdir -p /dir1/dir2/dir3
    
    # copyFromLocal to dir
    hadoop fs -copyFromLocal file /file
    hadoop fs -copyFromLocal file.py /file/file.py
    hadoop fs -copyFromLocal -f file /file
    hadoop fs -copyFromLocal file file.py note.md /dir1
    
    hadoop fs -copyFromLocal ../hadoop_test /
    
    echo $HADOOP_HOME | hadoop fs -put - /file/hadoop.conf
    
    ls ~ | hadoop fs -put - /file/dirlist.txt
    
    # list file content
    hadoop fs -cat /file/file.py
    hadoop fs -cat /file/file.py|tail
    hadoop fs -cat /file/file.py|head
    hadoop fs -cat /file/file.py|more
    
    # view one level
    hadoop fs -ls /user
    hadoop fs -ls /
    
    # view multi level
    hadoop fs -ls -R /
    
    
    hadoop fs -copyToLocal /dir1
    hadoop fs -copyToLocal /file/file.py
    hadoop fs -get /hadoop_test hdp_test
    hadoop fs -rm -r /dir1
    hadoop fs -rm -R /file
    hadoop fs -rm /test.py
    hadoop fs -cp /file /test
    
    

WordCount Demo

  • The Hadoop WordCount example program explained, with a worked example

    # per the official MapReduce tutorial, tools.jar must be on the classpath to compile this way
    export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
    hadoop com.sun.tools.javac.Main WordCount.java
    
    jar cf wc.jar WordCount*.class
    
    hadoop fs -mkdir -p /user/hdfs/wordcount/input
    
    hadoop fs -put word.txt /user/hdfs/wordcount/input
    
    hadoop jar wc.jar WordCount /user/hdfs/wordcount/input/word.txt /user/hdfs/wordcount/output
    
    hadoop fs -ls /user/hdfs/wordcount/output
    
    hadoop fs -cat /user/hdfs/wordcount/output/part-00000|tail
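    # MapReduce will not overwrite an existing output directory; remove it before re-running the job
    hadoop fs -rm -r /user/hdfs/wordcount/output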
    
    

Problems

  • There are 0 datanode(s) running and no node(s) are excluded in this operation.
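
    • A common cause (an assumption based on typical single-node setups, not stated in the original article) is a DataNode clusterID mismatch after re-running hadoop namenode -format; one recovery sketch:
    stop-all.sh
    # WARNING: this wipes all HDFS block data stored in the DataNode directory
    rm -rf /usr/local/hadoop/hadoop_data/hdfs/datanode/*
    hadoop namenode -format
    start-all.sh
    jps   # a DataNode process should now appear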

Further Reading

  • Complete Hadoop command reference
  • Common Hadoop HDFS commands
  • Cloudera Manager manual (Chinese)
  • HDFS file operations
  • Hadoop 2.6 Single Node Cluster installation guide
  • 《Python+Spark 2.0+Hadoop机器学习与大数据实战》 (Python+Spark 2.0+Hadoop Machine Learning and Big Data in Practice, Tsinghua University Press)
