Setting up Hadoop + HBase (standalone on Mac)

Summary

Notes on setting up Hadoop + HBase on a single machine.

References:

1. Standalone installation of Hadoop 2.7.4
2. Setting up an HBase environment

Packages to download:

hadoop-2.7.7 : https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
hbase-2.1.7 : https://hbase.apache.org/downloads.html

Installing Hadoop

(Java setup is not covered here; configure the JDK yourself.)

1. Extract

tar -zxvf hadoop-2.7.7.tar.gz
mkdir /usr/local/hadoop
# Move the extracted contents so that /usr/local/hadoop is the Hadoop home
# (the paths used later, e.g. /usr/local/hadoop/tmp and /usr/local/hadoop/share, assume this layout)
mv hadoop-2.7.7/* /usr/local/hadoop/
# Symlink hdfs onto the PATH (on macOS /usr/bin is protected by SIP, so link into /usr/local/bin)
ln -s /usr/local/hadoop/bin/hdfs /usr/local/bin/hdfs

Verify:

hadoop version

2. Configuration files

  1. hadoop-env.sh
# Set this to your own JAVA_HOME path
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home
  2. yarn-env.sh
# Set this to your own JAVA_HOME path
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home
  3. core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>HDFS URI</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>Local Hadoop temp directory</description>
  </property>
</configuration>

You need to create the /usr/local/hadoop/tmp directory here.
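A minimal way to do that (plain shell):

mkdir -p /usr/local/hadoop/tmp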

  4. hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/data0/hadoop/hdfs/name</value>
    <description>Where the namenode stores the HDFS namespace metadata</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/data0/hadoop/hdfs/data</value>
    <description>Where the datanode physically stores data blocks</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Number of replicas; defaults to 3 and should not exceed the number of datanodes</description>
  </property>
</configuration>

You need to create the /usr/local/hadoop/data0/hadoop directory.
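For example (creating the name/ and data/ subdirectories up front does no harm, even though HDFS can create them itself):

mkdir -p /usr/local/hadoop/data0/hadoop/hdfs/name
mkdir -p /usr/local/hadoop/data0/hadoop/hdfs/data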

  5. mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

  6. yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.31.250:8099</value>
  </property>
</configuration>

Just set the webapp address to your own machine's address.

3. Start Hadoop

# Format the namenode (first run only)
bin/hdfs namenode -format
# Start all services
$HADOOP_HOME/sbin/start-all.sh

To check whether startup succeeded:


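One quick check (a suggestion, not from the original notes) is jps; on a healthy single-node setup it should list roughly the following daemons (your PIDs will differ):

jps
# NameNode
# DataNode
# SecondaryNameNode
# ResourceManager
# NodeManager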

4. Using HDFS commands:

# Create a directory and list it
hadoop fs -mkdir /xxx
hadoop fs -ls /
# Upload files
hadoop fs -put ./data/* /data/

# Delete files
hadoop fs -rm -r /data/*
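Two other commands that come in handy later (the paths below are only examples) are -cat for printing a file stored in HDFS and -get for copying it back to the local filesystem:

# Print a file stored in HDFS
hadoop fs -cat /data/output/part-00000
# Copy a directory from HDFS to the local filesystem
hadoop fs -get /data/output ./output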

5. Some practice scripts

mapper.py

#!/usr/bin/env python
import sys

# Word-count mapper: emit "<word>\t1" for every word read from stdin
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print('%s\t%s' % (word, 1))

reducer.py

#!/usr/bin/env python
import sys

current_word = None
current_count = 0
word = None

# Hadoop streaming sorts the mapper output by key, so all counts for a
# given word arrive consecutively and can be summed in a single pass.
for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    count = int(count)

    if current_word == word:
        current_count += count
    else:
        if current_word:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Emit the last word
if current_word == word:
    print('%s\t%s' % (current_word, current_count))
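Before submitting the job, the two scripts can be smoke-tested locally with a plain shell pipeline, since Hadoop streaming only pipes data through stdin/stdout (the sample input below is made up):

echo "test1 test2 test1" | python mapper.py | sort | python reducer.py
# expected output:
# test1   2
# test2   1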

run.sh

STREAM=/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.7.jar
WORKPLACE=/usr/local/hadoop/workplace

# Remove the previous output directory (the job fails if it already exists)
hadoop fs -rm -r /data/output

# Note: mapper.py and reducer.py must be executable (chmod +x)
hadoop jar $STREAM \
-files $WORKPLACE/script/mapper.py,$WORKPLACE/script/reducer.py \
-mapper $WORKPLACE/script/mapper.py \
-reducer $WORKPLACE/script/reducer.py \
-input /data/*.json \
-output /data/output

Of course, first upload the input files (*.json) to the /data directory on HDFS.
The *.json files look like this:

test1 test1 test1 test1 test1 test2 test1 test1 test2 test2 test2 test2 test3 test3 test3 test3 test3 test3 test3 test3
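Once the job finishes, the result can be read straight out of HDFS. For the sample line above the counts work out to test1 = 7, test2 = 5, test3 = 8 (the exact part-file name may differ):

hadoop fs -cat /data/output/part-*
# test1   7
# test2   5
# test3   8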

Installing HBase

1. Extract

I put the extracted folder at:

$HADOOP_HOME/softs/hbase

2. Configuration

  1. hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>/usr/local/hadoop/softs/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/hadoop/softs/hbase/zookeeper</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
  </property>
</configuration>

Note: all of these paths should be changed to wherever you extracted HBase.
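Not covered in the original notes, but if HBase does not find Java on its own, JAVA_HOME can also be set in conf/hbase-env.sh, the same way as for Hadoop above:

# conf/hbase-env.sh -- use your own JAVA_HOME path
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home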

3. Start

# Run from the HBase bin/ directory
bash hbase-daemon.sh start zookeeper
bash hbase-daemon.sh start master
bash hbase-daemon.sh start regionserver

Verify that startup completed:


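As with Hadoop, jps is a quick sanity check here too (a suggestion, not from the original notes); the three HBase daemons should show up as:

jps
# HQuorumPeer
# HMaster
# HRegionServer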

And the local web console:
http://127.0.0.1:60010/master-status

4. Some data operations

  1. Enter the hbase shell
hbase shell
  2. Create a 'user' table:
create 'user', 'info'
  3. Drop the table
disable 'user'
drop 'user'
  4. Insert, update, query, and delete data
# Insert a row with row key 'id001' and a column 'info:name' with value 'SteveWooo'
put 'user','id001','info:name','SteveWooo'

# Read the row 'id001'
get 'user','id001'

# Add a new column 'info:age' to row 'id001'
put 'user','id001','info:age','18'

# Update the age of 'id001'
put 'user','id001','info:age','19'

# Delete all data for row 'id001'
deleteall 'user','id001'
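While experimenting, a scan is handy for listing everything in the table (not part of the original notes):

# List all rows and columns of the 'user' table
scan 'user'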
