Hadoop 2.7.1 + HBase 1.1.2 Cluster Setup (5): Installing HBase

(1) Compiling Hadoop 2.7.1 from source    http://zilongzilong.iteye.com/blog/2246856

(2) Preparing for the Hadoop 2.7.1 install    http://zilongzilong.iteye.com/blog/2253544

(3) Installing Hadoop 2.7.1    http://zilongzilong.iteye.com/blog/2245547

(4) Preparing for the HBase install    http://zilongzilong.iteye.com/blog/2254451

(5) Installing HBase    http://zilongzilong.iteye.com/blog/2254460

(6) Installing Snappy    http://zilongzilong.iteye.com/blog/2254487

(7) Benchmarking HBase with Yahoo!'s YCSB    http://zilongzilong.iteye.com/blog/2248863

(8) spring-hadoop in practice    http://zilongzilong.iteye.com/blog/2254491

For the preparation work, see (4) Preparing for the HBase install: http://zilongzilong.iteye.com/blog/2254451

1. Download hbase-1.1.2-bin.tar.gz and place it in /opt.

2. Extract it:

   tar zxvf hbase-1.1.2-bin.tar.gz

3. Configure environment variables by appending the following to /etc/profile:

export HBASE_HOME=/opt/hbase-1.1.2

export PATH=${HBASE_HOME}/bin:${PATH}
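Once those lines are appended, the profile can be reloaded in the current shell to pick up the changes (a quick sanity check; the first PATH entry should then be the HBase bin directory):

```shell
# Reload the profile without logging out, then inspect the head of PATH
. /etc/profile
echo "${PATH%%:*}"   # should be /opt/hbase-1.1.2/bin after the edit above
```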

4. Create the HBase temporary directory (on every node in the cluster):

   mkdir -p /home/hadoop/hbase
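Assuming passwordless SSH between the nodes (set up earlier in this series), the directory can be created on all of them from one machine rather than logging in to each; the hostnames below are the ones this series uses (nmsc0, nmsc1, nmsc2):

```shell
# Create the HBase tmp directory on every node; mkdir -p is idempotent,
# so re-running the loop is harmless
for host in nmsc0 nmsc1 nmsc2; do
  ssh "root@${host}" 'mkdir -p /home/hadoop/hbase'
done
```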

5. Edit /opt/hbase-1.1.2/conf/hbase-env.sh. The full file is reproduced below; the local edits against the shipped template are JAVA_HOME, HBASE_CLASSPATH, HBASE_HEAPSIZE=2G, the CLIENT_GC_OPTS GC-logging line, and HBASE_MANAGES_ZK=false (ZooKeeper runs externally, as set up in part (4)):

 

#
#/**
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)

# The java implementation to use.  Java 1.7+ required.
# export JAVA_HOME=/usr/java/jdk1.6.0/
export JAVA_HOME=/opt/java/jdk1.7.0_65/

# Extra Java CLASSPATH elements.  Optional.
# export HBASE_CLASSPATH=
export HBASE_CLASSPATH=/opt/hbase-1.1.2/conf

# The maximum amount of heap to use. Default is left to JVM default.
# export HBASE_HEAPSIZE=1G
export HBASE_HEAPSIZE=2G

# Uncomment below if you intend to use off heap cache. For example, to allocate 8G of 
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G

# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
export HBASE_OPTS="$HBASE_OPTS -XX:CMSInitiatingOccupancyFraction=60"

# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/opt/hbase-1.1.2/logs/gc-hbase.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# See the package documentation for org.apache.hadoop.hbase.io.hfile for other configurations
# needed setting up off-heap block caching. 

# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
# NOTE: HBase provides an alternative JMX implementation to fix the random ports issue, please see JMX
# section in HBase Reference Guide for instructions.

# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored.  $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers 
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage its own instance of ZooKeeper or not.
# export HBASE_MANAGES_ZK=true
export HBASE_MANAGES_ZK=false
# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the 
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as 
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.

 6. Edit the configuration file /opt/hbase-1.1.2/conf/hbase-site.xml as follows:

 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
 <property>
  <name>hbase.rootdir</name>
  <value>hdfs://192.168.181.66:9000/hbase</value>
 </property>
 <property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
 </property>
 <property>
  <name>hbase.zookeeper.quorum</name>
  <value>nmsc0,nmsc1,nmsc2</value>
 </property>
 <property>
  <name>hbase.tmp.dir</name>
  <value>/home/hadoop/hbase</value>
 </property>
 <property>
  <name>hbase.master</name>
  <value>192.168.181.66:60000</value>
 </property>
 <property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/hadoop/zookeeper</value>
 </property>
 <property>
  <!-- matches htable.setWriteBufferSize(5242880) // 5 MB -->
  <name>hbase.client.write.buffer</name>
  <value>5242880</value>
 </property>
 <property>
  <name>hbase.regionserver.handler.count</name>
  <value>300</value>
  <description>Count of RPC Listener instances spun up on RegionServers. The same property sets the Master's handler count.</description>
 </property>
 <property>
  <name>hbase.table.sanity.checks</name>
  <value>false</value>
 </property>
 <property>
  <!-- ZooKeeper session timeout: a RegionServer that loses its session is declared dead after 30 s -->
  <name>zookeeper.session.timeout</name>
  <value>30000</value>
 </property>
 <property>
  <!-- maximum file size per region: 10 GB -->
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value>
 </property>
</configuration>
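The size-related values above are raw byte counts (and zookeeper.session.timeout is in milliseconds). A quick arithmetic check of the figures used here, including the 60-day TTL used for the table in step 11:

```shell
# Sanity-check the raw numeric values against their intended sizes
echo $(( 5 * 1024 * 1024 ))           # hbase.client.write.buffer: 5 MB   -> 5242880
echo $(( 10 * 1024 * 1024 * 1024 ))   # hbase.hregion.max.filesize: 10 GB -> 10737418240
echo $(( 60 * 24 * 60 * 60 ))         # 60-day TTL in seconds (step 11)   -> 5184000
```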
 7. Edit /opt/hbase-1.1.2/conf/regionservers to list the RegionServer hosts, one per line:

nmsc1
nmsc2

 8. Copy the installation tree to the other nodes:

scp -r /opt/hbase-1.1.2 root@nmsc1:/opt/

scp -r /opt/hbase-1.1.2 root@nmsc2:/opt/

9. Start and stop HBase. The commands below can be run on any node in the cluster; make sure Hadoop (and, since HBASE_MANAGES_ZK=false, the external ZooKeeper ensemble) is running first.

  Starting HBase launches the HMaster and HRegionServer processes:

  cd /opt/hbase-1.1.2/bin/

  ./start-hbase.sh

  jps

(Screenshots omitted: jps on the master node lists an HMaster process; jps on nmsc1 and nmsc2 lists an HRegionServer process.)

  Stopping HBase shuts down the HMaster and HRegionServer processes:

  cd /opt/hbase-1.1.2/bin/

  ./stop-hbase.sh

 

10. Open the HBase Master web UI at http://192.168.181.66:16010.


11. Load a TSV file into an HBase table with the bundled ImportTsv tool.

    1) Create the table sms_send_result with a single column family, info, containing the columns info:sender, info:receiver, info:sendtime, info:sendstatus and info:message. The family is SNAPPY-compressed with a TTL of 60 days (5184000 seconds), and the table is pre-split into 100 regions on the 99 split keys '01' through '99':

cd /opt/hbase-1.1.2
bin/hbase shell
disable 'sms_send_result'
drop    'sms_send_result'
create 'sms_send_result', {NAME => 'info', COMPRESSION => 'SNAPPY',TTL=>'5184000' }, SPLITS => ['01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47','48','49','50','51','52','53','54','55','56','57','58','59','60','61','62','63','64','65','66','67','68','69','70','71','72','73','74','75','76','77','78','79','80','81','82','83','84','85','86','87','88','89','90','91','92','93','94','95','96','97','98','99']
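Typing the 99 split keys by hand is error-prone; the list can be generated and pasted into the create statement instead (a sketch using seq; 99 split points yield 100 regions):

```shell
# Print '01','02',...,'99' as a single comma-separated line,
# ready to paste into the SPLITS => [...] clause
seq -f "'%02g'" -s, 1 99
```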

    

    2) Upload the TSV file /opt/hadoop-2.7.1/bin/sms.tsv to the root directory of HDFS:

cd /opt/hadoop-2.7.1/bin/
./hdfs dfs -put /opt/hadoop-2.7.1/bin/sms.tsv /

    3) Use ImportTsv to load the HDFS file /sms.tsv into the table sms_send_result:

cd /opt/hbase-1.1.2
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,info:sender,info:receiver,info:sendtime,info:sendstatus,info:message sms_send_result hdfs://192.168.181.66:9000/sms.tsv

    4) sms.tsv looks roughly like this:

1154011896700000000000000201112251548071060776636	106591145302	19999999999	20111225 15:48:07	DELIVRD	阿拉斯加的费
9908996845700000000000000201112251548071060776638	106591145302	19899999999	20111225 15:48:07	DELIVRD	暗室逢灯

    In the TSV file, fields must appear in the order given in step 3 (HBASE_ROW_KEY, info:sender, info:receiver, info:sendtime, info:sendstatus, info:message) and must be strictly TAB-separated. If the file uses a different delimiter, specify it at import time with -Dimporttsv.separator=<char>.
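As an illustration, a comma-separated export can either be converted to tabs before upload, or imported directly by naming the delimiter (a sketch; the tr conversion is only safe when the fields themselves contain no commas, and the ImportTsv invocation mirrors the one in step 3):

```shell
cd "$(mktemp -d)"

# Convert a comma-separated file to TAB-separated
printf 'row001,106591145302,19999999999\n' > sms.csv
tr ',' '\t' < sms.csv > sms.tsv
cat sms.tsv

# Or skip the conversion and tell ImportTsv about the delimiter:
#   bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
#     -Dimporttsv.separator=, \
#     -Dimporttsv.columns=HBASE_ROW_KEY,info:sender,info:receiver,info:sendtime,info:sendstatus,info:message \
#     sms_send_result hdfs://192.168.181.66:9000/sms.csv
```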

    One more caveat: ImportTsv is only efficient when the target table is empty. Once the table holds data, import throughput drops sharply, because the incoming writes keep triggering region splits and compactions.
