I've been thinking for a while about how to store the base data, intermediate data, and result data of statistical analysis so that writing, reading, and aggregation all stay convenient. MySQL is the obvious choice, but every update requires a select first and then an update, which is inefficient; worse, if statistics along different dimensions also need to be sortable, things get ugly: indexes everywhere, the whole table turning into one big pile of indexes, which in turn makes writes even slower. There are plenty of reliable ways to speed this up, but I wasn't satisfied and wanted to see whether something better exists. That is where HBase naturally enters the picture: it sits right next to Hadoop, so once the base dimensions are in place, all the other dimensions can be computed from them, and those computations can be re-run as often as needed. A statistical result is essentially dimension + value, which is exactly key/value; and if values need to be sortable, that can be handled cleanly too. So I figured I should at least give it a try, hence the experiment below!
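To make the key/value point concrete: HBase can bump a counter atomically on the server side, so there is no select-then-update round trip at all. A minimal sketch in the HBase shell (the 'stats' table, the 'd' column family, and the row-key layout are all made up for illustration):

create 'stats', 'd'                          # one column family holding the counters
incr 'stats', '20140626|pv', 'd:cnt', 1      # atomic +1 for dimension "20140626|pv", no prior read needed
get_counter 'stats', '20140626|pv', 'd:cnt'  # read the current counter value back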
Hadoop environment:
Hadoop 2.2.0 + HA (QJM), 4 nodes
HBase environment:
hbase-0.98.3-hadoop2 (install directory: /home/hadoop/hbase-0.98.3-hadoop2), 4 nodes (hadoop25, hadoop28, hadoop201, hadoop224)
ZK environment:
A standalone 3-node ZK cluster (zk25, zk28, zk224, clientPort 2181)
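(Not part of the installation proper, but a quick way to confirm the ZK ensemble really answers on clientPort 2181, assuming nc is installed:)

echo ruok | nc zk25 2181    # a healthy ZooKeeper node replies "imok"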
What follows are the details of installing HBase 0.98.3.
I already have a stable, working Hadoop 2.2.0 + HA (QJM) environment, so setting up HBase is actually quite easy (it's only for testing, so the bar is not high).
1. Download hbase-0.98.3-hadoop2-bin.tar.gz
Download the latest stable release, hbase-0.98.3-hadoop2, straight from http://hbase.apache.org. This version supports hadoop2.2.0 out of the box, which saves a lot of trouble.
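For completeness, fetching and unpacking looks roughly like this (the archive URL is my assumption of where the 0.98.3 tarball lives; any Apache mirror will do):

cd /home/hadoop
wget http://archive.apache.org/dist/hbase/hbase-0.98.3/hbase-0.98.3-hadoop2-bin.tar.gz
tar -xzf hbase-0.98.3-hadoop2-bin.tar.gz    # yields /home/hadoop/hbase-0.98.3-hadoop2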
2. The configuration bits
(1) Modify Hadoop 2.2.0's configuration file hdfs-site.xml (enable append support and raise the DataNode's limit on concurrently served files):
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
Once this configuration has been changed, the Hadoop cluster needs to be restarted.
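A minimal restart sketch, assuming the stock Hadoop 2.2.0 sbin scripts (the install path is my guess; with an HA/QJM cluster you may prefer restarting the DataNodes in a rolling fashion to avoid downtime):

/home/hadoop/hadoop-2.2.0/sbin/stop-dfs.sh     # stops NameNodes, DataNodes, JournalNodes
/home/hadoop/hadoop-2.2.0/sbin/start-dfs.sh    # brings HDFS back up with the new hdfs-site.xml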
(2) Modify HBase's configuration file conf/hbase-env.sh (the changes that matter for this setup are JAVA_HOME, HBASE_HEAPSIZE, HBASE_PID_DIR, and HBASE_MANAGES_ZK=false):
#
#/**
# * Copyright 2007 The Apache Software Foundation
# *
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)

# The java implementation to use.  Java 1.6 required.
export JAVA_HOME=/home/hadoop/jdk1.7.0_45

# Extra Java CLASSPATH elements.  Optional.
# export HBASE_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
export HBASE_HEAPSIZE=1024

# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.
# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="
# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.

# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored.  $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/home/hadoop/hbase-0.98.3-hadoop2

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false

# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.
(3) Modify HBase's configuration file conf/hbase-site.xml (use the existing ZK cluster instead of letting HBase manage its own ZK):
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://mycluster/hbase</value>
    <!-- must match the fs.defaultFS value in core-site.xml -->
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/home/hadoop/hbase-0.98.3-hadoop2/tmp</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk25,zk28,zk224</value>
  </property>
</configuration>
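Two quick sanity checks that this file lines up with the HDFS side (paths as configured above):

hdfs dfs -ls hdfs://mycluster/                    # the HA nameservice must resolve via the client config
mkdir -p /home/hadoop/hbase-0.98.3-hadoop2/tmp    # pre-create the directory behind hbase.tmp.dir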
<pre name="code" class="html">hadoop25 hadoop28 hadoop201 hadoop224
(5) Copy the hadoop/etc/hadoop/hdfs-site.xml file into HBase's conf directory.
(This one is important; without it you will get errors. I ran into it myself: HBase complained that mycluster is an unknown host.)
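Concretely, something like this (the Hadoop config path is my assumption; adjust to your own layout):

cp /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml /home/hadoop/hbase-0.98.3-hadoop2/conf/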
------------------------------------------------------------------------------------------------------------------------------------------------
=== Repeat the steps above on every node that needs HBase installed. Actually, it's even better to fully configure one machine and simply copy it to the rest, as sketched below. ===
------------------------------------------------------------------------------------------------------------------------------------------------
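One way to do that copy, assuming passwordless ssh between the nodes (hostnames as listed at the top):

for h in hadoop28 hadoop201 hadoop224; do
  rsync -a /home/hadoop/hbase-0.98.3-hadoop2 ${h}:/home/hadoop/    # push the configured install to each node
done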
(6) Then comes the part everyone loves: run /home/hadoop/hbase-0.98.3-hadoop2/bin/start-hbase.sh to bring HBase up.
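Immediately after start-up, jps on each node gives a first impression of whether the daemons are alive:

jps
# on the machine where start-hbase.sh ran: an HMaster process (plus the usual Hadoop daemons)
# on the region server nodes:              an HRegionServer process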
(7) Did HBase start successfully?
http://hmaster:60010 (whichever machine you ran start-hbase.sh on is the HMaster)
Of course, the most direct way is still to check the logs for errors:
hbase-hadoop-regionserver-hadoop25.log
hbase-hadoop-master-hadoop25.log
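And beyond the UI and the logs, a short smoke test in the HBase shell settles the question (the table name is a throwaway):

hbase shell
status                              # should report the live/dead region server counts
create 't1', 'f1'                   # create a disposable table
put 't1', 'r1', 'f1:c1', 'hello'    # write one cell
scan 't1'                           # should print back the row just written
disable 't1'
drop 't1'                           # clean up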