(1) hadoop 2.7.1 source build http://zilongzilong.iteye.com/blog/2246856
(2) hadoop 2.7.1 installation prerequisites http://zilongzilong.iteye.com/blog/2253544
(3) hadoop 2.7.1 installation http://zilongzilong.iteye.com/blog/2245547
(4) hbase installation prerequisites http://zilongzilong.iteye.com/blog/2254451
(5) hbase installation http://zilongzilong.iteye.com/blog/2254460
(6) snappy installation http://zilongzilong.iteye.com/blog/2254487
(7) benchmarking hbase with Yahoo! YCSB http://zilongzilong.iteye.com/blog/2248863
(8) spring-hadoop in practice http://zilongzilong.iteye.com/blog/2254491
For the preparation work before installing, see (4) hbase installation prerequisites http://zilongzilong.iteye.com/blog/2254451
1. Download the package hbase-1.1.2-bin.tar.gz and place it in /opt
2. Unpack hbase-1.1.2-bin.tar.gz
tar zxvf hbase-1.1.2-bin.tar.gz
3. Configure environment variables by appending the following to /etc/profile (HBASE_HOME is the conventional name, and HBase's own scripts recognize it):
export HBASE_HOME=/opt/hbase-1.1.2
export PATH=${HBASE_HOME}/bin:${PATH}
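Before editing the real /etc/profile, the effect of the PATH export can be tried out against a throwaway stub (the /tmp path and stub script below are purely illustrative):

```shell
# Sketch (hypothetical paths): simulate the PATH export with a stub hbase
# binary in a scratch directory, then check that the shell resolves it.
HBASE_HOME=/tmp/hbase-path-demo
mkdir -p "$HBASE_HOME/bin"
printf '#!/bin/sh\necho hbase-stub\n' > "$HBASE_HOME/bin/hbase"
chmod +x "$HBASE_HOME/bin/hbase"
PATH="$HBASE_HOME/bin:$PATH"
command -v hbase    # resolves to /tmp/hbase-path-demo/bin/hbase
hbase               # prints: hbase-stub
```

With the real install, `hbase version` after `source /etc/profile` gives the same confirmation.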
4. Create the HBase temp directory (on every node in the cluster):
mkdir /home/hadoop/hbase
5. Edit /opt/hbase-1.1.2/conf/hbase-env.sh so that it reads:
# (standard Apache 2.0 license header unchanged)

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)

# The java implementation to use. Java 1.7+ required.
# export JAVA_HOME=/usr/java/jdk1.6.0/
export JAVA_HOME=/opt/java/jdk1.7.0_65/

# Extra Java CLASSPATH elements. Optional.
# export HBASE_CLASSPATH=
export HBASE_CLASSPATH=/opt/hbase-1.1.2/conf

# The maximum amount of heap to use. Default is left to JVM default.
# export HBASE_HEAPSIZE=1G
export HBASE_HEAPSIZE=2G

# Uncomment below if you intend to use off heap cache. For example, to allocate 8G of
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G

# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
export HBASE_OPTS="$HBASE_OPTS -XX:CMSInitiatingOccupancyFraction=60"

# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/opt/hbase-1.1.2/logs/gc-hbase.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# See the package documentation for org.apache.hadoop.hbase.io.hfile for other configurations
# needed setting up off-heap block caching.

# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
# NOTE: HBase provides an alternative JMX implementation to fix the random ports issue, please see JMX
# section in HBase Reference Guide for instructions.
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored. $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids

# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage it's own instance of Zookeeper or not.
# export HBASE_MANAGES_ZK=true
export HBASE_MANAGES_ZK=false

# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.
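Most of hbase-env.sh is commented-out defaults; when comparing the file across nodes it helps to print only the lines actually in effect. A small sketch, run against a miniature sample file rather than the real one:

```shell
# Write a miniature hbase-env.sh and print only its active (non-comment,
# non-blank) lines -- the settings that actually take effect.
cat > /tmp/hbase-env-sample.sh <<'EOF'
# export HBASE_HEAPSIZE=1G
export HBASE_HEAPSIZE=2G

# export HBASE_MANAGES_ZK=true
export HBASE_MANAGES_ZK=false
EOF
grep -Ev '^[[:space:]]*(#|$)' /tmp/hbase-env-sample.sh
```

Pointing the same grep at /opt/hbase-1.1.2/conf/hbase-env.sh on each node makes a cross-node diff trivial.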
6. Edit /opt/hbase-1.1.2/conf/hbase-site.xml so that it reads:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- (standard Apache 2.0 license header unchanged) -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://192.168.181.66:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>nmsc0,nmsc1,nmsc2</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>file:/home/hadoop/hbase/</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>hdfs://192.168.181.66:60000</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>file:/home/hadoop/zookeeper</value>
  </property>
  <property>
    <!-- matches the client-side htable.setWriteBufferSize(5242880); // 5 MB -->
    <name>hbase.client.write.buffer</name>
    <value>5242880</value>
  </property>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>300</value>
    <description>Count of RPC Listener instances spun up on RegionServers. The same property is used by the Master for the count of master handlers.</description>
  </property>
  <property>
    <name>hbase.table.sanity.checks</name>
    <value>false</value>
  </property>
  <property>
    <!-- ZooKeeper session timeout: a dead regionserver is noticed after 30 s -->
    <name>zookeeper.session.timeout</name>
    <value>30000</value>
  </property>
  <property>
    <!-- maximum HFile size per region: 10 GB -->
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>
  </property>
</configuration>
7. Edit /opt/hbase-1.1.2/conf/regionservers so that it reads (one hostname per line):
nmsc1
nmsc2
8. Copy the installation directory to the other nodes:
scp -r /opt/hbase-1.1.2 root@nmsc1:/opt/
scp -r /opt/hbase-1.1.2 root@nmsc2:/opt/
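With more than two slaves, the per-node scp lines can be generated from a host list instead of repeated by hand. A sketch using the hostnames from this post (it only prints the commands; drop the echo to actually run them):

```shell
# Build the distribution commands from a host list. Prints one scp command
# per node; remove 'echo' to execute them for real.
HOSTS="nmsc1 nmsc2"
for host in $HOSTS; do
  echo scp -r /opt/hbase-1.1.2 "root@${host}:/opt/"
done
```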
9. Start and stop HBase. The commands can be run from any machine in the cluster; make sure Hadoop is running first.
Start HBase (launches HMaster and HRegionServer):
cd /opt/hbase-1.1.2/bin/
./start-hbase.sh
jps
Stop HBase (shuts down HMaster and HRegionServer):
cd /opt/hbase-1.1.2/bin/
./stop-hbase.sh
10. Open the HBase web UI at http://192.168.181.66:16010
11. Import a TSV file into an HBase table with the bundled ImportTsv tool
1) Create the HBase table sms_send_result with one column family info holding the columns info:sender, info:receiver, info:sendtime, info:sendstatus and info:message. The family uses SNAPPY compression and a TTL of 60 days (5184000 seconds), and the table is pre-split on rowkey into 100 regions using the two-digit split keys '01' through '99'.
cd /opt/hbase-1.1.2
bin/hbase shell
disable 'sms_send_result'
drop 'sms_send_result'
create 'sms_send_result', {NAME => 'info', COMPRESSION => 'SNAPPY', TTL => '5184000'}, SPLITS => ['01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47','48','49','50','51','52','53','54','55','56','57','58','59','60','61','62','63','64','65','66','67','68','69','70','71','72','73','74','75','76','77','78','79','80','81','82','83','84','85','86','87','88','89','90','91','92','93','94','95','96','97','98','99']
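Typing out 99 split keys is error-prone; they can be generated instead. A sketch: seq -w zero-pads to a fixed width, and the result is the array literal that the (JRuby-based) hbase shell accepts after SPLITS =>.

```shell
# Generate the 99 two-digit split keys '01'..'99' and print them as a
# Ruby-style array literal suitable for the SPLITS clause.
SPLITS=$(seq -w 1 99 | sed "s/.*/'&'/" | paste -sd, -)
echo "SPLITS => [$SPLITS]"
```

The printed literal can be pasted into the create statement, or the whole create statement can be assembled in a string and piped into bin/hbase shell.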
2) Upload the TSV file /opt/hadoop-2.7.1/bin/sms.tsv to the root directory of HDFS:
cd /opt/hadoop-2.7.1/bin/
./hdfs dfs -put /opt/hadoop-2.7.1/bin/sms.tsv /
3) Use the bundled ImportTsv tool to load the HDFS file /sms.tsv into the table sms_send_result:
cd /opt/hbase-1.1.2
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,info:sender,info:receiver,info:sendtime,info:sendstatus,info:message sms_send_result hdfs://192.168.181.66:9000/sms.tsv
4) sms.tsv looks roughly like this (fields are tab-separated, one record per line):
1154011896700000000000000201112251548071060776636	106591145302	19999999999	20111225 15:48:07	DELIVRD	阿拉斯加的费
9908996845700000000000000201112251548071060776638	106591145302	19899999999	20111225 15:48:07	DELIVRD	暗室逢灯
In the TSV file, the fields must appear in the order given in step 3): HBASE_ROW_KEY, info:sender, info:receiver, info:sendtime, info:sendstatus, info:message, strictly separated by TAB characters. If the file uses a different separator, pass it at import time with -Dimporttsv.separator=.
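A quick way to catch separator problems before a long import run is to count the fields per row; every line of a clean file should split into the same number of tab-separated fields (6 here: the rowkey plus the five info: columns). A sketch against a tiny sample file:

```shell
# Write a two-row sample TSV (6 tab-separated fields per row), then print the
# distinct field counts; a clean file yields a single number.
printf 'k1\ta\tb\tc\td\te\nk2\tf\tg\th\ti\tj\n' > /tmp/sms-sample.tsv
awk -F'\t' '{ print NF }' /tmp/sms-sample.tsv | sort -u
```

Running the same awk over the real sms.tsv should print only 6; any other number flags a malformed row.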
Another caveat: the bundled ImportTsv tool is only fast when importing into an empty table. Once the table holds data, imports slow down considerably, because the incoming writes continually trigger region splits and compactions.
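One common workaround, not covered in this post, is ImportTsv's bulk-load mode: with -Dimporttsv.bulk.output it writes HFiles to a staging directory instead of issuing puts, and LoadIncrementalHFiles then moves the finished files into the table, sidestepping the write path entirely. The staging path below is illustrative; the sketch only echoes the commands (drop the echo to run them):

```shell
# Sketch of a two-step bulk load (staging path /tmp/sms_hfiles is an
# assumption, not from the original post). Prints the commands only.
echo bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,info:sender,info:receiver,info:sendtime,info:sendstatus,info:message \
  -Dimporttsv.bulk.output=hdfs://192.168.181.66:9000/tmp/sms_hfiles \
  sms_send_result hdfs://192.168.181.66:9000/sms.tsv
echo bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  hdfs://192.168.181.66:9000/tmp/sms_hfiles sms_send_result
```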