HBase Practice
Objective: set up a Hadoop and HBase environment and perform basic operations on it, to gain familiarity with big-data storage and NoSQL databases.
Environment:
Host machine:
  RAM: 32 GB
  CPU: Intel® Core™ i5-4460 CPU @ 3.20GHz × 4
  OS: Ubuntu 16.04.1 LTS 64-bit
  VirtualBox: 5.0
Virtual machines (master, slave1, slave2, slave3):
  RAM: 4 GB
  OS: Ubuntu 16.04.1 LTS 64-bit
  Hadoop: 2.7.3
  Java: 1.8
  HBase: 1.2.4
Requirements:
1. Build a cluster of at least 3 nodes in virtual machines (note: not pseudo-distributed mode).
2. Perform insert, delete, and query operations normally through the client (HBase shell).
3. Convert the following relational tables into tables suitable for HBase storage and insert the data:
Student table (Student)

| Student ID (S_No) | Name (S_Name) | Sex (S_Sex) | Age (S_Age) |
|-------------------|---------------|-------------|-------------|
| 2015001           | Zhangsan      | male        | 23          |
| 2015002           | Mary          | female      | 22          |
| 2015003           | Lisi          | male        | 24          |
Course table (Course)

| Course ID (C_No) | Course name (C_Name) | Credits (C_Credit) |
|------------------|----------------------|--------------------|
| 123001           | Math                 | 2.0                |
| 123002           | Computer Science     | 5.0                |
| 123003           | English              | 3.0                |
Enrollment table (SC)

| Student ID (SC_Sno) | Course ID (SC_Cno) | Score (SC_Score) |
|---------------------|--------------------|------------------|
| 2015001             | 123001             | 86               |
| 2015001             | 123003             | 69               |
| 2015002             | 123002             | 77               |
| 2015002             | 123003             | 99               |
| 2015003             | 123001             | 98               |
| 2015003             | 123002             | 95               |
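One possible mapping of these relational tables into HBase tables, entered through the HBase shell (a sketch only: one HBase table per relation and one column family per attribute, matching the appendix code; the row-key choices, including the combined student_course key for SC, are a design choice and not fixed by the assignment):

```
create 'Student', 'S_No', 'S_Name', 'S_Sex', 'S_Age'
put 'Student', '2015001', 'S_Name', 'Zhangsan'
put 'Student', '2015001', 'S_Sex', 'male'
put 'Student', '2015001', 'S_Age', '23'

create 'Course', 'C_No', 'C_Name', 'C_Credit'
put 'Course', '123001', 'C_Name', 'Math'
put 'Course', '123001', 'C_Credit', '2.0'

# SC: the row key combines student and course number, so each
# (student, course) pair of the relational table becomes one row.
create 'SC', 'SC_Score'
put 'SC', '2015001_123001', 'SC_Score', '86'
```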
In addition, use the HBase Java API to implement the following functions:
(1) createTable(String tableName, String[] fields)
Create a table. The parameter tableName is the table name; the string array fields holds the names of the record's fields. If a table named tableName already exists in HBase, delete the existing table first, then create the new one.
(2) addRecord(String tableName, String row, String[] fields, String[] values)
Add the data in values to the cells of table tableName specified by row (identified by S_Name) and the string array fields. If an element of fields refers to a column qualifier under a column family, write it as "columnFamily:column". For example, to add scores to the three columns "Math", "Computer Science", and "English" at once, fields would be {"Score:Math", "Score:Computer Science", "Score:English"}, and values would hold the three scores.
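The "columnFamily:column" convention can be handled by splitting each element of fields at the first colon; a minimal standalone sketch (the class and method names here are illustrative, not part of the assignment):

```java
public class FieldSpec {
    // Split "family:qualifier" into its two parts; a bare name is a
    // column family with an empty qualifier.
    static String[] split(String field) {
        int colon = field.indexOf(':');
        if (colon < 0) {
            return new String[]{field, ""};
        }
        return new String[]{field.substring(0, colon), field.substring(colon + 1)};
    }

    public static void main(String[] args) {
        String[] fm = FieldSpec.split("Score:Computer Science");
        System.out.println(fm[0]);  // Score
        System.out.println(fm[1]);  // Computer Science
        String[] bare = FieldSpec.split("S_Name");
        System.out.println(bare[0] + " '" + bare[1] + "'");  // S_Name ''
    }
}
```

Splitting at the first colon (rather than every colon) keeps qualifiers that themselves contain spaces or colons intact.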
(3) scanColumn(String tableName, String column)
Browse the data of one column of table tableName; if a row has no data in that column, return null. If the parameter column names a column family with several column qualifiers under it, list the data of every qualifier's column; if column names a specific column (e.g. "Score:Math"), list only that column's data.
(4) modifyData(String tableName, String row, String column)
Modify the data in the cell of table tableName specified by row (the student name S_Name may be used) and column.
(5) deleteRow(String tableName, String row)
Delete the row specified by row from table tableName.
Procedure:
1. Hadoop configuration
Guide: https://thwang1206.gitbooks.io/hadoop-installation/content/install_hadoop.html
Result: 1 master and 3 slaves.
2. HBase configuration
Guide: https://thwang1206.gitbooks.io/hadoop-installation/content/install_hbase.html
Result: HBase started successfully!
Run screenshots:
Checked in the HBase shell that table creation and insertion succeeded.
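The shell check might look like the following (commands only; the actual output depends on the data inserted by the appendix code):

```
list                          # the Student, Course, and SC tables should appear
scan 'Student'                # all cells of Student
get 'Student', 'SA16011096'   # the row inserted by addRecord
scan 'SC'
```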
Appendix: main code
package com.popoaichuiniu.jacy;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
// Class that has nothing but a main.
// Does a Put, Get and a Scan against an hbase table.
// The API described here is since HBase 1.0.
public class HBaseExample {
public static Configuration config=null;
public static Connection connection=null;
public static void createTable(String tableName, String[] fields) throws IOException
{
Admin admin=connection.getAdmin();
// Instantiating table descriptor class
//HTableDescriptor contains the details about an HBase table such as the descriptors of all the column families,
//is the table a catalog table, -ROOT- or hbase:meta , if the table is read only, the maximum size of the memstore, when the region split should occur, coprocessors associated with it etc...
    if (admin.tableExists(TableName.valueOf(tableName))) {
      // Requirement: if the table already exists, drop it first, then re-create it.
      admin.disableTable(TableName.valueOf(tableName));
      admin.deleteTable(TableName.valueOf(tableName));
      System.out.println("existing table deleted!");
    }
    HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf(tableName));
    for (int i = 0; i < fields.length; i++) {
      HColumnDescriptor hColumnFamily = new HColumnDescriptor(fields[i].getBytes());
      tableDescriptor.addFamily(hColumnFamily);
    }
    admin.createTable(tableDescriptor);
    System.out.println("create successfully!");
    admin.close();
  }

  public static void addRecord(String tableName, String row, String[] fields, String[] values) throws IOException {
    Table table = connection.getTable(TableName.valueOf(tableName));
    Put put = new Put(row.getBytes());
    for (int i = 0; i < fields.length; i++) {
      // "columnFamily:column" -> family + qualifier; a bare name is a family with an empty qualifier.
      int colon = fields[i].indexOf(':');
      byte[] family = (colon < 0 ? fields[i] : fields[i].substring(0, colon)).getBytes();
      byte[] qualifier = (colon < 0 ? "" : fields[i].substring(colon + 1)).getBytes();
      put.addColumn(family, qualifier, values[i].getBytes());
    }
    table.put(put);
    table.close();
  }

  public static void scanColumn(String tableName, String column) throws IOException {
    Table table = connection.getTable(TableName.valueOf(tableName));
    ResultScanner rs;
    if (column.contains(":")) { // a specific column: "family:qualifier"
      String[] temp = column.split(":", 2);
      rs = table.getScanner(temp[0].getBytes(), temp[1].getBytes());
    } else {                    // a whole column family: list every qualifier's data
      rs = table.getScanner(column.getBytes());
    }
    for (Result result : rs) {
      if (result.isEmpty()) {
        System.out.print("null ");
        continue;
      }
      for (org.apache.hadoop.hbase.Cell cell : result.listCells()) {
        System.out.print(Bytes.toString(org.apache.hadoop.hbase.CellUtil.cloneValue(cell)) + " ");
      }
    }
    System.out.println();
    rs.close();
    table.close();
  }

  public static void modifyData(String tableName, String row, String column, String value) throws IOException {
    Table table = connection.getTable(TableName.valueOf(tableName));
    Put put = new Put(row.getBytes());
    String[] temp = column.split(":", 2);
    put.addColumn(temp[0].getBytes(), temp[1].getBytes(), value.getBytes());
    table.put(put);
    table.close();
  }

  public static void deleteRow(String tableName, String row) throws IOException {
    Table table = connection.getTable(TableName.valueOf(tableName));
    Delete delete = new Delete(row.getBytes());
    table.delete(delete);
    table.close();
  }

  public static void main(String[] args) throws IOException {
    // A Configuration object tells the client where to connect; HBaseConfiguration.create()
    // reads whatever is set in hbase-site.xml and hbase-default.xml on the CLASSPATH.
    config = HBaseConfiguration.create();
    // Connections are heavyweight: create one and keep it around. Table, Admin and
    // RegionLocator instances are lightweight: create them as needed and close them when done.
    connection = ConnectionFactory.createConnection(config);

    String[] fields = new String[]{"S_No", "S_Name", "S_Sex", "S_Age"};
    String[] values = new String[]{"SA16011096", "zms", "man", "22"};
    createTable("Student", fields);
    addRecord("Student", "SA16011096", fields, values);

    fields = new String[]{"C_No", "C_Name", "C_credits"};
    values = new String[]{"123", "zuheshuxue", "3.5"};
    createTable("Course", fields);
    addRecord("Course", "123", fields, values);

    fields = new String[]{"SC_Sno", "SC_Cno", "SC_Score"};
    values = new String[]{"SA16011096", "123", "85"};
    createTable("SC", fields);
    addRecord("SC", "SA16011096", fields, values);

    connection.close();
  }
}