HBase and Hive Exercises

This is a comprehensive exercise covering the following stages:
1. Data preprocessing
2. Loading the data into Hive
3. Analyzing the data
4. Saving the results to the database
5. Querying and displaying the results

Sample raw record:
qR8WRLrO2aQ:mienge:406:People & Blogs:599:2788:5:1:0:4UUEKhr6vfA:zvDPXgPiiWI:TxP1eXHJQ2Q:k5Kb1K0zVxU:hLP_mJIMNFg:tzNRSSTGF4o:BrUGfqJANn8:OVIc-mNxqHc:gdxtKvNiYXc:bHZRZ-1A-qk:GUJdU6uHyzU:eyZOjktUb5M:Dv15_9gnM2A:lMQydgG1N2k:U0gZppW_-2Y:dUVU6xpMc6Y:ApA6VEYI8zQ:a3_boc9Z_Pc:N1z4tYob0hM:2UJkU2neoBs
Sample record after preprocessing:
qR8WRLrO2aQ:mienge:406:People,Blogs:599:2788:5:1:0:4UUEKhr6vfA,zvDPXgPiiWI,TxP1eXHJQ2Q,k5Kb1K0zVxU,hLP_mJIMNFg,tzNRSSTGF4o,BrUGfqJANn8,OVIc-mNxqHc,gdxtKvNiYXc,bHZRZ-1A-qk,GUJdU6uHyzU,eyZOjktUb5M,Dv15_9gnM2A,lMQydgG1N2k,U0gZppW_-2Y,dUVU6xpMc6Y,ApA6VEYI8zQ,a3_boc9Z_Pc,N1z4tYob0hM,2UJkU2neoBs

1. Preprocess the raw data into the format shown above.
Looking at a raw record, the fields are separated by ':'. A video can belong to several categories, which are separated by '&' with a space on each side, and it can likewise have several related videos, which are also separated by ':'. To simplify the later analysis, we first clean and restructure the data:
that is, separate each record's categories with ',' (dropping the surrounding spaces), and separate the multiple related-video ids with ',' as well.
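As a preview of the full MapReduce job in section 5, here is a minimal sketch of the per-line transformation (a hypothetical helper, not part of the original answer): fields 0-8 keep ':' as their separator, and everything from field 9 onward is a related-video id joined with ','.

static String clean(String line) {
    // "People & Blogs" becomes "People,Blogs" (the spaces are dropped)
    String[] fields = line.replace(" & ", ",").split(":");
    StringBuilder out = new StringBuilder(fields[0]);
    for (int i = 1; i < fields.length; i++) {
        // ':' before each of the nine fixed fields and the first related id,
        // ',' between the related-video ids
        out.append(i <= 9 ? ":" : ",").append(fields[i]);
    }
    return out.toString();
}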

2. Load the preprocessed data into Hive
2.1 Create the database and tables
Database name: video
Raw-data tables:
video table: video_ori    user table: video_user_ori
ORC-format tables:
video table: video_orc    user table: video_user_orc
DDL for the raw tables.
Create the video_ori table:
create database video;

create table video_ori(
videoId string,
uploader string,
age string,
category string,
length string,
views string,
rate string,
ratings string,
comments string,
relatedId string)
row format delimited
fields terminated by ':'
stored as textfile;
Create the video_user_ori table:
create table video_user_ori(
uploader string,
videos string,
friends string)
row format delimited
fields terminated by ','
stored as textfile;
DDL for the ORC tables.
Create the video_orc table:
create table video_orc(
videoId string,
uploader string,
age string,
category string,
length string,
views string,
rate string,
ratings string,
comments string,
relatedId string)
stored as orc;
(ORC is a self-describing columnar format, so no field-delimiter clause is needed.)

Create the video_user_orc table:
create table video_user_orc(
uploader string,
videos string,
friends string)
stored as orc;

2.2 Load the preprocessed video data into the raw table video_ori, and the raw user data into video_user_ori.
Load statements (INPATH takes HDFS paths and moves the files into the table's directory):
video_ori:
LOAD DATA INPATH '/export/data/video.txt' OVERWRITE INTO TABLE video_ori;
video_user_ori:
LOAD DATA INPATH '/export/data/user.txt' OVERWRITE INTO TABLE video_user_ori;

2.3 Copy the data from the raw tables into the corresponding ORC tables. LOAD DATA cannot convert file formats, so use INSERT ... SELECT, which rewrites the rows as ORC.
Insert statements:
video_orc:
INSERT INTO TABLE video_orc SELECT * FROM video_ori;

video_user_orc:
INSERT INTO TABLE video_user_orc SELECT * FROM video_user_ori;

3. Query the loaded data with HiveQL
3.1 From the video table, select the videos whose rating is 5 and save the result to /export/rate.txt.
SQL:
hive -e "select * from video.video_ori where rate = 5" >> /export/rate.txt
sed -i 's/\[//g' /export/rate.txt
sed -i 's/\]//g' /export/rate.txt
sed -i 's/"//g' /export/rate.txt
(The sed commands strip any [, ] and " characters from the saved output.)
3.2 From the video table, select the videos with more than 100 comments and save the result to /export/comments.txt.
SQL:

hive -e "select * from video.video_ori where comments > 100" >> /export/comments.txt
sed -i 's/\[//g' /export/comments.txt
sed -i 's/\]//g' /export/comments.txt
sed -i 's/"//g' /export/comments.txt
Alternatively, run the same queries against the ORC tables; the resulting 5.txt and 100.txt (moved to /opt) are what section 4.2 loads:
3.1
hive -e "select * from video.video_orc where rate = 5" > 5.txt
3.2
hive -e "select * from video.video_orc where comments > 100" > 100.txt

4. Save the results to the database
4.1 DDL
Create the rate external table (hive -e writes tab-separated fields, hence the '\t' delimiter):
create external table rate(
videoId string,
uploader string,
age string,
category string,
length string,
views string,
rate string,
ratings string,
comments string,
relatedId string)
row format delimited
fields terminated by '\t'
stored as textfile;

Create the comments external table:
create external table comments(
videoId string,
uploader string,
age string,
category string,
length string,
views string,
rate string,
ratings string,
comments string,
relatedId string)
row format delimited
fields terminated by '\t'
stored as textfile;
4.2 Load statements:
load data local inpath '/opt/5.txt' into table rate;
load data local inpath '/opt/100.txt' into table comments;
4.3 Create the Hive-HBase mapping tables
(In hbase.columns.mapping, ':key' maps the first Hive column, videoId, to the HBase row key; the mapping needs exactly one entry per Hive column.)
create table video.hbase_rate(
videoId string,
uploader string,
age string,
category string,
length string,
views string,
rate string,
ratings string,
comments string,
relatedId string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,cf:uploader,cf:age,cf:category,cf:length,cf:views,cf:rate,cf:ratings,cf:comments,cf:relatedId")
tblproperties("hbase.table.name" = "hbase_rate");

create table video.hbase_comments(
videoId string,
uploader string,
age string,
category string,
length string,
views string,
rate string,
ratings string,
comments string,
relatedId string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,cf:uploader,cf:age,cf:category,cf:length,cf:views,cf:rate,cf:ratings,cf:comments,cf:relatedId")
tblproperties("hbase.table.name" = "hbase_comments");

4.4 Insert the data:
insert into table hbase_rate select * from rate;

insert into table hbase_comments select * from comments;
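Each videoId now serves as the HBase row key under column family cf. As a quick sanity check before section 5, here is a sketch (a hypothetical class, not part of the original answer) that fetches a single row by key; the quorum string matches the one used in section 5, and "qR8WRLrO2aQ" from the sample record stands in for any videoId actually present in your data:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckRate {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "node001:2181,node002:2181,node003:2181");
        Connection connection = ConnectionFactory.createConnection(conf);
        Table table = connection.getTable(TableName.valueOf("hbase_rate"));
        // Fetch one row by its key (the videoId column mapped via :key)
        Result result = table.get(new Get(Bytes.toBytes("qR8WRLrO2aQ")));
        System.out.println(result.isEmpty()
                ? "row not found"
                : Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("rate"))));
        connection.close();
    }
}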

5. Query and display the data
5.1
Result: [screenshot]

Code: see class text2 below.

5.2
Result: [screenshot]

Code: see class text3 below.

-----> Code from the IDEA project:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Join the categories with "," (dropping the spaces around the "&")
        String s = value.toString().replace(" & ", ",");
        // Re-split the line on ":" and rebuild it: the first nine fields stay
        // ':'-separated, the related-video ids are joined with ','
        String[] fields = s.split(":");
        StringBuilder sa = new StringBuilder(fields[0]);
        for (int i = 1; i < fields.length; i++) {
            sa.append(i <= 9 ? ":" : ",").append(fields[i]);
        }
        // A single constant key sends every cleaned record to one reducer
        context.write(new Text("ok"), new Text(sa.toString()));
    }
}
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        for (Text value : values) {
            sb.append(value.toString()).append("\n");
        }
        try {
            // The mapper emits a single constant key, so this reduce() runs once
            // and writes the whole cleaned file to HDFS directly, bypassing the
            // job's OutputFormat.
            add(sb.toString());
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
    }

    public void add(String ss) throws URISyntaxException, IOException, InterruptedException {
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://192.168.100.201:8020"), new Configuration(), "root");
        byte[] bytes = ss.getBytes();
        // create() overwrites /Out/value.txt if it already exists
        FSDataOutputStream out = fileSystem.create(new Path("/Out/value.txt"));
        out.write(bytes, 0, bytes.length);
        out.close();
    }
}

import java.net.URI;

import net.neoremind.sshxcute.core.ConnBean;
import net.neoremind.sshxcute.core.Result;
import net.neoremind.sshxcute.core.SSHExec;
import net.neoremind.sshxcute.task.impl.ExecCommand;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MyLoadData {
    public static void main(String[] args) throws Exception {
        // Connection settings for the server running the Hive client
        ConnBean connBean = new ConnBean("node001", "root", "123456");
        // Connect to the server
        SSHExec sshExec = SSHExec.getInstance(connBean);
        sshExec.connect();
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://node001:8020"), new Configuration());
        FileStatus[] fileStatuses = fileSystem.listStatus(new Path("/Out"));

        for (FileStatus fileStatus : fileStatuses) {
            Path path = fileStatus.getPath();
            // Build the hive -e command that loads the cleaned file
            String s = "hive -e \"load data inpath '/Out/" + path.getName() + "' into table video.video_ori\"";
            ExecCommand execCommand = new ExecCommand(s);
            // Run the command over SSH
            Result exec = sshExec.exec(execCommand);
        }

        // Close the connections
        sshExec.disconnect();
        fileSystem.close();
    }
}

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Driver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "fgd");
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // Local test paths; the output directory must not exist yet
        TextInputFormat.addInputPath(job, new Path("F:\\大数据上学期视频代码\\作业\\有关题目\\12月考试预备\\video.txt"));
        TextOutputFormat.setOutputPath(job, new Path("F:\\大数据上学期视频代码\\作业\\有关题目\\12月考试预备\\no"));
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setReducerClass(MyReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int run = ToolRunner.run(new Driver(), args);
        System.out.println(run);
    }
}

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class text2 {
    public static void main(String[] args) throws IOException {
        // Connect to HBase (HBaseConfiguration.create() loads the HBase defaults)
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "node001:2181,node002:2181,node003:2181");
        Connection connection = ConnectionFactory.createConnection(conf);
        // Open the table
        Table mytable = connection.getTable(TableName.valueOf("hbase_rate"));
        // Full-table scan; for a range scan, set real row keys:
        // scan.setStartRow("...".getBytes()); scan.setStopRow("...".getBytes());
        Scan scan = new Scan();

        ResultScanner scanner = mytable.getScanner(scan);
        // Each Result is one row (possibly several column families and columns)
        for (Result result : scanner) {
            System.out.println(Bytes.toString(result.getRow()));
            System.out.println(Bytes.toString(result.getValue("cf".getBytes(), "age".getBytes())));
        }
        // Close the connection
        connection.close();
    }
}

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class text3 {
    // Query the comments data
    public static void main(String[] args) throws IOException {
        // Connect to HBase
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "node001:2181,node002:2181,node003:2181");
        Connection connection = ConnectionFactory.createConnection(conf);
        // Open the table created in 4.3
        Table mytable = connection.getTable(TableName.valueOf("hbase_comments"));
        Scan scan = new Scan();
        // Each Result is one row
        ResultScanner scanner = mytable.getScanner(scan);
        for (Result result : scanner) {
            // All cells of this row
            Cell[] cells = result.rawCells();
            // Print only the cf:comments column
            for (Cell cell : cells) {
                if (Bytes.toString(CellUtil.cloneQualifier(cell)).equals("comments")) {
                    System.out.println(Bytes.toString(CellUtil.cloneFamily(cell)) + ":" + Bytes.toString(CellUtil.cloneQualifier(cell)) + "-" + Bytes.toString(CellUtil.cloneValue(cell)));
                }
            }
        }
        // Close the connection
        connection.close();
    }
}

Check whether a table exists
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBASE_API {

    private static Configuration conf;

    static {
        // Instantiate the configuration via HBaseConfiguration's factory method
        conf = HBaseConfiguration.create();

        /**
         * "hbase.zookeeper.quorum" matches the value configured in
         * hbase-site.xml under HBase's conf directory.
         * Before using host names, make sure the Windows hosts file maps
         * each Linux IP to its host name.
         */
        conf.set("hbase.zookeeper.quorum", "bigdata111");

        // ZooKeeper client port
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        // Parent znode under which HBase stores its data in ZooKeeper
        conf.set("zookeeper.znode.parent", "/hbase");
    }

    /**
     * Check whether a table exists
     */
    public static boolean is_table_exist(String table_name) throws IOException {
        // Create an HBase client
        Connection connection = ConnectionFactory.createConnection(conf);
        HBaseAdmin admin = (HBaseAdmin) connection.getAdmin();

        // Check for the table, then release the client
        boolean exists = admin.tableExists(Bytes.toBytes(table_name));
        admin.close();
        connection.close();
        return exists;
    }

    public static void main(String[] args) throws IOException {
        // Check whether a table exists
        System.out.println(is_table_exist("aa"));

        // Create a table
        createTable("idea1", "cf1", "cf2", "cf3");

        // Insert data (the table was created with families cf1/cf2/cf3,
        // so insert into cf1 rather than a nonexistent "cf")
        for (int i = 0; i < 100; i++) {
            add_row_data("idea1", String.valueOf(i), "cf1", "name", "wind" + i);
        }

        // Get one row's data for a specific column
        get_row_qualifier("idea1", "1", "cf1", "name");

        // Get all data of one row
        get_row("idea1", "2");

        // Get all rows
        get_all_rows("idea1");

        // Delete several rows
        delete_multi_row("idea1", "1", "2", "3");

        // Drop the table
        drop_table("idea1");
    }

}

Create a table
/**
 * Create a table: table name plus one or more column families
 */

public static void createTable(String table_name, String... columnFamily) throws IOException {
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Check whether the table already exists
    if (is_table_exist(table_name)) {
        System.out.println(table_name + " already exists");
    } else {
        // Table descriptor; the name must be converted to a TableName
        HTableDescriptor hTableDescriptor = new HTableDescriptor(TableName.valueOf(table_name));
        // Add the column families
        for (String cf : columnFamily) {
            hTableDescriptor.addFamily(new HColumnDescriptor(cf));
        }
        admin.createTable(hTableDescriptor);
        System.out.println(table_name + " created");
    }
    admin.close();
}

Insert data
/**
 * Insert data: table name, row key, column family, column, value
 */

public static void add_row_data(String table_name, String row, String columnFamily,
                                String column, String value) throws IOException {
    // Open the table (old-style HTable API)
    HTable hTable = new HTable(conf, table_name);
    // Build the Put
    Put put = new Put(Bytes.toBytes(row));
    // Add family, qualifier and value (put.addColumn in newer client versions)
    put.add(Bytes.toBytes(columnFamily), Bytes.toBytes(column), Bytes.toBytes(value));
    hTable.put(put);
    hTable.close();
    System.out.println(table_name + ": row inserted");
}

Get the data of one row for a specific "family:column"
/**
 * Get the data of one row for a specific "family:column"
 */

public static void get_row_qualifier(String table_name, String row, String family, String qualifier) throws IOException {

    HTable hTable = new HTable(conf, table_name);

    Get get = new Get(Bytes.toBytes(row));
    get.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier));
    Result result = hTable.get(get);
    for (Cell cell : result.rawCells()) {
        System.out.println("row key: " + Bytes.toString(result.getRow()));
        System.out.println("family:  " + Bytes.toString(CellUtil.cloneFamily(cell)));
        System.out.println("column:  " + Bytes.toString(CellUtil.cloneQualifier(cell)));
        System.out.println("value:   " + Bytes.toString(CellUtil.cloneValue(cell)));
    }
    hTable.close();
}

Get all data of one row
/**
 * Get all data of one row
 */

public static void get_row(String table_name, String row) throws IOException {
    HTable hTable = new HTable(conf, table_name);
    Get get = new Get(Bytes.toBytes(row));
    // Show all versions:
    // get.setMaxVersions();
    // Show the version at a given timestamp:
    // get.setTimeStamp();

    Result result = hTable.get(get);
    for (Cell cell : result.rawCells()) {
        System.out.println("row key:   " + Bytes.toString(result.getRow()));
        System.out.println("family:    " + Bytes.toString(CellUtil.cloneFamily(cell)));
        System.out.println("column:    " + Bytes.toString(CellUtil.cloneQualifier(cell)));
        System.out.println("value:     " + Bytes.toString(CellUtil.cloneValue(cell)));
        System.out.println("timestamp: " + cell.getTimestamp());
    }
    hTable.close();
}

Get all rows
/**
 * Get all rows in the table
 */

public static void get_all_rows(String table_name) throws IOException {
    HTable hTable = new HTable(conf, table_name);
    // Scan object used to scan the regions
    Scan scan = new Scan();
    // HTable returns a ResultScanner over the matching rows
    ResultScanner scanner = hTable.getScanner(scan);
    for (Result result : scanner) {
        Cell[] cells = result.rawCells();
        for (Cell cell : cells) {
            System.out.println("row key: " + Bytes.toString(CellUtil.cloneRow(cell)));
            System.out.println("family:  " + Bytes.toString(CellUtil.cloneFamily(cell)));
            System.out.println("column:  " + Bytes.toString(CellUtil.cloneQualifier(cell)));
            System.out.println("value:   " + Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }
    hTable.close();
}

Delete multiple rows
/**
 * Delete multiple rows
 */

public static void delete_multi_row(String table_name, String... rows) throws IOException {
    HTable hTable = new HTable(conf, table_name);
    List<Delete> deletes = new ArrayList<Delete>();
    for (String row : rows) {
        Delete delete = new Delete(Bytes.toBytes(row));
        deletes.add(delete);
    }
    // Delete all the rows in one batch, then close the table
    // (in the original these calls sat inside the loop, which closed the
    // table after the first row and failed on the second)
    hTable.delete(deletes);
    hTable.close();
}

Drop a table
/**
 * Drop a table
 */

public static void drop_table(String table_name) throws IOException {
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Check whether the table exists
    if (is_table_exist(table_name)) {
        // A table must be disabled before it can be deleted
        admin.disableTable(table_name);
        // Delete the table
        admin.deleteTable(table_name);
        System.out.println(table_name + " dropped");
    } else {
        System.out.println(table_name + " does not exist");
    }
}
