延云 YDB v1.0.5-beta Released (Hive and Spark Query Support) 2015-12-28 13:13

This release adds the following features:

Read YDB data from Hive for analysis.

Read YDB data from Spark for analysis.

A programming interface for exporting data to other systems.

A MapReduce InputFormat interface.


YDB download:

You must accept the license agreement before you may use this software. License agreement download

Current version: v1.0.5-beta

Get 延云 YDB

360 Cloud Drive: http://yunpan.cn/cuHD72ifTWtz2  Access code: 5928



Reading YDB data from Hive for analysis

By connecting YDB data to Hive, you can use Hive or Spark to extend YDB's capabilities and run complex queries such as multi-table joins, medians, and nested SQL (a join sketch follows the query examples below).

Add the dependency jar:

add jar /data/xxx.xxx.xx/ydb-x.x.x-pg.jar ;

Map YDB tables to Hive tables

Note two points:

1. Map only the fields you need; avoid mapping unused fields.

2. Use a where filter in the mapping SQL to limit the number of mapped records to as few as possible.

Following these two points reduces the amount of data YDB passes to Hive and also lowers YDB's own disk I/O, which improves efficiency.

 

 

Mapping example 1:

CREATE external  TABLE ydbhive_example (

tradetime string,tradenum string,tradeid string,nickname string,cardnum string

)    

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example",

 "ydb.handler.sql"=" select tradetime,tradenum,tradeid,nickname,cardnum from ydb_example_trade where ydbpartion='20151011'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100'  limit 0,10"

);

 

Mapping example 2:

CREATE external  TABLE ydbhive_example_bigdata (

phonenum string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string

)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",

 "ydb.handler.sql"=" select phonenum,ydb_sex,ydb_province,ydb_grade,ydb_age from  ydb_example_ads where ydbpartion='20151111'  and (ydb_grade='博士') and ydb_sex='女' and ydb_province='北京' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:1000000' and ydbkv='max.return.docset.size:100000000'  limit 0,10"

);

 

Mapping example 3:

To save I/O, some queries can be pre-aggregated on the YDB side, reducing the load on Hive and Spark.

CREATE external  TABLE ydbhive_example_groupby (province string, bank  string, amt double,cnt double)   

STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler' 

TBLPROPERTIES  (

"ydb.handler.hostport"="101.200.130.48:8080",

"ydb.handler.sql.key"="ydb.sql.ydbhive_example_groupby",

"ydb.handler.sql"=" select province,bank,sum(amt),count(*) from ydb_example_trade where ydbpartion='20151011'  and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100' group by province,bank limit 0,10"

);

 

Query examples:

Example 1:

select * from ydbhive_example limit 10;

select count(*) from ydbhive_example limit 10;

select tradeid,count(*) from ydbhive_example group by tradeid limit 10;

 

Example 2:

select ydb_sex,ydb_province,ydb_grade,ydb_age,count(*) as cnt from ydbhive_example_bigdata group by ydb_sex,ydb_province,ydb_grade,ydb_age order by cnt desc limit 100

 

select count(*) from ydbhive_example_bigdata limit 10

 

Example 3:

Run further queries on the Hive side over YDB's aggregated results.

select * from ydbhive_example_groupby  limit 10

 

select province,bank,sum(amt),sum(cnt) as cnt from ydbhive_example_groupby group by province,bank  order by cnt desc limit 100
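 

The multi-table joins and nested SQL mentioned at the top of this section run entirely on the Hive side over the mapped tables. Below is a minimal sketch; the ordinary Hive table card_blacklist is a hypothetical example table used only for illustration, not something shipped with YDB:

-- join a YDB-mapped table with a hypothetical local Hive table (card_blacklist is assumed to exist)
select t.tradeid, t.cardnum, t.tradenum
from ydbhive_example t
join card_blacklist b on t.cardnum = b.cardnum
limit 100;

-- nested SQL: aggregate the mapped data, then filter the aggregate on the Hive side
select tradeid, cnt from (
  select tradeid, count(*) as cnt from ydbhive_example group by tradeid
) tmp
where cnt > 1
order by cnt desc
limit 100;

Because the row reduction happens in the YDB mapping SQL, only the already-filtered result set takes part in the Hive-side join.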

 

Dynamically changing a table's mapping during queries

The name of the property to set is given by the value of ydb.handler.sql.key configured when the table was created.

Example 1

set  ydb.sql.ydbhive_example_bigdata=" select phonenum,ydb_sex,ydb_province,ydb_grade,ydb_age from  ydb_example_ads where ydbpartion='20151111'  and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:1000000' and ydbkv='max.return.docset.size:100000000'  limit 0,10";

select count(*) from ydbhive_example_bigdata limit 10

 

 

Example 2

set  ydb.sql.ydbhive_example_groupby =" select province,bank,sum(amt),count(*) from ydb_example_trade where ydbpartion='20151011'  and province='辽宁省'  and ydbkv='export.joinchar:%01'  and ydbkv='export.max.return.docset.size:1000000' and ydbkv='max.return.docset.size:100000000'  group by province,bank limit 0,10";

select province,bank,sum(amt),sum(cnt) as cnt from ydbhive_example_groupby group by province,bank  order by cnt desc limit 100

 

Columns that are not needed can be filled with higoempty_ex{N}_s placeholders to save I/O.

set  ydb.sql.ydbhive_example_bigdata=" select higoempty_ex1_s,ydb_sex, higoempty_ex2_s, higoempty_ex3_s, higoempty_ex4_s from  ydb_example_ads where ydbpartion='20151111'  and ydb_province='北京' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:1000000' and ydbkv='max.return.docset.size:100000000'  limit 0,10";

select ydb_sex ,count(*) as cnt from ydbhive_example_bigdata group by ydb_sex order by cnt desc limit 100

 

select * from ydbhive_example_bigdata limit 10

 

Reading YDB data from Spark for analysis

Operating on YDB from Spark is almost exactly the same as from Hive.

However, since Spark does not support the add jar method, remember to configure SPARK_CLASSPATH.

For example:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/data/ycloud/ycloud/ydb/lib/ydb-1.0.5-pg.jar
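 

With the classpath in place, the mapped tables created above can be queried from spark-sql (or the Spark Thrift Server) just as in the Hive examples. A minimal sketch, assuming Spark shares the Hive metastore in which the tables were created:

-- run inside spark-sql; the statement is identical to the Hive-side query
select ydb_sex, count(*) as cnt
from ydbhive_example_bigdata
group by ydb_sex
order by cnt desc
limit 100;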

 

 

 

Interface for exporting to other systems

A programming example:

 

String master = "101.200.130.48:8080";
String exportSql = " select tradetime,tradenum,tradeid,nickname,cardnum from ydb_example_trade where ydbpartion='20151011' and ydbkv='export.joinchar:%09' and ydbkv='export.max.return.docset.size:30' limit 0,10 ";

HiveYdbTableInputFormat format = new HiveYdbTableInputFormat();
YdbInputSplit[] list = format.getSplits(master, exportSql, "");

System.out.println(Arrays.toString(list));

// this step only randomizes the processing order of the splits
HashMap<Integer, YdbInputSplit> randomMap = new HashMap<Integer, YdbInputSplit>();
for (YdbInputSplit split : list) {
    randomMap.put((int) (Math.random() * 1000000), split);
}

// multi-threading can be used here for concurrent export
for (Entry<Integer, YdbInputSplit> e : randomMap.entrySet()) {
    YdbInputSplit split = e.getValue();
    System.out.println("#######################");
    System.out.println(split.toString());

    // read every record of this split and print progress, position, total and the row itself
    YdbRecordReader reader = new YdbRecordReader(split);
    LongWritable key = new LongWritable();
    BytesWritable wr = new BytesWritable();
    while (reader.next(key, wr)) {
        System.out.println(reader.getProgress() + "\t" + reader.getPos() + "\t" + reader.getTotal() + "\t" + new String(wr.getBytes(), 0, wr.getLength(), "utf-8"));
    }
    reader.close();
}

 

 

Note: exportSql can only be a plain select statement; it cannot be an aggregation or other statistical SQL.
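 

As the comment in the example suggests, the splits can also be exported concurrently. A minimal sketch using a fixed thread pool; it reuses the list array and the YdbRecordReader calls from the example above, and the pool size of 4 is an arbitrary assumption:

// concurrent export sketch: one task per split (requires java.util.concurrent.ExecutorService / Executors)
ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an assumption
for (final YdbInputSplit split : list) {
    pool.submit(new Runnable() {
        public void run() {
            try {
                // same read loop as in the sequential example above
                YdbRecordReader reader = new YdbRecordReader(split);
                LongWritable key = new LongWritable();
                BytesWritable value = new BytesWritable();
                while (reader.next(key, value)) {
                    System.out.println(new String(value.getBytes(), 0, value.getLength(), "utf-8"));
                }
                reader.close();
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    });
}
pool.shutdown(); // call awaitTermination if the caller must block until all splits finish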

 

MapReduce InputFormat interface

Usage example:

 

String master = "101.200.130.48:8080";
String exportSql = " select tradetime,tradenum,tradeid,nickname,cardnum from ydb_example_trade where ydbpartion='20151011' and ydbkv='export.joinchar:%09' and ydbkv='export.max.return.docset.size:30' limit 0,10 ";

job.setInputFormatClass(HiveYdbTableInputFormat.class);
HiveYdbTableInputFormat.setYdb(job.getConfiguration(), master, exportSql);

 

Note: exportSql can only be a plain select statement; it cannot be an aggregation or other statistical SQL.
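 

For completeness, here is a minimal sketch of where the two calls above fit into a full MapReduce job. The identity Mapper, the LongWritable/BytesWritable key-value types, the driver class name, and the output path are assumptions made for illustration, not documented YDB behavior:

// minimal map-only export job (YdbExportDriver and the output path are hypothetical)
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "ydb-export");
job.setJarByClass(YdbExportDriver.class);

job.setInputFormatClass(HiveYdbTableInputFormat.class);
HiveYdbTableInputFormat.setYdb(job.getConfiguration(), master, exportSql);

// assumed key/value pair, matching the YdbRecordReader example above
job.setMapperClass(Mapper.class);            // identity mapper: records pass straight through
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(BytesWritable.class);
job.setNumReduceTasks(0);                    // map-only: no reduce phase

FileOutputFormat.setOutputPath(job, new Path("/tmp/ydb_export_output"));
System.exit(job.waitForCompletion(true) ? 0 : 1);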
