1.建立hive表如下:
CREATE EXTERNAL TABLE `newsapp.test_first`( `userkey` string, `publish_id` string, `data_type` string, `soft_version` string, `ua` string, `mos` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ( 'field.delim'='\t', 'serialization.format'='\t') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'hdfs://ifhoffl/user/hive/warehouse/newsapp.db/test_first' TBLPROPERTIES ( 'transient_lastDdlTime'='1489126869')
2.插入数据
从一个表中查出数据 插入表中 数据是:
867463027018201 2006 2017012420 5.2.4 mi_4lte android_4.4.4
8516c9ebd4073a5281e74ecbf2af6297e159b98d 4002 2017012420 5.4.1 iphone8_1 iphone_10.2
861600033349692 6102 2017012420 5.5.0 oppo_r9m android_5.1
867822021100820 6001 2017012420 5.5.0 hm_note_1s android_4.4.4
99000872898896 6001 2017012420 5.5.0 redmi_note_3 android_6.0.1
860482037942991 6109 2017012420 5.5.0 huawei_nxt-al10 android_7.0
862051031463029 3329 2017012420 5.5.0 huawei_tit-al00 android_5.1
861545034465692 1884 2017012420 5.5.0 le_x620 android_6.0
862095023392928 6103 2017012420 5.5.0 mx4 android_5.1
e04b9d165757e4ef4d3f66dbb02bdfe1f689ad7b 4002 2017012420 5.4.1 iphone8_2 iphone_10.2
869609021094139 3329 2017012420 5.3.0 huawei_tit-al00 android_5.1
861575030737703 6109 2017012420 5.5.0 huawei_tag-al00 android_5.1
860671021446080 6001 2017012420 5.4.1 2013022 android_4.2.2
353254060442122 2011 2017012420 5.5.0 l50u android_4.4.2
d5889858814c200b2d8815a8e5845ad2dab468b5 4002 2017012420 5.4.1 iphone8_1 iphone_10.2
869573021015355 6109 2017012420 5.5.0 huawei_crr-ul00 android_6.0
c9d4113fce3987ff 1994 2017012420 5.4.1 gn5001s android_5.1
864181027315061 6102 2017012420 5.5.0 n5117 android_4.3
d18d7cf2899b5dd98fdadfbeb604cb32dc90a002 4002 2017012420 5.4.1 iphone6_2 iphone_10.2
869511022231761 2024 2017012420 5.4.0 huawei_gra-tl00 android_6.0
插入表中:
insert into table newsapp.test_first select userkey, publish_id ,date ,soft_version, ua, mos from newsapp.stats_log where date like '%' and data_type = 'newsapp' and action in ( 'page','v','action') limit 20
3.在scala_shell中使用fist()函数用于多个字段,也是只选出一条
hive (newsapp)> select * from test_first ; OK 863792031896895 6109 2017021422 5.5.0 cam-al00 android_6.0 863730034740413 6001 2017021422 5.5.0 redmi_note_4 android_6.0 A100004E58DD35 6102 2017021422 5.4.1 oppo_r7sm android_5.1.1 862534037030736 2002 2017021422 5.5.0 redmi_note_3 android_5.1.1 867628023774987 6109 2017021422 5.5.0 plk-tl01h android_6.0 863410031072801 6001 2017021422 5.5.0 mi_5s android_6.0.1 357390061934547 2604 2017021422 5.5.0 sm-n9100 android_6.0.1 867905022664884 6103 2017021422 5.5.0 pro_5 android_5.1 862979030425377 6101 2017021422 5.5.0 vivo_x7 android_5.1.1 A000005E4F58D1 6109 2017021422 5.5.0 scl-cl00 android_5.1.1 868486029009174 6102 2017021422 5.5.0 oppo_a33 android_5.1.1 869158022640854 6109 2017021422 5.4.1 eva-al00 android_6.0 861533036737193 6109 2017021422 5.2.4 eva-al10 android_7.0 E7NDU15A20002276 6109 2017021422 5.3.1 s8-701w android_4.3 A0000059C57038 6109 2017021422 5.5.0 kiw-cl00 android_5.1.1 352107069757323 2024 2017021422 4.4.6 sm-g9008v android_4.4.2 A00000597295D2 3329 2017021422 5.4.1 huawei_tag-al00 android_5.1 860777033176332 2011 2017021422 5.4.1 f100 android_5.1 868979029557051 6102 2017021422 5.5.0 oppo_r7 android_4.4.4 866486022346468 3256 2017021422 5.1.0 2014813 android_4.4.4 Time taken: 0.042 seconds, Fetched: 20 row(s)
scala> var df = spark.sql(s"select publish_id, first(userkey) as userkey,first(data_type) as date,first(soft_version) as soft_version,first(ua) as ua,first(mos) as mos from newsapp.test_first group by publish_id")
df: org.apache.spark.sql.DataFrame = [publish_id: string, userkey: string ... 4 more fields]
scala> df.show()
+----------+---------------+----------+------------+---------------+-------------+
|publish_id| userkey| date|soft_version| ua| mos|
+----------+---------------+----------+------------+---------------+-------------+
| 6109|863792031896895|2017021422| 5.5.0| cam-al00| android_6.0|
| 6101|862979030425377|2017021422| 5.5.0| vivo_x7|android_5.1.1|
| 2604|357390061934547|2017021422| 5.5.0| sm-n9100|android_6.0.1|
| 2002|862534037030736|2017021422| 5.5.0| redmi_note_3|android_5.1.1|
| 3329| A00000597295D2|2017021422| 5.4.1|huawei_tag-al00| android_5.1|
| 6001|863730034740413|2017021422| 5.5.0| redmi_note_4| android_6.0|
| 2011|860777033176332|2017021422| 5.4.1| f100| android_5.1|
| 3256|866486022346468|2017021422| 5.1.0| 2014813|android_4.4.4|
| 2024|352107069757323|2017021422| 4.4.6| sm-g9008v|android_4.4.2|
| 6102| A100004E58DD35|2017021422| 5.4.1| oppo_r7sm|android_5.1.1|
| 6103|867905022664884|2017021422| 5.5.0| pro_5| android_5.1|
+----------+---------------+----------+------------+---------------+-------------+
还可以group by多个字段
scala> var df = spark.sql(s"select soft_version, first(userkey) as userkey,first(data_type) as date,first(publish_id) as publish_id,first(ua) as ua,mos from newsapp.test_first group by soft_version,mos"); df: org.apache.spark.sql.DataFrame = [soft_version: string, userkey: string ... 4 more fields] scala> df.show() 17/03/16 13:23:11 WARN DFSClient: Slow ReadProcessor read fields took 177140ms (threshold=30000ms); ack: seqno: 131 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 635983, targets: [10.90.11.27:50010, 10.90.11.17:50010, 10.90.11.16:50010] +------------+----------------+----------+----------+---------------+-------------+ |soft_version| userkey| date|publish_id| ua| mos| +------------+----------------+----------+----------+---------------+-------------+ | 5.4.1| 869158022640854|2017021422| 6109| eva-al00| android_6.0| | 5.4.1| A100004E58DD35|2017021422| 6102| oppo_r7sm|android_5.1.1| | 5.2.4| 861533036737193|2017021422| 6109| eva-al10| android_7.0| | 5.5.0| 868979029557051|2017021422| 6102| oppo_r7|android_4.4.4| | 5.4.1| A00000597295D2|2017021422| 3329|huawei_tag-al00| android_5.1| | 4.4.6| 352107069757323|2017021422| 2024| sm-g9008v|android_4.4.2| | 5.3.1|E7NDU15A20002276|2017021422| 6109| s8-701w| android_4.3| | 5.5.0| 863792031896895|2017021422| 6109| cam-al00| android_6.0| | 5.5.0| 867905022664884|2017021422| 6103| pro_5| android_5.1| | 5.1.0| 866486022346468|2017021422| 3256| 2014813|android_4.4.4| | 5.5.0| 862534037030736|2017021422| 2002| redmi_note_3|android_5.1.1| | 5.5.0| 863410031072801|2017021422| 6001| mi_5s|android_6.0.1| +------------+----------------+----------+----------+---------------+-------------+