An example of using Spark SQL's first function on multiple fields

1. Create a Hive table as follows:

CREATE EXTERNAL TABLE `newsapp.test_first`(
  `userkey` string, 
  `publish_id` string, 
  `data_type` string, 
  `soft_version` string, 
  `ua` string, 
  `mos` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
WITH SERDEPROPERTIES ( 
  'field.delim'='\t', 
  'serialization.format'='\t') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://ifhoffl/user/hive/warehouse/newsapp.db/test_first'
TBLPROPERTIES (
  'transient_lastDdlTime'='1489126869')

2. Insert data

Query rows from another table and insert them into this one. The sample data looks like:

867463027018201 2006 2017012420 5.2.4 mi_4lte android_4.4.4
8516c9ebd4073a5281e74ecbf2af6297e159b98d 4002 2017012420 5.4.1 iphone8_1 iphone_10.2
861600033349692 6102 2017012420 5.5.0 oppo_r9m android_5.1
867822021100820 6001 2017012420 5.5.0 hm_note_1s android_4.4.4
99000872898896 6001 2017012420 5.5.0 redmi_note_3 android_6.0.1
860482037942991 6109 2017012420 5.5.0 huawei_nxt-al10 android_7.0
862051031463029 3329 2017012420 5.5.0 huawei_tit-al00 android_5.1
861545034465692 1884 2017012420 5.5.0 le_x620 android_6.0
862095023392928 6103 2017012420 5.5.0 mx4 android_5.1
e04b9d165757e4ef4d3f66dbb02bdfe1f689ad7b 4002 2017012420 5.4.1 iphone8_2 iphone_10.2
869609021094139 3329 2017012420 5.3.0 huawei_tit-al00 android_5.1
861575030737703 6109 2017012420 5.5.0 huawei_tag-al00 android_5.1
860671021446080 6001 2017012420 5.4.1 2013022 android_4.2.2
353254060442122 2011 2017012420 5.5.0 l50u android_4.4.2
d5889858814c200b2d8815a8e5845ad2dab468b5 4002 2017012420 5.4.1 iphone8_1 iphone_10.2
869573021015355 6109 2017012420 5.5.0 huawei_crr-ul00 android_6.0
c9d4113fce3987ff 1994 2017012420 5.4.1 gn5001s android_5.1
864181027315061 6102 2017012420 5.5.0 n5117 android_4.3
d18d7cf2899b5dd98fdadfbeb604cb32dc90a002 4002 2017012420 5.4.1 iphone6_2 iphone_10.2
869511022231761 2024 2017012420 5.4.0 huawei_gra-tl00 android_6.0

Insert into the table:

insert into table newsapp.test_first
select userkey, publish_id, date, soft_version, ua, mos from newsapp.stats_log where date like '%' and data_type = 'newsapp' and action in ('page', 'v', 'action') limit 20

(Note that `date` from stats_log lands in test_first's third column, `data_type`, which is why the queries below alias `first(data_type)` as `date`.)

 

3. Use the first() function on multiple fields in the scala shell; it still selects only one row per group

 

hive (newsapp)> select * from test_first ;
OK
863792031896895	6109	2017021422	5.5.0	cam-al00	android_6.0
863730034740413	6001	2017021422	5.5.0	redmi_note_4	android_6.0
A100004E58DD35	6102	2017021422	5.4.1	oppo_r7sm	android_5.1.1
862534037030736	2002	2017021422	5.5.0	redmi_note_3	android_5.1.1
867628023774987	6109	2017021422	5.5.0	plk-tl01h	android_6.0
863410031072801	6001	2017021422	5.5.0	mi_5s	android_6.0.1
357390061934547	2604	2017021422	5.5.0	sm-n9100	android_6.0.1
867905022664884	6103	2017021422	5.5.0	pro_5	android_5.1
862979030425377	6101	2017021422	5.5.0	vivo_x7	android_5.1.1
A000005E4F58D1	6109	2017021422	5.5.0	scl-cl00	android_5.1.1
868486029009174	6102	2017021422	5.5.0	oppo_a33	android_5.1.1
869158022640854	6109	2017021422	5.4.1	eva-al00	android_6.0
861533036737193	6109	2017021422	5.2.4	eva-al10	android_7.0
E7NDU15A20002276	6109	2017021422	5.3.1	s8-701w	android_4.3
A0000059C57038	6109	2017021422	5.5.0	kiw-cl00	android_5.1.1
352107069757323	2024	2017021422	4.4.6	sm-g9008v	android_4.4.2
A00000597295D2	3329	2017021422	5.4.1	huawei_tag-al00	android_5.1
860777033176332	2011	2017021422	5.4.1	f100	android_5.1
868979029557051	6102	2017021422	5.5.0	oppo_r7	android_4.4.4
866486022346468	3256	2017021422	5.1.0	2014813	android_4.4.4
Time taken: 0.042 seconds, Fetched: 20 row(s)

  

scala>  var df = spark.sql(s"select publish_id, first(userkey) as userkey,first(data_type) as date,first(soft_version) as soft_version,first(ua) as ua,first(mos) as mos  from newsapp.test_first group by publish_id")
df: org.apache.spark.sql.DataFrame = [publish_id: string, userkey: string ... 4 more fields]

scala> df.show()
+----------+---------------+----------+------------+---------------+-------------+
|publish_id|        userkey|      date|soft_version|             ua|          mos|
+----------+---------------+----------+------------+---------------+-------------+
|      6109|863792031896895|2017021422|       5.5.0|       cam-al00|  android_6.0|
|      6101|862979030425377|2017021422|       5.5.0|        vivo_x7|android_5.1.1|
|      2604|357390061934547|2017021422|       5.5.0|       sm-n9100|android_6.0.1|
|      2002|862534037030736|2017021422|       5.5.0|   redmi_note_3|android_5.1.1|
|      3329| A00000597295D2|2017021422|       5.4.1|huawei_tag-al00|  android_5.1|
|      6001|863730034740413|2017021422|       5.5.0|   redmi_note_4|  android_6.0|
|      2011|860777033176332|2017021422|       5.4.1|           f100|  android_5.1|
|      3256|866486022346468|2017021422|       5.1.0|        2014813|android_4.4.4|
|      2024|352107069757323|2017021422|       4.4.6|      sm-g9008v|android_4.4.2|
|      6102| A100004E58DD35|2017021422|       5.4.1|      oppo_r7sm|android_5.1.1|
|      6103|867905022664884|2017021422|       5.5.0|          pro_5|  android_5.1|
+----------+---------------+----------+------------+---------------+-------------+
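The result above keeps exactly one row per publish_id. As a rough analogy (plain Scala, not Spark), the per-group selection can be sketched like this, using values taken from the test_first table above; note that Scala's groupBy keeps encounter order within each group, whereas Spark's first() gives no ordering guarantee unless the input is explicitly sorted:

```scala
// Pure-Scala sketch (no Spark required) of "first(col) ... GROUP BY key":
// keep a single representative row per key.
case class Row(publishId: String, userkey: String, ua: String)

val rows = Seq(
  Row("6109", "863792031896895", "cam-al00"),
  Row("6109", "867628023774987", "plk-tl01h"),
  Row("6001", "863730034740413", "redmi_note_4")
)

// groupBy collects rows under each publishId; .head takes one row per group.
val firstPerPublishId: Map[String, Row] =
  rows.groupBy(_.publishId).map { case (k, vs) => k -> vs.head }

println(firstPerPublishId("6109").ua)  // cam-al00
```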

You can also group by multiple fields:

scala>  var df = spark.sql(s"select soft_version, first(userkey) as userkey,first(data_type) as date,first(publish_id) as publish_id,first(ua) as ua,mos  from newsapp.test_first group by soft_version,mos");
df: org.apache.spark.sql.DataFrame = [soft_version: string, userkey: string ... 4 more fields]

scala> df.show()
17/03/16 13:23:11 WARN DFSClient: Slow ReadProcessor read fields took 177140ms (threshold=30000ms); ack: seqno: 131 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 635983, targets: [10.90.11.27:50010, 10.90.11.17:50010, 10.90.11.16:50010]
+------------+----------------+----------+----------+---------------+-------------+
|soft_version|         userkey|      date|publish_id|             ua|          mos|
+------------+----------------+----------+----------+---------------+-------------+
|       5.4.1| 869158022640854|2017021422|      6109|       eva-al00|  android_6.0|
|       5.4.1|  A100004E58DD35|2017021422|      6102|      oppo_r7sm|android_5.1.1|
|       5.2.4| 861533036737193|2017021422|      6109|       eva-al10|  android_7.0|
|       5.5.0| 868979029557051|2017021422|      6102|        oppo_r7|android_4.4.4|
|       5.4.1|  A00000597295D2|2017021422|      3329|huawei_tag-al00|  android_5.1|
|       4.4.6| 352107069757323|2017021422|      2024|      sm-g9008v|android_4.4.2|
|       5.3.1|E7NDU15A20002276|2017021422|      6109|        s8-701w|  android_4.3|
|       5.5.0| 863792031896895|2017021422|      6109|       cam-al00|  android_6.0|
|       5.5.0| 867905022664884|2017021422|      6103|          pro_5|  android_5.1|
|       5.1.0| 866486022346468|2017021422|      3256|        2014813|android_4.4.4|
|       5.5.0| 862534037030736|2017021422|      2002|   redmi_note_3|android_5.1.1|
|       5.5.0| 863410031072801|2017021422|      6001|          mi_5s|android_6.0.1|
+------------+----------------+----------+----------+---------------+-------------+
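Grouping by two columns works the same way: the pair (soft_version, mos) acts as a composite key, and first() again keeps one row per distinct key. A plain-Scala sketch of the composite-key idea (the first two rows use values from the table above; the third row is a made-up duplicate key to show the collapsing):

```scala
// Composite-key grouping, mirroring "GROUP BY soft_version, mos".
case class Log(softVersion: String, mos: String, userkey: String)

val logs = Seq(
  Log("5.4.1", "android_6.0", "869158022640854"),
  Log("5.5.0", "android_5.1", "867905022664884"),
  Log("5.4.1", "android_6.0", "000000000000000")  // hypothetical duplicate key
)

// A tuple of the two columns is the grouping key; each distinct
// (softVersion, mos) pair yields exactly one output row.
val firstPerKey: Map[(String, String), Log] =
  logs.groupBy(l => (l.softVersion, l.mos)).map { case (k, vs) => k -> vs.head }

println(firstPerKey.size)  // 2
```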

  

Reposted from: https://www.cnblogs.com/songjiaying/p/6558841.html
