Hive Sql练习

  • SQL1:

domain time traffic(T)
gifshow.com 2019/01/01 5
yy.com 2019/01/01 4
huya.com 2019/01/01 1
gifshow.com 2019/01/20 6
gifshow.com 2019/02/01 8
yy.com 2019/01/20 5
gifshow.com 2019/02/02 7
需求:统计每个用户的累计访问量 一个SQL搞定
domain month traffics totals
gifshow.com 2019-01 11 11
gifshow.com 2019-02 15 26
yy.com 2019-01 9 9
huya.com 2019-01 1 1

SELECT t2.domain,
       t2.month,
       t2.traffics,
       ( t2.traffics + t2.lag1 ) AS totals
FROM   (SELECT t1.domain,
               t1.month,
               t1.traffics,
               Lag(traffics, 1, 0)
                 OVER (
                   partition BY domain
                   ORDER BY month) lag1
        FROM   (SELECT t.domain,
                       t.month,
                       Sum(t.traffic) traffics
                FROM   (SELECT domain,
                               Substr(From_unixtime(Unix_timestamp(time, 'yyyy/MM/dd'), 'yyyy-MM-dd'), 1, 7) month,
                               traffic
                        FROM   traffic) t
                GROUP  BY t.month,
                          t.domain) t1) t2;
  • SQL2:

uid pid
user1 a
user2 b
.......
1)uv ==> uid cnt
2)统计每个产品top3的用户信息 ==> pid uid cnt

SELECT uid,
       Count(DISTINCT pid)
FROM   user_product
GROUP  BY uid;


SELECT t1.pid,
       t1.uid,
       t1.cnt
FROM   (SELECT t.pid,
               t.uid,
               t.cnt,
               Row_number()
                 OVER (
                   partition BY pid
                   ORDER BY cnt DESC) rank
        FROM   (SELECT pid,
                       uid,
                       Count(*) cnt
                FROM   user_product
                GROUP  BY pid,
                          uid) t) t1
WHERE  t1.rank <= 3;

你可能感兴趣的:(Hive Sql练习)