任务 1:全国各省(省、自治区、直辖市、特别行政区等省级行政
区)微博注册用户的比例。
说明:“比例”均使用百分数,精确到小数点后 2 位,如:23.45%
select province, count(*) from `user` group by province;
select count(*) from `user`;
说明:下边语句中的 round(d, n)是将 double 类型的 d 数字,保留
n 位小数位
select p.province, round(p.puser / u.totaluser, 4) from
(select count(*) as puser from `user` group by province) p,
(select count(*) as totaluser from `user`) u;
最终语句:
select t2.provincename, concat(round(t1.rate * 100, 2),'%')
from (select p.province, p.puser / u.totaluser as rate
from (select province, count(*) as puser from `user` group by province) p,
(select count(*) as totaluser from `user`) u order by rate desc) t1
join provinces t2 on t1.province=t2.province;
任务 2:喜欢在夜晚(20:00-24:00)发微博的用户比例。
说明:如果一个用户在夜晚发微博的数目大于在其他时间的发微博
数目,则定义他/她喜欢在夜晚发微博。
总微薄用户人数:
select count(distinct uid) from weekly;
select count(*) from weekly group by uid,
(hour(created_at) >=20 or hour(created_at) <= 24) as h;
select count(*) from weekly where hour(created_at) >=20 or
hour(created_at) <= 24;
select a.cnt / b.cnt from (select count(*) as cnt from
weekly where hour(created_at) >=20 or hour(created_at) <=
24) a, (select count(*) as cnt from weekly) b;
任务 3:大 V 用户的男女性别比例。
说明:V 是“Verified”的简称,即“userdata.csv”文件中的
verified 字段
select gender, count(*) from `user` where verified=true
group by verified, gender;
select a.cnt from (select gender, count(*) as cnt from
`user` where verified=true group by verified, gender) a
where a.gender='f' ;
select a.cnt from (select gender, count(*) as cnt from
`user` where verified=true group by verified, gender) a
where a.gender='m' ;
select count(*) from `user` where verified=true;
select round(f.cnt / c.cnt, 4), round(m.cnt / c.cnt, 4)
from
(select a.cnt as cnt from (select gender, count(*) as cnt
from `user` where verified=true group by verified,
gender) a
where a.gender='f') f,
(select a.cnt as cnt from (select gender, count(*) as cnt
from `user` where verified=true group by verified, gender)
a
where a.gender='m') m,
(select count(*) as cnt from `user` where verified=true) c;
任务 4:最受关注的前 5 名用户 ID。
说明:总转发数大的用户,定义为更受关注的用户。
select retweeted_uid, count(*) as cnt from weekly where
retweeted_uid !='' group by retweeted_uid order by cnt
desc limit 5;