Computing quantiles in Hive


//Reference: https://www.cnblogs.com/fujian-code/p/8798409.html
Use the percentile_approx function to compute the quantiles of age.

//describe only shows the count, mean, standard deviation, minimum, and maximum; Q1, the median, and Q3 have to be computed separately
/*
scala> df.select("age").describe().show
+-------+------------------+
|summary|               age|
+-------+------------------+
|  count|              5811|
|   mean|29.463087248322147|
| stddev| 4.775418126339402|
|    min|                18|
|    max|               118|
+-------+------------------+
*/
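
As a possible shortcut, assuming Spark 2.3 or later, Dataset.summary() reports the 25%, 50%, and 75% percentiles alongside the statistics that describe() prints. The sketch below is not from the original post: it builds a small hypothetical df (the age values are made up for illustration) so that it can be pasted into spark-shell or run as a script with the Spark SQL dependency on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the session that already exists.
val spark = SparkSession.builder()
  .appName("age-summary")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data; the original post uses its own df with 5811 rows.
val df = Seq(18, 22, 26, 29, 32, 40, 118).toDF("age")

// Default statistics: count, mean, stddev, min, 25%, 50%, 75%, max.
df.select("age").summary().show()

// Only the five-number summary; percentiles are requested as strings such as "25%".
val quartiles = df.select("age").summary("min", "25%", "50%", "75%", "max")
quartiles.show()
```

The percentiles reported by summary() are approximate, so on large data they may differ slightly from exact quantiles.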

//Create a temp view, then compute the quantiles with spark.sql
df.createOrReplaceTempView("trainFeatures")

scala> spark.sql("SELECT min(age) AS Min_age, percentile_approx(age, 0.25) AS Q1_age, percentile_approx(age, 0.5) AS Median_age," +
"percentile_approx(age, 0.75) AS Q3_age, max(age) AS Max_age FROM trainFeatures").show
+-------+------+----------+------+-------+
|Min_age|Q1_age|Median_age|Q3_age|Max_age|
+-------+------+----------+------+-------+
|     18|    26|        29|    32|    118|
+-------+------+----------+------+-------+
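
The query above calls percentile_approx once per quantile. Two variations that may be more convenient are sketched here: passing an array of percentages so that a single call returns all three quartiles, and using the DataFrame API method df.stat.approxQuantile without registering a view. The sample df and its age values are hypothetical, made up for illustration only; the view name trainFeatures is taken from the text above.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the session that already exists.
val spark = SparkSession.builder()
  .appName("age-quantiles")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the original df.
val df = Seq(18, 22, 26, 29, 32, 40, 118).toDF("age")
df.createOrReplaceTempView("trainFeatures")

// One percentile_approx call with an array of percentages
// returns a single array column holding Q1, the median, and Q3.
spark.sql(
  "SELECT percentile_approx(age, array(0.25, 0.5, 0.75)) AS quartiles " +
  "FROM trainFeatures"
).show(false)

// DataFrame API: approxQuantile(column, probabilities, relativeError).
// A relativeError of 0.0 asks for exact quantiles, which can be costly on large data.
val Array(q1, median, q3) =
  df.stat.approxQuantile("age", Array(0.25, 0.5, 0.75), 0.0)
println(s"Q1=$q1 median=$median Q3=$q3")
```

approxQuantile returns an Array[Double] on the driver, which is handy when the quantiles feed later Scala code (for example an outlier threshold) rather than a printed table.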


Reposted from: https://my.oschina.net/kyo4321/blog/3050522
