一个人力资源咨询集团通过网络爬虫采集手段将多个知名招聘网站上发布的求职和招聘等信息准实时采集到自己的库里,形成一个数据量浩大的招聘信息库,跟踪全国招聘和求职的行业、工种、职位、待遇等信息,并通过商业智能系统,开展职业职位供求及趋势等相关统计分析。这家公司竟然用SSAS OLAP Cube多维数据集容纳如此数量级的数据,广告维成员包含了每一个广告条目。该商业智能团队的开发人员咨询如何用MDX求解薪水中位数、四分位数(Median,Quartile)等。
以下是对Median、Q1、Q3等问题的MDX解答:
中位数(median)是对长度为n的系列数据,根据数据大小排列得到的位于[(n+1)/2]位置上的数据。当变量值的项数N为奇数时,处于中间位置的变量值即为中位数;当N为偶数时,中位数则为处于中间位置的2个变量值的平均数,即(M1+M2)/2。中位数是以它在所有标志值中所处的位置确定的全体单位标志值的代表值,不受分布数列的极大或极小值影响,从而在一定程度上提高了中位数对分布数列的代表性。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
|
//商业智能之路(jbean.org),
Begin
:
WITH
//采集到的招聘广告条目(薪水测量值非空)
Set
RawAdvSet
AS
NonEmpty([Advertisement].[Adv Id].members,[Measures].[Salary
Sum
])
//将广告条目按照薪水排序
Set
AdvSet
AS
Order
(RawAdvSet, [Measures].[Salary
Sum
],
DESC
)
//招聘广告总数
Member [measures].[AdvCount]
as
Count
(AdvSet)
//招聘广告条目中间位置
Member [Measures].[MedianReal]
as
([measures].[AdvCount]-1) * 50 / 100
Member [Measures].[MedianInt]
as
Int
([Measures].[MedianReal])
Member [Measures].[MedianFrac]
as
[Measures].[MedianReal]- [Measures].[MedianInt]
//薪水“中位数”(低)
Member [Measures].[MedianLow]
as
([AdvSet].Item([Measures].[MedianInt]).Item(0),[Measures].[Salary
Sum
])
//薪水“中位数”(高)
Member [Measures].[MedianHigh]
as
([AdvSet].Item([Measures].[MedianInt] + 1).Item(0),[Measures].[Salary
Sum
])
//实际得到薪水的中位数
Member [Measures].[Salary Median]
as
([Measures].[MedianLow] * [Measures].[MedianFrac])
+([Measures].[MedianHigh] * (1 - [Measures].[MedianFrac]))
//商业智能之路(jbean.org),
End
|
四分位数(Quartile)在统计时把所有数值由小到大排列并分成四等份,处于三个分割点位置的得分就是四分位数,分别如下:
第一四分位数 (Q1),又称“较小四分位数”,等于该样本中所有数值由小到大排列后第25%的数字。
第二四分位数 (Q2),又称“中位数”,等于该样本中所有数值由小到大排列后第50%的数字。
第三四分位数 (Q3),又称“较大四分位数”,等于该样本中所有数值由小到大排列后第75%的数字。
第三四分位数与第一四分位数的差距又称四分位距(InterQuartile Range,IQR)。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
|
// 1)求解1Q
//商业智能之路(jbean.org),
Begin
:
WITH
Set
RawAdvSet
AS
NonEmpty([Advertisement].[Adv Id].members,[Measures].[Salary
Sum
])
Set
AdvSet
AS
Order
(RawAdvSet, [Measures].[Salary
Sum
],
DESC
)
Member [Measures].[1QReal]
as
([measures].[AdvCount]-1) * 25 / 100
Member [Measures].[1QInt]
as
Int
([Measures].[1QReal])
Member [Measures].[1QFrac]
as
[Measures].[1QReal]- [Measures].[1QInt]
Member [Measures].[1QLow]
as
([AdvSet].Item([Measures].[1QInt]).Item(0),[Measures].[Salary
Sum
])
Member [Measures].[1QHigh]
as
([AdvSet].Item([Measures].[1QInt] + 1).Item(0),[Measures].[Salary
Sum
])
//实际得到薪水的四分位数quantile(1Q)
Member [Measures].[Salary 1Q]
as
([Measures].[1QLow] * [Measures].[1QFrac])
+([Measures].[1QHigh] * (1 - [Measures].[1QFrac]))
// 2)求解3Q
//商业智能之路(jbean.org),
Begin
:
Member [Measures].[3QReal]
as
([measures].[AdvCount]-1) * 75 / 100
Member [Measures].[3QInt]
as
Int
([Measures].[3QReal])
Member [Measures].[3QFrac]
as
[Measures].[3QReal]- [Measures].[3QInt]
Member [Measures].[3QLow]
as
([AdvSet].Item([Measures].[3QInt]).Item(0),[Measures].[Salary
Sum
])
Member [Measures].[3QHigh]
as
([AdvSet].Item([Measures].[3QInt] + 1).Item(0),[Measures].[Salary
Sum
])
//实际得到薪水的四分位数quantile(3Q)
Member [Measures].[Salary 3Q]
as
([Measures].[3QLow] * [Measures].[3QFrac])
+([Measures].[3QHigh] * (1 - [Measures].[3QFrac]))
//商业智能之路(jbean.org),
End
|