NumPy ships with built-in functions that make it easy to compute the mean, variance, and standard deviation of a set of data.
Mean
>>> import numpy as np
>>> a = np.array([1,2,3,4,5,6,7,8,9])
>>> np.mean(a)
5.0
Besides np.mean, the np.average function also computes the mean. The difference is that np.average accepts a weights argument:
>>> np.average(a)
5.0
>>> np.average(a, weights=(1,1,1,1,1,1,1,1,1))
5.0
>>> np.average(a, weights=(1,1,1,1,1,1,1,6,1))
6.071428571428571
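With weights, np.average computes sum(w_i * x_i) / sum(w_i). The sketch below recomputes the last result by hand as an illustration of that formula, not as NumPy's own implementation. The helper name `weighted_mean` is hypothetical.

```python
import numpy as np

def weighted_mean(x, w):
    """Hypothetical helper: weighted mean = sum(w * x) / sum(w)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return (w * x).sum() / w.sum()

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
w = (1, 1, 1, 1, 1, 1, 1, 6, 1)  # weight 6 on the value 8

# numerator: 45 - 8 + 6*8 = 85, denominator: 8 + 6 = 14
print(weighted_mean(a, w))
print(np.average(a, weights=w))
```

With all weights equal, the formula reduces to the plain mean, which is why weights=(1,1,...,1) gave 5.0 above.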
np.mean also takes an axis argument. To show it, switch to a 2-D array:
>>> a = np.arange(20).reshape(4, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
>>> a.shape
(4, 5)
>>> np.mean(a, axis=0)
array([ 7.5,  8.5,  9.5, 10.5, 11.5])
>>> np.mean(a, axis=0).shape
(5,)
>>> np.mean(a, axis=1)
array([ 2.,  7., 12., 17.])
>>> np.mean(a, axis=1).shape
(4,)
>>> np.mean(a, axis=(0,1))
9.5
>>> np.mean(a)
9.5
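In the session above, axis=0 collapses the rows and gives one mean per column. axis=1 collapses the columns and gives one mean per row. The sketch below checks this against plain sum-divided-by-count. It assumes the 4x5 array is `np.arange(20).reshape(4, 5)`, which matches the printed values. It also shows the `keepdims` argument, which keeps the reduced axis with length 1 so the result broadcasts back against the original array.

```python
import numpy as np

# Assumption: the 4x5 array above is np.arange(20).reshape(4, 5)
a = np.arange(20).reshape(4, 5)

# axis=0 collapses the rows: one mean per column, shape (5,)
col_means = a.sum(axis=0) / a.shape[0]
# axis=1 collapses the columns: one mean per row, shape (4,)
row_means = a.sum(axis=1) / a.shape[1]

print(np.allclose(col_means, np.mean(a, axis=0)))  # True
print(np.allclose(row_means, np.mean(a, axis=1)))  # True

# keepdims=True gives shape (4, 1), which broadcasts against (4, 5);
# without it the (4,) result would not broadcast, so centering each row fails
centered = a - np.mean(a, axis=1, keepdims=True)
print(centered.shape)  # (4, 5)
```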
Variance
>>> a = np.array([1,2,3,4,5,6,7,8,9])
>>> np.var(a)
6.666666666666667
>>> np.var(a, ddof=1)
7.5
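np.var computes the variance. Pay attention to the ddof parameter. By default, np.var divides by n = len(a), which corresponds to ddof=0. When a sample is used to estimate the population variance, the usual formula divides by n-1 instead, and that is what ddof=1 gives you.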
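Written out, np.var divides the sum of squared deviations by n - ddof (this is how the NumPy documentation describes the divisor). For a = [1, ..., 9], the sum of squared deviations is 60, so ddof=0 gives 60/9 and ddof=1 gives 60/8, matching the outputs above:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\operatorname{var}(x) = \frac{1}{n - \mathrm{ddof}} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2

% a = [1, ..., 9]:  n = 9,  \bar{x} = 5,  \sum_i (x_i - \bar{x})^2 = 60
% ddof = 0:  60 / 9 \approx 6.6667
% ddof = 1:  60 / 8 = 7.5
```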
Here is the variance computed by hand, to give you confidence in np.var:
>>> tss = 0
>>> for i in range(len(a)):
...     tss += (a[i]-np.mean(a))**2
...
>>> tss
60.0
>>> tss/(len(a)-1)
7.5
>>> tss/(len(a))
6.666666666666667
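The loop above can also be written without a Python loop. The version below computes the mean once instead of on every iteration. It is a sketch for checking np.var, and the function name `var_by_hand` is hypothetical.

```python
import numpy as np

def var_by_hand(x, ddof=0):
    """Hypothetical helper: sum of squared deviations divided by n - ddof."""
    x = np.asarray(x, dtype=float)
    tss = ((x - x.mean()) ** 2).sum()  # total sum of squares, vectorized
    return tss / (x.size - ddof)

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
for ddof in (0, 1):
    # hand-computed value next to NumPy's
    print(ddof, var_by_hand(a, ddof), np.var(a, ddof=ddof))
```

Because it uses x.size, the helper also gives the variance over all elements of a multi-dimensional array. That matches np.var when no axis is given.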
Standard deviation
>>> np.sqrt(np.var(a))
2.581988897471611
>>> np.sqrt(np.var(a))**2
6.666666666666666
>>>
>>> np.sqrt(np.var(a, ddof=1))
2.7386127875258306
>>> np.sqrt(np.var(a, ddof=1))**2
7.5
np.sqrt takes the square root.
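Instead of applying np.sqrt to the variance, you can use the dedicated np.std function for the standard deviation. Here it is on the 4x5 array again:
>>> a = np.arange(20).reshape(4, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])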
>>> np.std(a)
5.766281297335398
>>> np.sqrt(np.var(a))
5.766281297335398
>>> np.std(a, ddof=1)
5.916079783099616
>>> np.sqrt(np.var(a, ddof=1))
5.916079783099616
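np.std accepts the same axis and ddof arguments as np.var, so per-column or per-row standard deviations work the same way. The sketch below checks that np.std still equals the square root of np.var in that case. It assumes the same `np.arange(20).reshape(4, 5)` array as above.

```python
import numpy as np

# Assumption: same 4x5 array as above
a = np.arange(20).reshape(4, 5)

col_std = np.std(a, axis=0, ddof=1)  # one value per column, divisor n - 1
row_std = np.std(a, axis=1)          # one value per row, default ddof=0

print(np.allclose(col_std, np.sqrt(np.var(a, axis=0, ddof=1))))  # True
print(np.allclose(row_std, np.sqrt(np.var(a, axis=1))))          # True
```

Each column is [c, c+5, c+10, c+15], so its squared deviations sum to 125 and the ddof=1 standard deviation is sqrt(125/3). Each row is five consecutive integers with population variance 2, so every row gives sqrt(2).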