sklearn.preprocessing.scale

Let's look at how sklearn.preprocessing.scale actually computes its result.

from sklearn import preprocessing
import numpy as np
X_train = np.array([[ 1., -1.,  2.], [ 2.,  0.,  0.], [ 0.,  1., -1.]])

X_train

array([[ 1., -1.,  2.],
       [ 2.,  0.,  0.],
       [ 0.,  1., -1.]])

Calling preprocessing.scale(X_train) (a NumPy array has no scale() method of its own) gives

array([[ 0.        , -1.22474487,  1.33630621],
       [ 1.22474487,  0.        , -0.26726124],
       [-1.22474487,  1.22474487, -1.06904497]])

X_train[0], the first row of the data

array([ 1., -1.,  2.])

Run

preprocessing.scale(X_train[0])
array([ 0.26726124, -1.33630621,  1.06904497])

X_train[:,0], the first column of the data

array([1., 2., 0.])

Run

preprocessing.scale(X_train[:,0])
array([ 0.        ,  1.22474487, -1.22474487])

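Comparing these results with the full output above shows that preprocessing.scale() standardizes column by column: the scaled first column matches the first column of the full result, while the scaled first row does not match its first row. For each column, the formula is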

(X_train[:,0]-X_train[:,0].mean())/X_train[:,0].std()
(X_train[:,0]-np.mean(X_train[:,0]))/np.std(X_train[:,0])  # or, equivalently
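
The formula above can be checked against the library call for all columns at once. The following is a minimal sketch rather than a description of scikit-learn's internals: it assumes the default arguments of preprocessing.scale (axis=0) and relies on NumPy's default population standard deviation (ddof=0). The names manual, scaled and row_scaled are introduced here for illustration only.

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])

# Column-wise standardization by hand: subtract each column's mean and
# divide by each column's standard deviation (NumPy default, ddof=0).
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

# The library call with its default axis=0, i.e. per column.
scaled = preprocessing.scale(X_train)
print(np.allclose(manual, scaled))

# Passing axis=1 standardizes each row instead, so the first row of that
# result should agree with scaling X_train[0] on its own.
row_scaled = preprocessing.scale(X_train, axis=1)
print(np.allclose(row_scaled[0], preprocessing.scale(X_train[0])))
```

Both comparisons use np.allclose because the two computations may differ in the last floating-point digits.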

Properties of X_train after scaling
X_scaled=preprocessing.scale(X_train)

>>> X_scaled.mean(axis=0)
array([0., 0., 0.])

>>> X_scaled.std(axis=0)
array([1., 1., 1.])
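
In a script, these two properties are easier to check with a tolerance than by reading the printed arrays, since floating-point rounding can leave a column mean that is tiny but not exactly zero. A small sketch, assuming the same X_train as above (col_means and col_stds are illustrative names):

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
X_scaled = preprocessing.scale(X_train)

# Per-column statistics of the scaled data.
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

# Compare with a tolerance instead of exact equality.
print(np.allclose(col_means, 0.0))
print(np.allclose(col_stds, 1.0))
```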

Alternatively, use the sklearn.preprocessing.StandardScaler class and fit it to X_train, for example:

>>> scaler=preprocessing.StandardScaler().fit(X_train)
>>> scaler.mean_  # per-column mean of the training data
array([1.        , 0.        , 0.33333333])
>>> scaler.scale_  # per-column standard deviation of the training data
array([0.81649658, 0.81649658, 1.24721913])
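
Because the fitted scaler stores mean_ and scale_, it can apply the same transformation to data it was not fitted on. The sketch below illustrates this under one assumption: X_new is a made-up sample added here for illustration and does not come from the original example.

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
scaler = preprocessing.StandardScaler().fit(X_train)

# Transforming the training data should reproduce preprocessing.scale(X_train).
train_scaled = scaler.transform(X_train)
print(np.allclose(train_scaled, preprocessing.scale(X_train)))

# Hypothetical new sample (not from the original post), scaled with the
# statistics learned from X_train rather than its own mean and std.
X_new = np.array([[-1., 1., 0.]])
new_scaled = scaler.transform(X_new)

# The same result computed by hand from the stored attributes.
manual_new = (X_new - scaler.mean_) / scaler.scale_
print(np.allclose(new_scaled, manual_new))
```

This reuse of the training statistics is the practical difference from preprocessing.scale, which recomputes the mean and standard deviation from whatever array it is given.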
