Let's look at how sklearn.preprocessing.scale actually computes its result.
from sklearn import preprocessing
import numpy as np
X_train = np.array([[ 1., -1., 2.], [ 2., 0., 0.], [ 0., 1., -1.]])
X_train
array([[ 1., -1., 2.],
[ 2., 0., 0.],
[ 0., 1., -1.]])
Running preprocessing.scale(X_train) gives
array([[ 0. , -1.22474487, 1.33630621],
[ 1.22474487, 0. , -0.26726124],
[-1.22474487, 1.22474487, -1.06904497]])
X_train[0] is the first row of the data:
array([ 1., -1., 2.])
Running
preprocessing.scale(X_train[0])
array([ 0.26726124, -1.33630621, 1.06904497])
X_train[:,0] is the first column of the data:
array([1., 2., 0.])
Running
preprocessing.scale(X_train[:,0])
array([ 0. , 1.22474487, -1.22474487])
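As this shows, preprocessing.scale() standardizes each column separately (a 1-D input such as X_train[0] is treated as a single column). The formula is
(X_train[:,0]-X_train[:,0].mean())/X_train[:,0].std()
(X_train[:,0]-np.mean(X_train[:,0]))/np.std(X_train[:,0])  # or, equivalently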
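As a quick check of the formula, the sketch below applies it to all columns at once with axis=0 and compares the result with preprocessing.scale. The names manual, via_sklearn and sample_std are illustrative only. Note that np.std defaults to the population standard deviation (ddof=0), and that is the variant that matches here.

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])

# Column-wise standardization: subtract each column's mean, then divide by
# its population standard deviation (ddof=0, NumPy's default).
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
via_sklearn = preprocessing.scale(X_train)
print(np.allclose(manual, via_sklearn))

# For comparison: the sample standard deviation (ddof=1) does not match.
sample_std = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0, ddof=1)
print(np.allclose(sample_std, via_sklearn))
```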
Properties of X_train after scaling:
>>> X_scaled = preprocessing.scale(X_train)
>>> X_scaled.mean(axis=0)
array([0., 0., 0.])
>>> X_scaled.std(axis=0)
array([1., 1., 1.])
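The same zero-mean, unit-variance properties hold per row when the axis=1 argument of preprocessing.scale is used. A small sketch, with X_rows as an illustrative variable name:

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])

# axis=1 standardizes each row independently instead of each column.
X_rows = preprocessing.scale(X_train, axis=1)

print(X_rows[0])            # same values as preprocessing.scale(X_train[0])
print(X_rows.mean(axis=1))  # approximately 0 for every row
print(X_rows.std(axis=1))   # approximately 1 for every row
```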
Alternatively, use the sklearn.preprocessing.StandardScaler class and fit it on X_train,
for example:
>>> scaler=preprocessing.StandardScaler().fit(X_train)
>>> scaler.mean_  # per-column mean of the training data
array([1. , 0. , 0.33333333])
>>> scaler.scale_  # per-column standard deviation of the training data
array([0.81649658, 0.81649658, 1.24721913])
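Once fitted, the scaler can apply the stored mean_ and scale_ to other data through transform. The sketch below assumes a made-up sample X_new (not part of the original notes) and checks that transform is the same as subtracting mean_ and dividing by scale_.

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
scaler = preprocessing.StandardScaler().fit(X_train)

# Hypothetical new sample, used only for illustration.
X_new = np.array([[-1., 1., 0.]])

# transform() reuses the statistics learned from X_train.
transformed = scaler.transform(X_new)
manual = (X_new - scaler.mean_) / scaler.scale_

print(transformed)
print(np.allclose(transformed, manual))
```

Transforming X_train itself with the fitted scaler should reproduce the output of preprocessing.scale(X_train).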