Official documentation: StandardScaler
The standard score z of a sample x is z = (x - u) / s, where:
u: the mean of the training samples (u = 0 if with_mean=False)
s: the standard deviation of the training samples (s = 1 if with_std=False)
from sklearn.preprocessing import StandardScaler
data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler()
print(scaler.fit(data))
output: StandardScaler()
print(scaler.mean_)
print(scaler.var_)
output:
[0.5 0.5]
[0.25 0.25]
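Here scaler.fit(data), i.e. StandardScaler.fit(data), computes the per-feature mean and standard deviation of the data and stores them in the scaler object for later use.
The mean_ and var_ attributes then give the per-feature mean and variance of the data.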
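As a rough check of the formula z = (x - u) / s, the sketch below applies it by hand using the statistics stored by fit() and compares the result with scaler.transform(). The variable names z_manual and z_sklearn are illustrative only. It assumes NumPy is available alongside scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[0, 0], [0, 0], [1, 1], [1, 1]], dtype=float)
scaler = StandardScaler().fit(data)

# Apply z = (x - u) / s by hand, using the statistics stored by fit().
# var_ is the population variance (ddof=0), so s = sqrt(var_).
u = scaler.mean_
s = np.sqrt(scaler.var_)
z_manual = (data - u) / s

# The fitted scaler should produce the same values.
z_sklearn = scaler.transform(data)
print(z_manual)
print(np.allclose(z_manual, z_sklearn))
```

For this data every 0 maps to -1 and every 1 maps to 1, since u = 0.5 and s = 0.5 for both features.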
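Besides fit(), StandardScaler provides other methods, including transform(), fit_transform(), inverse_transform(), and partial_fit().
Note: fit the scaler on the training data only, then reuse the fitted scaler to transform both the training data and the test data.
Example: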
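from sklearn import preprocessing
train_data = [[0, 0], [0, 0], [1, 1], [1, 1]]
test_data = [[2, 2]]  # hypothetical test sample, added for illustration
# Standardize using the mean and std learned from the training data
scaler = preprocessing.StandardScaler().fit(train_data)
print(scaler.transform(train_data))
print(scaler.transform(test_data))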
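Two of the other StandardScaler methods, fit_transform() and inverse_transform(), can be sketched as follows. This is a minimal illustration on the same toy data, not a recommended workflow; the variable names scaled_a, scaled_b, and restored are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train_data = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])

# fit_transform(X) gives the same result as fit(X) followed by transform(X).
scaled_a = StandardScaler().fit_transform(train_data)
scaler = StandardScaler().fit(train_data)
scaled_b = scaler.transform(train_data)
print(np.allclose(scaled_a, scaled_b))

# inverse_transform maps standardized values back to the original scale.
restored = scaler.inverse_transform(scaled_b)
print(np.allclose(restored, train_data))
```

Using fit_transform() on test data would re-estimate the mean and std from the test set, so on test data only transform() is normally called.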
Official documentation: MinMaxScaler
The transform is given by:
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
where min, max = feature_range (default: min = 0, max = 1).
Methods include fit(), transform(), fit_transform(), and inverse_transform().
Example:
from sklearn.preprocessing import MinMaxScaler
data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scaler = MinMaxScaler()
print(scaler.fit(data))
output:
MinMaxScaler()
print(scaler.data_max_)
output: [ 1. 18.]
print(scaler.transform(data))
output:
[[0. 0. ]
[0.25 0.25]
[0.5 0.5 ]
[1. 1. ]]
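The MinMaxScaler formula can also be checked by hand. The sketch below applies it column by column with NumPy and compares against MinMaxScaler; the feature_range of (-1, 1) and the names lo, hi, and X_manual are chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]], dtype=float)
lo, hi = -1.0, 1.0  # illustrative feature_range, not the default (0, 1)

# X_std = (X - X.min) / (X.max - X.min), computed per column (axis=0).
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
# X_scaled = X_std * (max - min) + min
X_manual = X_std * (hi - lo) + lo

X_sklearn = MinMaxScaler(feature_range=(lo, hi)).fit_transform(X)
print(X_manual)
print(np.allclose(X_manual, X_sklearn))
```

With this range, the column minimum maps to -1 and the column maximum maps to 1.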
Example: normalize rescales each sample (row) to unit norm.
import sklearn.preprocessing
X = [[ 1., -1., 2.],
[ 2., 0., 0.],
[ 0., 1., -1.]]
X_normalized = sklearn.preprocessing.normalize(X, norm='l2')
print(X_normalized)
output:
[[ 0.40824829 -0.40824829 0.81649658]
[ 1. 0. 0. ]
[ 0. 0.70710678 -0.70710678]]
Example (the Normalizer class):
from sklearn.preprocessing import Normalizer
normalizer = Normalizer().fit(X)  # fit does nothing here; Normalizer is stateless
print(normalizer)
output: Normalizer()
print(normalizer.transform(X))
output:
[[ 0.40824829 -0.40824829 0.81649658]
[ 1. 0. 0. ]
[ 0. 0.70710678 -0.70710678]]
The result is the same as calling the normalize function directly.
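To make the row-wise behaviour concrete, the sketch below reproduces the L2 result by dividing each row by its Euclidean length, and also shows norm='l1' for comparison. It assumes no row is all zeros, since the manual division would otherwise divide by zero; the names row_norms and X_l2_manual are illustrative.

```python
import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

# L2 normalization by hand: divide each row by its Euclidean length.
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
X_l2_manual = X / row_norms
print(np.allclose(X_l2_manual, normalize(X, norm='l2')))

# With norm='l1', the absolute values in each row sum to 1 instead.
X_l1 = normalize(X, norm='l1')
print(X_l1)
print(np.abs(X_l1).sum(axis=1))
```

Unlike StandardScaler and MinMaxScaler, which work per feature (column), normalization works per sample (row) and needs no statistics from a training set.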
Binarize data (set feature values to 0 or 1) according to a threshold
Values greater than the threshold map to 1, while values less than or equal to the threshold map to 0. With the default threshold of 0, only positive values map to 1.
Example:
from sklearn.preprocessing import Binarizer
X = [[ 1., -1., 2.],
[ 2., 0., 0.],
[ 0., 1., -1.]]
transformer = Binarizer().fit(X)  # fit does nothing; Binarizer is stateless
print(transformer)
output: Binarizer()
transformer.transform(X)
output:
array([[1., 0., 1.],
[1., 0., 0.],
[0., 1., 0.]])
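The threshold rule can be sketched with a non-default threshold. The example below uses threshold=1.0, a value picked purely for illustration, and compares Binarizer with an equivalent NumPy comparison; the names binarized and manual are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

# threshold=1.0 is illustrative; only values strictly greater than it map to 1.
binarized = Binarizer(threshold=1.0).fit_transform(X)
print(binarized)

# Equivalent rule written directly with NumPy.
manual = (X > 1.0).astype(float)
print(np.array_equal(binarized, manual))
```

Note that the entries equal to 1.0 map to 0 here, because values less than or equal to the threshold map to 0.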