Table of Contents
Normalization
1 Min-max normalization: to [new_minA, new_maxA]
2 Z-score normalization (μ: mean, σ: standard deviation)
3 Normalization by decimal scaling
Normalization code example
Data smoothing methods
Data discretization methods
Binning
One-hot encoding and code implementation
Normalization scales values proportionally so that they fall within a small, specified range. Common methods:
• min-max normalization
• z-score normalization
• normalization by decimal scaling
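In their usual textbook form, the three methods can be written as follows. The symbols v (original value), v' (normalized value), and min_A / max_A (observed minimum and maximum of attribute A) are assumed notation; μ, σ, and j are as defined elsewhere in this post.

```latex
% v: original value of attribute A; v': normalized value
\begin{align*}
\text{min-max:} \quad & v' = \frac{v - \min_A}{\max_A - \min_A}\,(\mathit{new\_max}_A - \mathit{new\_min}_A) + \mathit{new\_min}_A \\
\text{z-score:} \quad & v' = \frac{v - \mu}{\sigma} \\
\text{decimal scaling:} \quad & v' = \frac{v}{10^{j}}
\end{align*}
```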
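Example: with an income range of 12000 to 98000 normalized to [0.0, 1.0], the value 73000 maps to (73000 - 12000) / (98000 - 12000) × (1.0 - 0.0) + 0.0 ≈ 0.709.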
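The income example can be checked with a few lines of Python. This is a minimal sketch; the function name min_max_normalize and its parameter names are hypothetical and chosen only for illustration.

```python
def min_max_normalize(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Linearly map v from [min_a, max_a] to [new_min, new_max]."""
    if max_a == min_a:
        raise ValueError("min_a and max_a must differ")
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min

# Income range 12000 to 98000, target range [0.0, 1.0]
income_scaled = min_max_normalize(73000, 12000, 98000)
print(round(income_scaled, 3))  # 0.709
```

The range endpoints map to new_min and new_max, so 12000 becomes 0.0 and 98000 becomes 1.0.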
In normalization by decimal scaling (v' = v / 10^j), j is the smallest integer such that max(|v'|) < 1.
The effect is similar to a unit conversion.
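Decimal scaling can be sketched as below. The helper name decimal_scale is hypothetical, and the sketch assumes j is restricted to non-negative integers, so data already inside (-1, 1) is left unchanged.

```python
def decimal_scale(values):
    """Divide by 10**j, with j the smallest non-negative integer giving max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    # Increase j until the largest magnitude falls strictly below 1 after scaling.
    while max_abs >= 10 ** j:
        j += 1
    return [v / 10 ** j for v in values], j

scaled, j = decimal_scale([-986, 917])
print(j, scaled)  # 3 [-0.986, 0.917]
```

Dividing by 1000 here only moves the decimal point, which is why the method resembles converting between units.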
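# Z-score standardization with scikit-learn: each column gets zero mean and unit variance.
from sklearn import preprocessing
import numpy as np
X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])
X_scaled = preprocessing.scale(X)  # column-wise (x - mean) / std
X_scaled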
array([[ 0. , -1.22474487, 1.33630621],
[ 1.22474487, 0. , -0.26726124],
[-1.22474487, 1.22474487, -1.06904497]])
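To make the computation explicit, here is a minimal standard-library sketch of the same column-wise z-score. The function name zscore_columns is hypothetical; the sketch assumes no column is constant (otherwise σ would be 0) and uses the population standard deviation, which divides by N, as preprocessing.scale does.

```python
from statistics import fmean, pstdev

def zscore_columns(rows):
    """Standardize each column: (x - mean) / population standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # divides by N, not N - 1
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

X = [[1., -1., 2.],
     [2., 0., 0.],
     [0., 1., -1.]]
X_std = zscore_columns(X)
for row in X_std:
    print([round(x, 8) for x in row])
```

Up to rounding, the printed rows should match the array returned by preprocessing.scale for the same input.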
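One-hot encoding is used to handle categorical variables: each discrete value of a feature is represented as a binary indicator.
Code implementation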
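from sklearn.preprocessing import OneHotEncoder
# handle_unknown='ignore': categories not seen during fit are encoded as all zeros
enc = OneHotEncoder(handle_unknown='ignore')
X = [['Male', 1], ['Female', 3], ['Female', 2]]
enc.fit(X)
print(X)
# enc.categories_ holds the categories learned for each column
enc.transform([['Female', 1], ['Male', 4]]).toarray()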
[['Male', 1], ['Female', 3], ['Female', 2]]
array([[1., 0., 1., 0., 0.],
[0., 1., 0., 0., 0.]])
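The layout of the output columns can be illustrated with a plain-Python sketch. The helpers fit_categories and one_hot are hypothetical and are not part of scikit-learn; the sketch assumes categories are ordered by sorting within each column and that unseen values are encoded as all zeros, mirroring handle_unknown='ignore'.

```python
def fit_categories(rows):
    """Collect the sorted distinct values seen in each column."""
    return [sorted(set(col)) for col in zip(*rows)]

def one_hot(rows, categories):
    """Concatenate one indicator vector per column; unknown values become all zeros."""
    encoded = []
    for row in rows:
        vec = []
        for value, cats in zip(row, categories):
            vec.extend(1.0 if value == c else 0.0 for c in cats)
        encoded.append(vec)
    return encoded

X = [['Male', 1], ['Female', 3], ['Female', 2]]
categories = fit_categories(X)  # [['Female', 'Male'], [1, 2, 3]]
encoded = one_hot([['Female', 1], ['Male', 4]], categories)
print(encoded)
```

The first two output columns correspond to 'Female' and 'Male', and the last three to the values 1, 2, and 3. The value 4 in the second row was never seen during fitting, so its three indicator columns are all zero.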
Related post: [Data Preprocessing] One-hot encoding: what, why, and how