Python data normalization

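In machine learning, a dataset often needs to be normalized before use. The following formula (min-max normalization) rescales the data into the interval [0, 1]: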

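newvalue = (oldvalue - min) / (max - min)

Here min and max are the minimum and maximum of the feature (column) being scaled, so each column is normalized independently.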
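As a quick check of the formula, here is a sketch applying it to a single made-up feature column (the numbers are illustrative only, not from the original post). The smallest value maps to 0, the largest to 1, and everything else lands proportionally in between.

```python
import numpy as np

# A single hypothetical feature column (made-up values for illustration)
values = np.array([10.0, 20.0, 40.0])

col_min, col_max = values.min(), values.max()     # min = 10, max = 40
scaled = (values - col_min) / (col_max - col_min)  # newvalue = (oldvalue - min) / (max - min)

# 10 -> 0, 40 -> 1, and 20 -> (20 - 10) / (40 - 10) = 1/3
print(scaled)
```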

The Python implementation is as follows:

import numpy as np

def autoNorm(dataSet):
    minVals = dataSet.min(0)  # minimum of each column
    maxVals = dataSet.max(0)  # maximum of each column
    ranges = maxVals - minVals
    # Subtract each column's minimum and divide by its range. NumPy
    # broadcasting stretches the 1-D minVals/ranges across all rows,
    # which is what np.tile(minVals, (m, 1)) did explicitly.
    normDataSet = (dataSet - minVals) / ranges
    return normDataSet, ranges, minVals
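autoNorm returns ranges and minVals alongside the normalized data. A likely reason (an assumption, since the post does not say) is so that new samples, such as a test point, can be scaled with the same min and range learned from the training data. The sketch below repeats autoNorm so it runs on its own; normalize_with is a hypothetical helper name.

```python
import numpy as np

def autoNorm(dataSet):
    minVals = dataSet.min(0)  # minimum of each column
    maxVals = dataSet.max(0)  # maximum of each column
    ranges = maxVals - minVals
    return (dataSet - minVals) / ranges, ranges, minVals

def normalize_with(samples, ranges, minVals):
    # Hypothetical helper: scale new data with the training-set statistics
    return (samples - minVals) / ranges

train = np.array([[1, 2], [1, 3], [2, 2], [2, 3]])
_, ranges, minVals = autoNorm(train)

new_point = np.array([1.5, 4])
result = normalize_with(new_point, ranges, minVals)
# (1.5 - 1) / 1 = 0.5 and (4 - 2) / 1 = 2.0
print(result)
```

Note that the second feature maps to 2.0: a value outside the training range falls outside [0, 1], because the min and max are fixed by the training data.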

Example:

import numpy as np

group = np.array([[1, 2], [1, 3], [2, 2], [2, 3]])
newgroup, _, _ = autoNorm(group)
print(newgroup)

# Output:
[[0. 0.]
 [0. 1.]
 [1. 0.]
 [1. 1.]]
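One edge case the example does not hit: if a column holds the same value in every row, max - min is 0 and the division produces nan (NumPy also emits a RuntimeWarning). Below is a minimal sketch of one common workaround, assuming it is acceptable to map a constant column to all zeros. autoNormSafe is a hypothetical name, not part of the original code.

```python
import numpy as np

def autoNormSafe(dataSet):
    # Hypothetical variant of autoNorm that tolerates constant columns
    dataSet = np.asarray(dataSet, dtype=float)
    minVals = dataSet.min(0)
    ranges = dataSet.max(0) - minVals
    # Where a column is constant (range 0), divide by 1 instead,
    # so that column becomes all zeros rather than nan
    safeRanges = np.where(ranges == 0, 1.0, ranges)
    return (dataSet - minVals) / safeRanges, safeRanges, minVals

data = np.array([[1, 5], [2, 5], [3, 5]])  # second column is constant
normed, _, _ = autoNormSafe(data)
print(normed)
```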
