Feature Engineering: Feature Preprocessing (Nondimensionalization)

Contents

  • 1. A Rough Explanation
  • 2. Normalization
  • 3. Standardization ★

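1. A Rough Explanation

Feature preprocessing API

sklearn.preprocessing

Why normalize or standardize?
   Nondimensionalization: putting features on a common, unit-free scale.
   When features differ greatly in unit or magnitude, a feature with large values can dominate the final result, so the algorithm cannot learn from the other features.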
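To make the "dominating feature" point concrete, here is a minimal sketch using Euclidean distance. The two samples and the feature ranges are made-up values for illustration only, and distance-based learning is just one assumed setting where the effect shows up.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def min_max(value, lo, hi):
    """Map a value from the range [lo, hi] onto [0, 1]."""
    return (value - lo) / (hi - lo)


# Hypothetical samples: (height in cm, chest measurement)
a = (180.0, 0.90)
b = (170.0, 0.50)

# Assumed feature ranges, chosen for illustration only
HEIGHT_RANGE = (156.0, 196.0)
CHEST_RANGE = (0.46, 1.00)

# On the raw data the distance is almost entirely the height gap
raw_distance = euclidean(a, b)
height_gap = abs(a[0] - b[0])

# After scaling both features to [0, 1], the chest gap is visible again
a_scaled = (min_max(a[0], *HEIGHT_RANGE), min_max(a[1], *CHEST_RANGE))
b_scaled = (min_max(b[0], *HEIGHT_RANGE), min_max(b[1], *CHEST_RANGE))
scaled_height_gap = abs(a_scaled[0] - b_scaled[0])
scaled_chest_gap = abs(a_scaled[1] - b_scaled[1])

print("raw distance:", raw_distance, "height gap alone:", height_gap)
print("scaled gaps (height, chest):", scaled_height_gap, scaled_chest_gap)
```

On the raw values the chest measurement barely changes the distance, even though the two samples differ a lot on that feature relative to its range.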

2. Normalization

Transform the original data so that it is mapped into the range [0, 1] (the default).

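Formula:

$$X' = \frac{x - \min}{\max - \min}, \qquad X'' = X' \cdot (mx - mi) + mi$$

Here min and max are the minimum and maximum of the column, and mi and mx are the lower and upper bounds of the target range (0 and 1 by default).
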
We can use MinMaxScaler(feature_range=(0, 1)) from the sklearn library to do this.
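As a rough check of what the scaler does, the sketch below computes min-max scaling by hand with NumPy and compares it with MinMaxScaler. The helper name min_max_scale, the small sample array, and the target range (2, 3) are assumptions made for this illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def min_max_scale(X, feature_range=(0, 1)):
    """Hand-written column-wise min-max scaling (illustrative helper)."""
    lo, hi = feature_range
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    # Step 1: map each column onto [0, 1]
    X_unit = (X - col_min) / (col_max - col_min)
    # Step 2: stretch and shift onto [lo, hi]
    return X_unit * (hi - lo) + lo


# Small made-up sample: columns are height and weight
X = np.array([[180.0, 70.0],
              [190.0, 80.0],
              [168.0, 60.0],
              [159.0, 65.0]])

manual = min_max_scale(X, feature_range=(2, 3))
from_sklearn = MinMaxScaler(feature_range=(2, 3)).fit_transform(X)

print(manual)
print("matches MinMaxScaler:", np.allclose(manual, from_sklearn))
```

This hand-written version divides by zero when a column is constant, so it is only a sketch of the arithmetic, not a replacement for the library class.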

Example:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def minmax_demo():
    """
    Normalization
    :return:
    """
    # 1. Load the data
    data = pd.read_csv('test00.csv')
    # Keep only the first three columns
    data = data.iloc[:, :3]
    print("data:\n", data)
    # 2. Instantiate a transformer
    transfer = MinMaxScaler()
    # 3. Call fit_transform()
    data_new = transfer.fit_transform(data)
    print("data_new:\n", data_new)
    return None

if __name__ == '__main__':
    minmax_demo()

After the transformation, every value lies in the range 0 to 1:

data:
     height  weight  chest measurement
0      180      70            0.88877
1      190      80            0.99665
2      168      60            0.65878
3      159      65            0.65598
4      169      56            0.55658
5      173      60            0.46058
6      186      76            0.69978
7      178      60            0.64979
8      175      75            0.89895
9      176      60            0.88488
10     177      90            0.79595
11     168     100            0.48789
12     158     102            0.55646
13     168      60            0.69585
14     179      80            0.65785
15     183      70            0.69578
16     190      66            0.89586
17     196      88            0.96527
18     187      91            0.62488
19     182      90            0.58484
20     158      70            0.58947
21     159      55            0.58484
22     166      55            0.59896
23     178      54            0.48487
24     163      69            0.68745
25     156      55            0.52621
26     189      89            0.66959
27     156      56            0.59595
28     189      98            0.59716
29     169      66            0.65479
30     179      55            0.99598
31     177      68            0.55257
32     166      76            0.69784
33     169      86            0.68745
34     189      89            0.69988
35     188      68            0.78955
36     176      59            0.55999
37     177      60            0.68747
38     196      80            0.64888
data_new:
 [[0.6        0.33333333 0.79875762]
 [0.85       0.54166667 1.        ]
 [0.3        0.125      0.36972783]
 [0.075      0.22916667 0.36450464]
 [0.325      0.04166667 0.17908109]
 [0.425      0.125      0.        ]
 [0.75       0.45833333 0.44621038]
 [0.55       0.125      0.35295764]
 [0.475      0.4375     0.81774768]
 [0.5        0.125      0.79150111]
 [0.525      0.75       0.6256086 ]
 [0.3        0.95833333 0.05094484]
 [0.05       1.         0.17885724]
 [0.3        0.125      0.43887925]
 [0.575      0.54166667 0.36799299]
 [0.675      0.33333333 0.43874867]
 [0.85       0.25       0.81198351]
 [1.         0.70833333 0.94146287]
 [0.775      0.77083333 0.30648982]
 [0.65       0.75       0.23179809]
 [0.05       0.33333333 0.24043502]
 [0.075      0.02083333 0.23179809]
 [0.25       0.02083333 0.25813793]
 [0.55       0.         0.04531125]
 [0.175      0.3125     0.42320966]
 [0.         0.02083333 0.12242804]
 [0.825      0.72916667 0.38989311]
 [0.         0.04166667 0.25252299]
 [0.825      0.91666667 0.25478016]
 [0.325      0.25       0.36228478]
 [0.575      0.02083333 0.99875016]
 [0.525      0.29166667 0.17160072]
 [0.25       0.45833333 0.44259145]
 [0.325      0.66666667 0.42320966]
 [0.825      0.72916667 0.44639693]
 [0.8        0.29166667 0.61366986]
 [0.5        0.10416667 0.1854422 ]
 [0.525      0.125      0.42324696]
 [1.         0.54166667 0.3512601 ]]


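Drawback of normalization: the result depends on the maximum and minimum, so if either one is an outlier the scaled values are strongly affected.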
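The effect of a single outlier can be sketched in plain Python. The weight values and the outlier of 500 below are made up for illustration.

```python
def min_max_list(values):
    """Scale a list of numbers onto [0, 1] using its own min and max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical weights, then the same list with one extreme value added
weights = [56, 60, 66, 70, 80]
weights_with_outlier = weights + [500]

scaled_clean = min_max_list(weights)
scaled_with_outlier = min_max_list(weights_with_outlier)

print("without outlier:", [round(v, 3) for v in scaled_clean])
print("with outlier:   ", [round(v, 3) for v in scaled_with_outlier])
```

With the outlier present, the five ordinary values are squeezed into a narrow band near 0, while the outlier alone sits at 1.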

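3. Standardization ★

Transform the original data so that each feature has a mean of 0 and a standard deviation of 1.

$$X' = \frac{x - \text{mean}}{\sigma}$$

Here mean is the column mean and σ is the column standard deviation.

With a reasonably large sample, a few outliers shift the mean and standard deviation only slightly, so standardization is less sensitive to outliers than normalization.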

Use the sklearn API StandardScaler().
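Before the full example, here is a minimal sketch that standardizes a small made-up array by hand and compares the result with StandardScaler. The sample values are assumptions for illustration. Note that the hand calculation uses the population standard deviation (NumPy's default, ddof=0), which is what StandardScaler uses as well.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small made-up sample: columns are height and weight
X = np.array([[180.0, 70.0],
              [190.0, 80.0],
              [168.0, 60.0],
              [159.0, 65.0]])

# Column-wise mean and population standard deviation (ddof=0)
col_mean = X.mean(axis=0)
col_std = X.std(axis=0)

manual = (X - col_mean) / col_std
from_sklearn = StandardScaler().fit_transform(X)

print(manual)
print("matches StandardScaler:", np.allclose(manual, from_sklearn))
print("column means:", manual.mean(axis=0))
print("column stds: ", manual.std(axis=0))
```

The printed column means should be 0 and the standard deviations 1, up to floating-point rounding.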
Example:

import pandas as pd
from sklearn.preprocessing import StandardScaler


def stand_demo():
    """
    Standardization
    :return:
    """
    # 1. Load the data
    data = pd.read_csv('test00.csv')
    # Keep only the first three columns
    data = data.iloc[:, :3]
    print("data:\n", data)
    # 2. Instantiate a transformer
    transfer = StandardScaler()
    # 3. Call fit_transform()
    data_new = transfer.fit_transform(data)
    print("data_new:\n", data_new)
    return None


if __name__ == '__main__':
    stand_demo()
Output:

data:
     height  weight  chest measurement
0      180      70            0.88877
1      190      80            0.99665
2      168      60            0.65878
3      159      65            0.65598
4      169      56            0.55658
5      173      60            0.46058
6      186      76            0.69978
7      178      60            0.64979
8      175      75            0.89895
9      176      60            0.88488
10     177      90            0.79595
11     168     100            0.48789
12     158     102            0.55646
13     168      60            0.69585
14     179      80            0.65785
15     183      70            0.69578
16     190      66            0.89586
17     196      88            0.96527
18     187      91            0.62488
19     182      90            0.58484
20     158      70            0.58947
21     159      55            0.58484
22     166      55            0.59896
23     178      54            0.48487
24     163      69            0.68745
25     156      55            0.52621
26     189      89            0.66959
27     156      56            0.59595
28     189      98            0.59716
29     169      66            0.65479
30     179      55            0.99598
31     177      68            0.55257
32     166      76            0.69784
33     169      86            0.68745
34     189      89            0.69988
35     188      68            0.78955
36     176      59            0.55999
37     177      60            0.68747
38     196      80            0.64888
data_new:
 [[ 0.40612393 -0.13933189  1.4864856 ]
 [ 1.29594603  0.56637508  2.26419106]
 [-0.66166258 -0.84503885 -0.17150918]
 [-1.46250247 -0.49218537 -0.19169434]
 [-0.57268037 -1.12732164 -0.90826759]
 [-0.21675154 -0.84503885 -1.60033029]
 [ 0.94001719  0.28409229  0.12405926]
 [ 0.22815951 -0.84503885 -0.23631797]
 [-0.03878712  0.21352159  1.55987308]
 [ 0.05019509 -0.84503885  1.45844265]
 [ 0.1391773   1.27208204  0.81734748]
 [-0.66166258  1.977789   -1.40345287]
 [-1.55148468  2.11893039 -0.90913267]
 [-0.66166258 -0.84503885  0.09572795]
 [ 0.31714172  0.56637508 -0.17821354]
 [ 0.67307056 -0.13933189  0.09522332]
 [ 1.29594603 -0.42161467  1.53759732]
 [ 1.82983928  1.13094065  2.03797306]
 [ 1.0289994   1.34265273 -0.41589382]
 [ 0.58408835  1.27208204 -0.70454163]
 [-1.55148468 -0.13933189 -0.67116403]
 [-1.46250247 -1.19789233 -0.70454163]
 [-0.839627   -1.19789233 -0.60275075]
 [ 0.22815951 -1.26846303 -1.42522401]
 [-1.10657363 -0.20990258  0.03517246]
 [-1.7294491  -1.19789233 -1.12720451]
 [ 1.20696382  1.20151134 -0.09358004]
 [-1.7294491  -1.12732164 -0.6244498 ]
 [ 1.20696382  1.83664761 -0.61572692]
 [-0.57268037 -0.42161467 -0.20027304]
 [ 0.31714172 -1.19789233  2.25936103]
 [ 0.1391773  -0.28047328 -0.93717563]
 [-0.839627    0.28409229  0.11007383]
 [-0.57268037  0.98979925  0.03517246]
 [ 1.20696382  1.20151134  0.12478016]
 [ 1.11798161 -0.28047328  0.77120997]
 [ 0.05019509 -0.91560955 -0.88368495]
 [ 0.1391773  -0.84503885  0.03531664]
 [ 1.82983928  0.56637508 -0.24287815]]
