Chapter 4: Building Features with PolynomialFeatures

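sklearn.preprocessing.PolynomialFeatures constructs new features from existing ones.

It works by generating polynomial terms: given two features a and b, the degree-2 polynomial features are (1, a, b, a^2, ab, b^2).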

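PolynomialFeatures has three main parameters:

degree: the maximum degree of the polynomial terms (default 2).

interaction_only: defaults to False. If set to True, only products of distinct features are generated and no feature is multiplied by itself; in the degree-2 example above, a^2 and b^2 would be dropped.

include_bias: defaults to True. If True, the output includes the constant bias column, i.e. the 1 in the example above.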


import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
path = r"activity_recognizer\1.csv"
# Data: https://archive.ics.uci.edu/ml/datasets/Activity+Recognition+from+Single+Chest-Mounted+Accelerometer
# The CSV has no header row, so name the columns explicitly
df = pd.read_csv(path, header=None)
df.columns = ['index', 'x', 'y', 'z', 'activity']

knn = KNeighborsClassifier()
knn_params = {'n_neighbors':[3, 4, 5, 6]}
X = df[['x', 'y', 'z']]
y = df['activity']
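knn and knn_params are set up here and Pipeline and GridSearchCV are imported, which points to the usual pattern of chaining feature construction with the classifier and searching over both steps at once. The sketch below is an assumption about how that could look, not the author's code: make_classification generates a synthetic three-feature, three-class stand-in for the accelerometer data, and the step names poly_features and classify are hypothetical labels.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for df[['x', 'y', 'z']] and df['activity'] (assumption: real CSV not loaded)
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

# Feature construction followed by the classifier, fitted together on each CV fold
pipe = Pipeline([
    ("poly_features", PolynomialFeatures(include_bias=False)),
    ("classify", KNeighborsClassifier()),
])

# Step parameters are addressed as <step name>__<parameter name>
pipe_params = {
    "poly_features__degree": [1, 2, 3],
    "poly_features__interaction_only": [True, False],
    "classify__n_neighbors": [3, 4, 5, 6],  # same grid as knn_params
}

grid = GridSearchCV(pipe, pipe_params, cv=5)
grid.fit(X, y)
print(grid.best_score_, grid.best_params_)
```

Because the squares and products of raw readings in the 1,500 to 2,200 range reach the millions, they dominate KNN's Euclidean distances; adding a StandardScaler step after PolynomialFeatures is one option worth testing, though it is not part of the original setup.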


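from sklearn.preprocessing import PolynomialFeatures

# Degree-2 expansion of x, y, z without the constant bias column
poly = PolynomialFeatures(degree=2, include_bias=False, interaction_only=False)
X_poly = poly.fit_transform(X)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
X_poly_df = pd.DataFrame(X_poly, columns=poly.get_feature_names_out())
print(X_poly_df.head())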

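Result (printed by an older scikit-learn, whose get_feature_names() labelled the inputs x0, x1, x2; get_feature_names_out() uses the DataFrame's own column names, so the headers read x, y, z, x^2, x y, and so on):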

       x0      x1      x2       x0^2      x0 x1      x0 x2       x1^2  \
0  1502.0  2215.0  2153.0  2256004.0  3326930.0  3233806.0  4906225.0   
1  1667.0  2072.0  2047.0  2778889.0  3454024.0  3412349.0  4293184.0   
2  1611.0  1957.0  1906.0  2595321.0  3152727.0  3070566.0  3829849.0   
3  1601.0  1939.0  1831.0  2563201.0  3104339.0  2931431.0  3759721.0   
4  1643.0  1965.0  1879.0  2699449.0  3228495.0  3087197.0  3861225.0   

       x1 x2       x2^2  
0  4768895.0  4635409.0  
1  4241384.0  4190209.0  
2  3730042.0  3632836.0  
3  3550309.0  3352561.0  
4  3692235.0  3530641.0  
