关于特征交叉的作用以及原理,我这里不进行详细描述,因为大佬们已经说得很清楚了,这里就附上几个连接:
特征组合&特征交叉 (Feature Crosses)
结合sklearn进行特征工程
对于特征离散化,特征交叉,连续特征离散化非常经典的解释
下面说怎样制作和交叉特征:
多项式生成函数:
sklearn.preprocessing.PolynomialFeatures(degree=2, interaction_only=False, include_bias=True)
参数说明:
举例说明:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
X = np.arange(6).reshape(3, 2)
X
array([[0, 1],
[2, 3],
[4, 5]])
poly = PolynomialFeatures(degree = 2)
poly.fit_transform(X)
array([[ 1., 0., 1., 0., 0., 1.],
[ 1., 2., 3., 4., 6., 9.],
[ 1., 4., 5., 16., 20., 25.]])
# 设置参数interaction_only = True,不包含单个自变量****n(n>1)特征数据
poly = PolynomialFeatures(degree = 2, interaction_only = True)
poly.fit_transform(X)
array([[ 1., 0., 1., 0.],
[ 1., 2., 3., 6.],
[ 1., 4., 5., 20.]])
# 再添加 设置参数include_bias= False,不包含偏差项数据
poly = PolynomialFeatures(degree = 2, interaction_only = True, include_bias=False)
poly.fit_transform(X)
array([[ 0., 1., 0.],
[ 2., 3., 6.],
[ 4., 5., 20.]])