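PolynomialFeatures generates polynomial features, including the interaction terms between input features. For example, if an input sample is 2-dimensional, of the form [a, b], then the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].
Parameters: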
degree : integer
The degree of the polynomial features. Default = 2.
interaction_only : boolean, default = False
If true, only interaction features are produced: features that are products of at
most degree distinct input features (so not x[1] ** 2, x[0] * x[2] ** 3, etc.).
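In other words, when this is True (the default is False), only interaction terms are generated; powers of a single feature such as a^2 are left out.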
include_bias : boolean
If True (default), then include a bias column, the feature in which all polynomial
powers are zero (i.e. a column of ones - acts as an intercept term in a linear
model).
In other words, this controls whether the bias (all-ones) column is included.
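The three parameters above can be illustrated with a small pure-Python sketch. This is not scikit-learn's implementation; the helper names `polynomial_terms` and `expand` are hypothetical, and the sketch only mimics the column order used in this post (bias first, then degree-1 terms, then degree-2 terms, and so on).

```python
from itertools import chain, combinations, combinations_with_replacement
from math import prod


def polynomial_terms(n_features, degree=2, interaction_only=False, include_bias=True):
    """Return each output term as a tuple of input-feature indices.

    () is the bias term, (0,) is feature 0, (0, 1) is the product of
    features 0 and 1, and (0, 0) is feature 0 squared.
    """
    # Interaction-only terms never repeat an index, so plain combinations suffice.
    comb = combinations if interaction_only else combinations_with_replacement
    start = 0 if include_bias else 1  # degree 0 is the bias column
    return list(
        chain.from_iterable(comb(range(n_features), d) for d in range(start, degree + 1))
    )


def expand(sample, **kwargs):
    """Evaluate every term for one sample (hypothetical helper)."""
    return [prod(sample[i] for i in term) for term in polynomial_terms(len(sample), **kwargs)]


print(expand([2, 3]))                         # [1, 2, 3, 4, 6, 9]  -> [1, a, b, a^2, ab, b^2]
print(expand([2, 3], interaction_only=True))  # [1, 2, 3, 6]        -> [1, a, b, ab]
print(expand([2, 3], include_bias=False))     # [2, 3, 4, 6, 9]
```

The empty tuple has an empty product, which `math.prod` evaluates to 1, so the bias column falls out of the same rule as every other term.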
Methods:
1.fit(X, y=None)
Compute number of output features.
2.fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns
a transformed version of X.
Parameters:
X : numpy array of shape [n_samples, n_features]
Training set.
y : numpy array of shape [n_samples]
Target values.
Returns:
X_new : numpy array of shape [n_samples, n_features_new]
Transformed array.
Input: the input feature matrix.
Returns: the output feature matrix.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(X[, y]) Transform data to polynomial features
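As a rough sketch of how the methods listed above fit together, the snippet below calls each of them on a tiny array. It assumes a reasonably recent scikit-learn; the variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6).reshape(3, 2)  # 3 samples, 2 features

# fit() only inspects the shape of X and works out how many columns to produce.
poly = PolynomialFeatures(degree=2)
poly.fit(X)
n_out = poly.n_output_features_

# transform() then builds the polynomial feature matrix.
X_poly = poly.transform(X)

# fit_transform() does both steps in one call.
X_poly_one_step = PolynomialFeatures(degree=2).fit_transform(X)

# get_params() returns the constructor arguments as a dict;
# set_params() changes them on the existing object.
params = poly.get_params()
poly.set_params(degree=3, include_bias=False)
X_cubic = poly.fit_transform(X)  # refit, because the settings changed

print(n_out, X_poly.shape, params["degree"], X_cubic.shape)
```

After `set_params` the transformer has to be fitted again before `transform` reflects the new settings, which is why the last step calls `fit_transform` rather than `transform`.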
>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(2)  # degree 2, all other parameters at their defaults
>>> poly.fit_transform(X)
array([[ 1,  0,  1,  0,  0,  1],
       [ 1,  2,  3,  4,  6,  9],
       [ 1,  4,  5, 16, 20, 25]])
>>> poly = PolynomialFeatures(interaction_only=True)  # degree defaults to 2; keep interaction terms only
>>> poly.fit_transform(X)
array([[ 1,  0,  1,  0],
       [ 1,  2,  3,  6],
       [ 1,  4,  5, 20]])
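Note: each row of the arrays above is one sample. For instance, [0, 1] plays the role of [a, b] from earlier, so its polynomial expansion [1, a, b, a^2, ab, b^2] evaluates to [1, 0, 1, 0, 0, 1].
With interaction_only=True, only the interaction terms are kept, so [a, b] expands to [1, a, b, ab]. Terms in which a feature interacts with itself, such as a^2 or a*b^2, do not appear.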
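The column counts in the two outputs above (6 and 4) follow from a standard counting argument: there are C(n + d, d) monomials of degree at most d in n variables, and C(n, 0) + C(n, 1) + ... + C(n, d) products of distinct variables. A minimal sketch, with a hypothetical helper name `n_polynomial_features`:

```python
from math import comb


def n_polynomial_features(n_features, degree, interaction_only=False, include_bias=True):
    """Count the output columns for the given settings (hypothetical helper)."""
    if interaction_only:
        # Choose k distinct features for each k from 0 (the bias) up to the degree.
        total = sum(comb(n_features, k) for k in range(min(degree, n_features) + 1))
    else:
        # All monomials of total degree <= degree, bias included.
        total = comb(n_features + degree, degree)
    return total if include_bias else total - 1


print(n_polynomial_features(2, 2))                          # 6
print(n_polynomial_features(2, 2, interaction_only=True))   # 4
print(n_polynomial_features(10, 3))                         # 286
print(n_polynomial_features(10, 3, interaction_only=True))  # 176
```

The last two lines show how quickly the feature count grows with the number of inputs and the degree, which is worth keeping in mind before raising either.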
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

def f(x):
    """Function to approximate by polynomial interpolation."""
    return x * np.sin(x)

# generate points used to plot
x_plot = np.linspace(0, 10, 100)  # 100 evenly spaced values between 0 and 10
print('x_plot', x_plot, x_plot.shape)

# generate points and keep a subset of them
x = np.linspace(0, 10, 100)
rng = np.random.RandomState(0)  # pseudo-random number generator with a fixed seed
rng.shuffle(x)  # shuffles the array in place, like shuffling a deck of cards
x = np.sort(x[:20])  # keep only the first 20 values (sorted) as training points
print('x', x, type(x), x.shape)
y = f(x)  # y = x * np.sin(x)
print('y', y, type(y), y.shape)

# create matrix versions of these arrays
X = x[:, np.newaxis]
print('X', X, type(X), X.shape)
X_plot = x_plot[:, np.newaxis]  # turn the 1-D array into a column matrix
print('X_plot', X_plot, type(X_plot), X_plot.shape)

plt.plot(x_plot, f(x_plot), label="ground truth")
plt.scatter(x, y, label="training points")  # scatter plot of the training points

for degree in [3, 4, 5]:
    # polynomial feature expansion followed by ridge regression
    model = make_pipeline(PolynomialFeatures(degree), Ridge())
    model.fit(X, y)  # train the model
    y_plot = model.predict(X_plot)  # predict values on the plotting grid
    plt.plot(x_plot, y_plot, label="degree %d" % degree)

plt.legend(loc='lower left')  # draw the legend
plt.show()
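This example demonstrates how to approximate a function with a polynomial by using ridge regression. Concretely, given n one-dimensional sample points, it suffices to build the Vandermonde matrix, which has one row per sample point and one column per power from 0 up to the polynomial degree: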
[[1, x_1, x_1 ** 2, x_1 ** 3, ...],
[1, x_2, x_2 ** 2, x_2 ** 3, ...], ...]
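To make the matrix above concrete, here is a small sketch that builds it with NumPy for three sample points and degree 3. The sample values are made up for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # three 1-D sample points (illustrative values)
degree = 3

# increasing=True orders the columns as x**0, x**1, x**2, ...
V = np.vander(x, N=degree + 1, increasing=True)

# The same matrix built column by column, to show what each column holds.
V_manual = np.column_stack([x ** k for k in range(degree + 1)])

print(V)
# [[ 1.  1.  1.  1.]
#  [ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]
```

For a single input feature, the polynomial expansion [1, x, x^2, x^3, ...] has exactly these columns, which is why the pipeline in the example amounts to a linear fit on a Vandermonde matrix.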
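This example shows that you can perform non-linear regression with a linear model by using a pipeline to add non-linear features. Kernel methods extend this idea and can induce very high-dimensional feature spaces.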
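The idea of "a linear model on non-linear features" can also be sketched without scikit-learn. The snippet below solves the ridge normal equations directly on Vandermonde features. It is a simplified stand-in and not what `Ridge` does internally: here every coefficient, including the constant term, is penalized, whereas scikit-learn's `Ridge` by default fits a separate, unpenalized intercept. The function names are hypothetical.

```python
import numpy as np


def fit_polynomial_ridge(x, y, degree, alpha=1.0):
    """Solve (V^T V + alpha * I) w = V^T y for the polynomial coefficients w."""
    V = np.vander(x, N=degree + 1, increasing=True)  # non-linear features of x
    A = V.T @ V + alpha * np.eye(degree + 1)         # ridge-regularized normal matrix
    return np.linalg.solve(A, V.T @ y)


def predict_polynomial(x, coef):
    """Evaluate the fitted polynomial at the points x."""
    V = np.vander(x, N=len(coef), increasing=True)
    return V @ coef


# Data drawn from an exact cubic, so a degree-3 fit with a tiny penalty
# should recover the coefficients almost perfectly.
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x - 3.0 * x ** 3

coef = fit_polynomial_ridge(x, y, degree=3, alpha=1e-10)
y_hat = predict_polynomial(x, coef)
print(np.round(coef, 4))
```

The model is linear in the coefficients even though the fitted curve is a cubic in x; the non-linearity lives entirely in the feature construction step.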