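Define a custom class. With TransformerMixin as a base class you get the fit_transform() method for free (by default it calls fit() and then transform()). With BaseEstimator as a base class you get two methods used for hyperparameter tuning, get_params() and set_params(), provided that __init__() lists every hyperparameter as an explicit keyword argument (no *args or **kwargs).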
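To illustrate the mixin idea, here is a minimal plain-Python sketch with no scikit-learn involved. FitTransformMixin and AddConstant are hypothetical names made up for this illustration. The real TransformerMixin handles a few more details, but the core idea is the same: any class that defines fit() and transform() inherits a working fit_transform(). Note that transform() is called with X only, which is why a custom transform() must not require a y argument.

```python
class FitTransformMixin:
    """Hypothetical, simplified stand-in for sklearn's TransformerMixin."""

    def fit_transform(self, X, y=None, **fit_params):
        # Fit on X, then transform the same X (transform receives X only)
        return self.fit(X, y, **fit_params).transform(X)


class AddConstant(FitTransformMixin):
    """Hypothetical toy transformer that adds a constant to every value."""

    def __init__(self, c=1):
        self.c = c

    def fit(self, X, y=None):
        # Nothing to learn from the data
        return self

    def transform(self, X):
        return [x + self.c for x in X]


print(AddConstant(c=10).fit_transform([1, 2, 3]))  # [11, 12, 13]
```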
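Example: a transformer that selects the column(s) named in attribution_name from X (a pandas.DataFrame) and returns them as a numpy.ndarray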
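from sklearn.base import BaseEstimator, TransformerMixin

class Selector(BaseEstimator, TransformerMixin):
    def __init__(self, attribution_name):
        # A column name or a list of column names to select
        self.attribution_name = attribution_name

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained
        return self

    def transform(self, X):
        # fit_transform() calls transform(X) with X only, so no y parameter here
        return X[self.attribution_name].values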
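A quick usage check, assuming pandas is installed. The DataFrame df and its columns (age, income, city) are made up for illustration, and the class is repeated with a one-argument transform(X) so the snippet runs on its own. It shows fit_transform() coming from TransformerMixin and get_params()/set_params() coming from BaseEstimator.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class Selector(BaseEstimator, TransformerMixin):
    """Select column(s) of a DataFrame and return them as a NumPy array."""

    def __init__(self, attribution_name):
        self.attribution_name = attribution_name

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribution_name].values


# Hypothetical toy data for illustration only
df = pd.DataFrame({
    "age": [25, 32, 47],
    "income": [3000.0, 4500.0, 5200.0],
    "city": ["A", "B", "A"],
})

sel = Selector(attribution_name=["age", "income"])
arr = sel.fit_transform(df)            # inherited from TransformerMixin
print(type(arr).__name__, arr.shape)   # ndarray (3, 2)

print(sel.get_params())                # {'attribution_name': ['age', 'income']}
sel.set_params(attribution_name=["city"])  # inherited from BaseEstimator
print(sel.fit_transform(df).ravel())
```

This is the same mechanism a grid search relies on: it reads and changes hyperparameters through get_params()/set_params() rather than touching attributes directly.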
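Pipeline chains several steps into one estimator: fit_transform() fits and transforms each step in order, feeding each step's output into the next. Every step except the last must be a transformer (implement fit() and transform()).
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler

pipe = Pipeline([
    ("stand", StandardScaler()),   # step 1: standardize each column
    ("min_max", MinMaxScaler()),   # step 2: rescale step 1's output to [0, 1]
])
# data: a 2-D numeric array or DataFrame (samples x features)
data = pipe.fit_transform(data)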
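A runnable version of the Pipeline example, using a made-up 4x2 NumPy array (the values are arbitrary). After fitting, each step can be reached by the name given in the step list through named_steps.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical toy data: 4 samples, 2 numeric features
data = np.array([[1.0, 200.0],
                 [2.0, 300.0],
                 [3.0, 500.0],
                 [4.0, 1000.0]])

pipe = Pipeline([
    ("stand", StandardScaler()),   # zero mean, unit variance per column
    ("min_max", MinMaxScaler()),   # then rescale each column to [0, 1]
])

out = pipe.fit_transform(data)
# Each column should now span roughly 0 to 1
print(out.min(axis=0), out.max(axis=0))

# The fitted StandardScaler's per-column means (2.5 and 500.0 here)
print(pipe.named_steps["stand"].mean_)
```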
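There is also FeatureUnion. You give it a list of transformers (each of which can itself be an entire transformer pipeline). When its transform() method is called, it runs each transformer's transform() method (in parallel when n_jobs is set), waits for their outputs, and concatenates them column-wise into a single result.
For example: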
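from sklearn.pipeline import FeatureUnion

pipe_1 = ...  # a transformer or Pipeline
pipe_2 = ...  # another transformer or Pipeline
# The keyword argument is transformer_list (a list of (name, transformer) pairs)
full_pipe = FeatureUnion(transformer_list=[
    ("pipe_1", pipe_1),
    ("pipe_2", pipe_2),
])
data = full_pipe.fit_transform(data)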
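A small end-to-end sketch, assuming pandas is installed. The DataFrame, its columns, and the names num_pipe / cat_pipe are hypothetical. Each branch starts with a Selector (repeated here so the snippet is self-contained) that picks its own columns, and FeatureUnion stacks the branch outputs side by side: 2 scaled numeric columns plus 3 one-hot columns for the cities A, B and C.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, OneHotEncoder


class Selector(BaseEstimator, TransformerMixin):
    """Select column(s) of a DataFrame and return them as a NumPy array."""

    def __init__(self, attribution_name):
        self.attribution_name = attribution_name

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribution_name].values


# Hypothetical toy data for illustration only
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [3000.0, 4500.0, 5200.0, 6100.0],
    "city": ["A", "B", "A", "C"],
})

num_pipe = Pipeline([
    ("select", Selector(["age", "income"])),
    ("scale", StandardScaler()),
])
cat_pipe = Pipeline([
    ("select", Selector(["city"])),
    ("onehot", OneHotEncoder()),
])

full_pipe = FeatureUnion(transformer_list=[
    ("num", num_pipe),
    ("cat", cat_pipe),
])
out = full_pipe.fit_transform(df)
print(out.shape)  # (4, 5)
```

Because OneHotEncoder returns a sparse matrix by default, FeatureUnion should produce a SciPy sparse matrix here; call out.toarray() if a dense array is needed. In scikit-learn 0.20 and later, sklearn.compose.ColumnTransformer also covers this select-columns-then-transform pattern without a custom Selector.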