sklearn: Custom Transformers and the Pipeline

Custom transformers

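Define your own class. Adding TransformerMixin as a base class gives you fit_transform() for free, so you only need to implement fit() and transform(). Adding BaseEstimator as a base class gives you get_params() and set_params(), the two methods that hyperparameter tuning relies on. For these to work, __init__ should list every parameter as an explicit keyword argument (no *args or **kwargs) and store each one under an attribute of the same name.
Example: select the column(s) named in attribution_name from X (a pandas.DataFrame) and return them as a numpy.array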

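from sklearn.base import BaseEstimator, TransformerMixin

class Selector(BaseEstimator, TransformerMixin):
    def __init__(self, attribution_name):
        # Column name(s) to select; stored unchanged so get_params() can read it back
        self.attribution_name = attribution_name

    def fit(self, X, y=None):
        # Nothing to learn; returning self lets fit_transform() chain .transform()
        return self

    def transform(self, X):
        # Takes X only: TransformerMixin.fit_transform() calls transform(X) without y
        return X[self.attribution_name].values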
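To see what the two base classes contribute, here is a minimal sketch that repeats Selector (with a transform() that takes only X) and runs it on a small hypothetical DataFrame; the column names age, income and city are made up for illustration. fit_transform() is inherited from TransformerMixin, while get_params() and set_params() come from BaseEstimator.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class Selector(BaseEstimator, TransformerMixin):
    """Select the given DataFrame columns and return them as a NumPy array."""

    def __init__(self, attribution_name):
        # get_params() reads constructor arguments back from same-named attributes
        self.attribution_name = attribution_name

    def fit(self, X, y=None):
        # Column selection is stateless, so there is nothing to fit
        return self

    def transform(self, X):
        return X[self.attribution_name].values


# Hypothetical toy data
df = pd.DataFrame({
    "age": [25, 32, 47],
    "income": [3.5, 4.2, 5.1],
    "city": ["a", "b", "c"],
})

sel = Selector(attribution_name=["age", "income"])
arr = sel.fit_transform(df)            # inherited from TransformerMixin
print(type(arr).__name__, arr.shape)   # ndarray (3, 2)

print(sel.get_params())                # {'attribution_name': ['age', 'income']}
sel.set_params(attribution_name=["city"])  # inherited from BaseEstimator
print(sel.transform(df).ravel().tolist())  # ['a', 'b', 'c']
```

set_params() is what search tools such as GridSearchCV call to try different values, which is why storing constructor arguments unchanged matters.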

Pipeline

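from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Each step is a (name, transformer) pair; steps run in order, and each one
# is fitted on, and transforms, the output of the previous step
pipe = Pipeline([
    ("stand", StandardScaler()),
    ("min_max", MinMaxScaler())
])

# data: your own 2-D numeric feature matrix (array or DataFrame)
data = pipe.fit_transform(data)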
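The following is a small sketch, assuming a hypothetical 4×2 numeric array as data, which checks that Pipeline.fit_transform() matches applying the two scalers by hand. It also shows two conveniences: fitted steps are reachable through named_steps, and set_params() addresses a step's parameter as "<step name>__<parameter>".

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy input: 4 samples, 2 numeric features
data = np.array([[1.0, 200.0],
                 [2.0, 300.0],
                 [3.0, 100.0],
                 [4.0, 400.0]])

pipe = Pipeline([
    ("stand", StandardScaler()),
    ("min_max", MinMaxScaler()),
])
out = pipe.fit_transform(data)

# The same computation by hand: each step is fitted on the previous output
step1 = StandardScaler().fit_transform(data)
manual = MinMaxScaler().fit_transform(step1)
print(np.allclose(out, manual))  # True

# Fitted steps are available by name
print(pipe.named_steps["stand"].mean_)

# Step parameters use "<step name>__<parameter>"; refit with a new range
pipe.set_params(min_max__feature_range=(-1, 1))
out2 = pipe.fit_transform(data)
print(out2.min(axis=0), out2.max(axis=0))  # each column now spans about [-1, 1]
```

One side note on this particular chain: StandardScaler is a positive per-column affine map, and min-max scaling cancels any such map, so StandardScaler followed by MinMaxScaler yields the same values as MinMaxScaler alone (up to floating-point error).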

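There is also FeatureUnion. You give it a list of transformers (each of which can itself be an entire transformer pipeline). When its transform() method is called, it applies every transformer's transform() to the same input, waits for their outputs, and concatenates them column-wise into a single result. Because the transformers are independent, they can run in parallel: by default (n_jobs=None) they effectively run one at a time, and setting the n_jobs parameter lets them run in parallel.
For example: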

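from sklearn.pipeline import FeatureUnion

pipe_1 = ...  # e.g. a Pipeline that processes one group of columns
pipe_2 = ...  # e.g. a Pipeline that processes another group of columns
full_pipe = FeatureUnion(transformer_list=[
    ("pipe_1", pipe_1),
    ("pipe_2", pipe_2)
])

# Result: pipe_1's output columns followed by pipe_2's output columns
data = full_pipe.fit_transform(data)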
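Below is a filled-in sketch of the pattern above, assuming a hypothetical DataFrame with two numeric columns (age, income) and one categorical column (city). Selector is repeated so the block runs on its own; pipe_1 standardizes the numeric columns, and pipe_2 one-hot encodes the categorical one. Because OneHotEncoder returns sparse output by default, FeatureUnion stacks the branches into a SciPy sparse matrix here.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class Selector(BaseEstimator, TransformerMixin):
    """Return the given DataFrame columns as a NumPy array."""

    def __init__(self, attribution_name):
        self.attribution_name = attribution_name

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribution_name].values


# Hypothetical toy data
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [3.5, 4.2, 5.1, 6.0],
    "city": ["a", "b", "a", "c"],
})

# Branch 1: numeric columns, standardized
pipe_1 = Pipeline([
    ("select", Selector(["age", "income"])),
    ("scale", StandardScaler()),
])
# Branch 2: categorical column, one-hot encoded (3 distinct cities)
pipe_2 = Pipeline([
    ("select", Selector(["city"])),
    ("onehot", OneHotEncoder()),
])

full_pipe = FeatureUnion(transformer_list=[
    ("pipe_1", pipe_1),
    ("pipe_2", pipe_2),
])
data = full_pipe.fit_transform(df)
print(data.shape)  # (4, 5): 2 scaled columns + 3 one-hot columns
```

In scikit-learn 0.20 and later, sklearn.compose.ColumnTransformer handles this select-columns-then-transform pattern directly, without a custom Selector.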
