Custom transformers in sklearn

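To define your own transformer, all you need to do is create a class and implement three methods: fit(), transform(), and fit_transform(). If you add TransformerMixin as a base class, you get the last one for free. If you also add BaseEstimator as a base class (and avoid *args and **kwargs in the constructor), you get two extra methods, get_params() and set_params(), which are very useful for automatic hyperparameter tuning.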

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# Column indices of the attributes used below
room_ix, bedrooms_ix, population_ix, household_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):  # no *args or **kwargs
        self.add_bedrooms_per_room = add_bedrooms_per_room

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X, y=None):
        rooms_per_household = X[:, room_ix] / X[:, household_ix]
        population_per_household = X[:, population_ix] / X[:, household_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, room_ix]
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]

# housing is assumed to be the pandas DataFrame loaded earlier
attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
housing_extra_attribs = attr_adder.transform(housing.values)
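
To make the inherited methods concrete, here is a minimal sketch that runs on its own. It uses a trimmed-down toy transformer and a tiny made-up array instead of the housing data; the names RoomsPerHouseholdAdder, add_ratio, and the column indices are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# Hypothetical column layout for the toy array below (not the housing data).
rooms_ix, households_ix = 0, 1


# The scikit-learn developer guide recommends listing mixins before
# BaseEstimator; the methods used in this sketch behave the same either way.
class RoomsPerHouseholdAdder(TransformerMixin, BaseEstimator):
    """Toy transformer: optionally appends a rooms-per-household column."""

    def __init__(self, add_ratio=True):
        self.add_ratio = add_ratio

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X):
        if not self.add_ratio:
            return X
        ratio = X[:, rooms_ix] / X[:, households_ix]
        return np.c_[X, ratio]  # append the ratio as a new column


X = np.array([[6.0, 2.0],
              [9.0, 3.0]])
adder = RoomsPerHouseholdAdder()

params = adder.get_params()        # inherited from BaseEstimator
out_with = adder.fit_transform(X)  # inherited from TransformerMixin

adder.set_params(add_ratio=False)  # inherited from BaseEstimator
out_without = adder.fit_transform(X)

print(params)
print(out_with.shape, out_without.shape)
```

Because the constructor argument is exposed through get_params() and set_params(), a tuning tool can switch add_ratio on and off without knowing anything else about the class.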

Additional notes

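  • np.r_ concatenates arrays along the first axis (rows): for two matrices it stacks one on top of the other, so they must have the same number of columns. This is concatenation, not element-wise addition.
  • np.c_ concatenates arrays along the second axis (columns): for two matrices it places them side by side, so they must have the same number of rows. 1-D arrays are treated as columns, which is why np.c_ can append the new attribute columns to X in the transformer.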
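
A small sketch of both helpers on made-up arrays (the variable names are only for illustration) shows the resulting shapes:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

stacked_vertically = np.r_[a, b]  # b goes below a
side_by_side = np.c_[a, b]        # b goes to the right of a

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

joined_1d = np.r_[u, v]   # one longer 1-D array
as_columns = np.c_[u, v]  # each 1-D array becomes a column

print(stacked_vertically.shape)  # (4, 2)
print(side_by_side.shape)        # (2, 4)
print(joined_1d.shape)           # (6,)
print(as_columns.shape)          # (3, 2)
```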
