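Fixing "name 'DataFrameSelector' is not defined"

1. Source of the problem: the California housing price prediction example in Chapter 2 of "Hands-On Machine Learning with Scikit-Learn and TensorFlow"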


Error message: name 'DataFrameSelector' is not defined
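
For context, the message only means that nothing called DataFrameSelector has been defined or imported in the current session. The name refers to a small transformer that picks columns out of a pandas DataFrame. A hand-written equivalent might look like the sketch below; it is an illustration under that assumption, not the sklearn-features implementation, and the toy `housing` frame is made up for the example.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select the given columns of a DataFrame and return them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        # Nothing to learn; the column list is fixed at construction time.
        return self

    def transform(self, X):
        return X[self.attribute_names].values


# Toy stand-in for the housing data (hypothetical values).
housing = pd.DataFrame({
    "median_income": [8.3, 7.2],
    "ocean_proximity": ["NEAR BAY", "INLAND"],
})

selected = DataFrameSelector(["median_income"]).fit_transform(housing)
```

Defining such a class before building the pipeline is enough to make the name resolvable; the steps below take the third-party route instead.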

2. Solution

 

  • Install the third-party sklearn-features library: pip install sklearn-features (note: DataFrameSelector is provided by this third-party library; documentation: https://sklearn-features.readthedocs.io/en/stable/index.html)
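  • After installation, import DataFrameSelector from that library and run the code again. A new error appears: fit_transform() takes 2 positional arguments but 3 were given (LabelBinarizer.fit_transform accepts a single argument, but the pipeline passes both X and y)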
  • Downgrading scikit-learn to 0.18 makes the code run; alternatively, stay on 0.19 and replace LabelBinarizer() with
    CategoricalEncoder(encoding="onehot-dense")
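  • After that replacement, another error appears: name 'CategoricalEncoder' is not defined (note: version 0.19 does not provide this class; it belongs to the then-unreleased development version, scikit-learn v0.20.dev0)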
  • Define the CategoricalEncoder class yourself, before the code that uses it:
    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.utils import check_array
    from sklearn.preprocessing import LabelEncoder
    from scipy import sparse
    
    class CategoricalEncoder(BaseEstimator, TransformerMixin):
        """Encode categorical features as a numeric array.
        The input to this transformer should be a matrix of integers or strings,
        denoting the values taken on by categorical (discrete) features.
        The features can be encoded using a one-hot aka one-of-K scheme
        (``encoding='onehot'``, the default) or converted to ordinal integers
        (``encoding='ordinal'``).
        This encoding is needed for feeding categorical data to many scikit-learn
        estimators, notably linear models and SVMs with the standard kernels.
        Read more in the :ref:`User Guide `.
        Parameters
        ----------
        encoding : str, 'onehot', 'onehot-dense' or 'ordinal'
            The type of encoding to use (default is 'onehot'):
            - 'onehot': encode the features using a one-hot aka one-of-K scheme
              (or also called 'dummy' encoding). This creates a binary column for
              each category and returns a sparse matrix.
            - 'onehot-dense': the same as 'onehot' but returns a dense array
              instead of a sparse matrix.
            - 'ordinal': encode the features as ordinal integers. This results in
              a single column of integers (0 to n_categories - 1) per feature.
        categories : 'auto' or a list of lists/arrays of values.
            Categories (unique values) per feature:
            - 'auto' : Determine categories automatically from the training data.
            - list : ``categories[i]`` holds the categories expected in the ith
              column. The passed categories are sorted before encoding the data
              (used categories can be found in the ``categories_`` attribute).
        dtype : number type, default np.float64
            Desired dtype of output.
        handle_unknown : 'error' (default) or 'ignore'
            Whether to raise an error or ignore if an unknown categorical feature is
            present during transform (default is to raise). When this parameter
            is set to 'ignore' and an unknown category is encountered during
            transform, the resulting one-hot encoded columns for this feature
            will be all zeros.
            Ignoring unknown categories is not supported for
            ``encoding='ordinal'``.
        Attributes
        ----------
        categories_ : list of arrays
            The categories of each feature determined during fitting. When
            categories were specified manually, this holds the sorted categories
            (in order corresponding with output of `transform`).
        Examples
        --------
        Given a dataset with three features and two samples, we let the encoder
        find the maximum value per feature and transform the data to a binary
        one-hot encoding.
        >>> from sklearn.preprocessing import CategoricalEncoder
        >>> enc = CategoricalEncoder(handle_unknown='ignore')
        >>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
        ... # doctest: +ELLIPSIS
        CategoricalEncoder(categories='auto', dtype=<... 'numpy.float64'>,
                  encoding='onehot', handle_unknown='ignore')
        >>> enc.transform([[0, 1, 1], [1, 0, 4]]).toarray()
        array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.],
               [ 0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.]])
        See also
        --------
        sklearn.preprocessing.OneHotEncoder : performs a one-hot encoding of
          integer ordinal features. The ``OneHotEncoder`` assumes that input
          features take on values in the range ``[0, max(feature)]`` instead of
          using the unique values.
        sklearn.feature_extraction.DictVectorizer : performs a one-hot encoding of
          dictionary items (also handles string-valued features).
        sklearn.feature_extraction.FeatureHasher : performs an approximate one-hot
          encoding of dictionary items or strings.
        """
    
        def __init__(self, encoding='onehot', categories='auto', dtype=np.float64,
                     handle_unknown='error'):
            self.encoding = encoding
            self.categories = categories
            self.dtype = dtype
            self.handle_unknown = handle_unknown
    
        def fit(self, X, y=None):
            """Fit the CategoricalEncoder to X.
            Parameters
            ----------
            X : array-like, shape [n_samples, n_feature]
                The data to determine the categories of each feature.
            Returns
            -------
            self
            """
    
            if self.encoding not in ['onehot', 'onehot-dense', 'ordinal']:
                template = ("encoding should be either 'onehot', 'onehot-dense' "
                            "or 'ordinal', got %s")
                raise ValueError(template % self.encoding)
    
            if self.handle_unknown not in ['error', 'ignore']:
                template = ("handle_unknown should be either 'error' or "
                            "'ignore', got %s")
                raise ValueError(template % self.handle_unknown)
    
            if self.encoding == 'ordinal' and self.handle_unknown == 'ignore':
                raise ValueError("handle_unknown='ignore' is not supported for"
                                 " encoding='ordinal'")
    
            # Use the builtin `object`; the `np.object` alias was removed in NumPy 1.24.
            X = check_array(X, dtype=object, accept_sparse='csc', copy=True)
            n_samples, n_features = X.shape
    
            self._label_encoders_ = [LabelEncoder() for _ in range(n_features)]
    
            for i in range(n_features):
                le = self._label_encoders_[i]
                Xi = X[:, i]
                if self.categories == 'auto':
                    le.fit(Xi)
                else:
                    valid_mask = np.in1d(Xi, self.categories[i])
                    if not np.all(valid_mask):
                        if self.handle_unknown == 'error':
                            diff = np.unique(Xi[~valid_mask])
                            msg = ("Found unknown categories {0} in column {1}"
                                   " during fit".format(diff, i))
                            raise ValueError(msg)
                    le.classes_ = np.array(np.sort(self.categories[i]))
    
            self.categories_ = [le.classes_ for le in self._label_encoders_]
    
            return self
    
        def transform(self, X):
            """Transform X using one-hot encoding.
            Parameters
            ----------
            X : array-like, shape [n_samples, n_features]
                The data to encode.
            Returns
            -------
            X_out : sparse matrix or a 2-d array
                Transformed input.
            """
            # Builtin types replace the removed np.object / np.int / np.bool aliases.
            X = check_array(X, accept_sparse='csc', dtype=object, copy=True)
            n_samples, n_features = X.shape
            X_int = np.zeros_like(X, dtype=int)
            X_mask = np.ones_like(X, dtype=bool)
    
            for i in range(n_features):
                valid_mask = np.in1d(X[:, i], self.categories_[i])
    
                if not np.all(valid_mask):
                    if self.handle_unknown == 'error':
                        diff = np.unique(X[~valid_mask, i])
                        msg = ("Found unknown categories {0} in column {1}"
                               " during transform".format(diff, i))
                        raise ValueError(msg)
                    else:
                        # Set the problematic rows to an acceptable value and
                        # continue. The rows are marked in `X_mask` and will be
                        # removed later.
                        X_mask[:, i] = valid_mask
                        X[:, i][~valid_mask] = self.categories_[i][0]
                X_int[:, i] = self._label_encoders_[i].transform(X[:, i])
    
            if self.encoding == 'ordinal':
                return X_int.astype(self.dtype, copy=False)
    
            mask = X_mask.ravel()
            n_values = [cats.shape[0] for cats in self.categories_]
            n_values = np.array([0] + n_values)
            indices = np.cumsum(n_values)
    
            column_indices = (X_int + indices[:-1]).ravel()[mask]
            row_indices = np.repeat(np.arange(n_samples, dtype=np.int32),
                                    n_features)[mask]
            data = np.ones(n_samples * n_features)[mask]
    
            out = sparse.csc_matrix((data, (row_indices, column_indices)),
                                    shape=(n_samples, indices[-1]),
                                    dtype=self.dtype).tocsr()
            if self.encoding == 'onehot-dense':
                return out.toarray()
            else:
                return out
  • Finally, rerun the earlier code and the problem is solved. For reference, a Chinese translation of the book is available at HTTPS://github.com/apachecn/hands_on_Ml_with_Sklearn_and_TF
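
On scikit-learn 0.20 or later the backported class should not be needed: the CategoricalEncoder proposal was released as OneHotEncoder (which now accepts string categories) and OrdinalEncoder. The following is a minimal sketch of the categorical step under that assumption. The toy `housing` frame, the step name "one_hot", and the variable names are made up for illustration and are not taken from the book's code.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the categorical column of the housing data (hypothetical values).
housing = pd.DataFrame({
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "NEAR OCEAN"],
})

# OneHotEncoder implements fit_transform(X, y=None), so it can be used
# inside a Pipeline, unlike LabelBinarizer.
cat_pipeline = Pipeline([
    ("one_hot", OneHotEncoder(handle_unknown="ignore")),
])

# The encoder returns a sparse matrix by default; toarray() gives the
# dense result that encoding="onehot-dense" produced in the backport.
housing_cat_1hot = cat_pipeline.fit_transform(housing[["ocean_proximity"]]).toarray()

# Categories are stored sorted, one array per input column.
categories = list(cat_pipeline.named_steps["one_hot"].categories_[0])
```

Selecting the column with double brackets keeps the input two-dimensional, which is what the encoder expects; with this approach the DataFrameSelector step is only needed if the rest of the pipeline still relies on it.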
