hhaowang

SciKit-learn轻松使用机器学习（2）数据预处理官方文档

参考官方文档：https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing

4.3. Preprocessing data

The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators.

In general, learning algorithms benefit from standardization of the data set. If some outliers are present in the set, robust scalers or transformers are more appropriate. The behaviors of the different scalers, transformers, and normalizers on a dataset containing marginal outliers is highlighted in Compare the effect of different scalers on data with outliers.

4.3.1. Standardization, or mean removal and variance scaling

Standardization of datasets is a common requirement for many machine learning estimators implemented in scikit-learn; they might behave badly if the individual features do not more or less look like standard normally distributed data: Gaussian with zero mean and unit variance.

In practice we often ignore the shape of the distribution and just transform the data to center it by removing the mean value of each feature, then scale it by dividing non-constant features by their standard deviation.

For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the l1 and l2 regularizers of linear models) assume that all features are centered around zero and have variance in the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.

The function scale provides a quick and easy way to perform this operation on a single array-like dataset:

>>>

>>> from sklearn import preprocessing
>>> import numpy as np
>>> X_train = np.array([[ 1., -1.,  2.],
...                     [ 2.,  0.,  0.],
...                     [ 0.,  1., -1.]])
>>> X_scaled = preprocessing.scale(X_train)

>>> X_scaled                                          
array([[ 0.  ..., -1.22...,  1.33...],
       [ 1.22...,  0.  ..., -0.26...],
       [-1.22...,  1.22..., -1.06...]])

Scaled data has zero mean and unit variance:

>>>

>>> X_scaled.mean(axis=0)
array([0., 0., 0.])

>>> X_scaled.std(axis=0)
array([1., 1., 1.])

The preprocessing module further provides a utility class StandardScaler that implements the Transformer API to compute the mean and standard deviation on a training set so as to be able to later reapply the same transformation on the testing set. This class is hence suitable for use in the early steps of a sklearn.pipeline.Pipeline:

>>>

>>> scaler = preprocessing.StandardScaler().fit(X_train)
>>> scaler
StandardScaler(copy=True, with_mean=True, with_std=True)

>>> scaler.mean_                                      
array([1. ..., 0. ..., 0.33...])

>>> scaler.scale_                                       
array([0.81..., 0.81..., 1.24...])

>>> scaler.transform(X_train)                           
array([[ 0.  ..., -1.22...,  1.33...],
       [ 1.22...,  0.  ..., -0.26...],
       [-1.22...,  1.22..., -1.06...]])

The scaler instance can then be used on new data to transform it the same way it did on the training set:

>>>

>>> X_test = [[-1., 1., 0.]]
>>> scaler.transform(X_test)                
array([[-2.44...,  1.22..., -0.26...]])

It is possible to disable either centering or scaling by either passing with_mean=False or with_std=False to the constructor of StandardScaler.

4.3.1.1. Scaling features to a range

An alternative standardization is scaling features to lie between a given minimum and maximum value, often between zero and one, or so that the maximum absolute value of each feature is scaled to unit size. This can be achieved using MinMaxScaler or MaxAbsScaler, respectively.

The motivation to use this scaling include robustness to very small standard deviations of features and preserving zero entries in sparse data.

Here is an example to scale a toy data matrix to the [0, 1] range:

>>>

>>> X_train = np.array([[ 1., -1.,  2.],
...                     [ 2.,  0.,  0.],
...                     [ 0.,  1., -1.]])
...
>>> min_max_scaler = preprocessing.MinMaxScaler()
>>> X_train_minmax = min_max_scaler.fit_transform(X_train)
>>> X_train_minmax
array([[0.5       , 0.        , 1.        ],
       [1.        , 0.5       , 0.33333333],
       [0.        , 1.        , 0.        ]])

The same instance of the transformer can then be applied to some new test data unseen during the fit call: the same scaling and shifting operations will be applied to be consistent with the transformation performed on the train data:

>>>

>>> X_test = np.array([[-3., -1.,  4.]])
>>> X_test_minmax = min_max_scaler.transform(X_test)
>>> X_test_minmax
array([[-1.5       ,  0.        ,  1.66666667]])

It is possible to introspect the scaler attributes to find about the exact nature of the transformation learned on the training data:

>>>

>>> min_max_scaler.scale_                             
array([0.5       , 0.5       , 0.33...])

>>> min_max_scaler.min_                               
array([0.        , 0.5       , 0.33...])

If MinMaxScaler is given an explicit feature_range=(min, max) the full formula is:

X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

X_scaled = X_std * (max - min) + min

MaxAbsScaler works in a very similar fashion, but scales in a way that the training data lies within the range [-1, 1] by dividing through the largest maximum value in each feature. It is meant for data that is already centered at zero or sparse data.

Here is how to use the toy data from the previous example with this scaler:

>>>

>>> X_train = np.array([[ 1., -1.,  2.],
...                     [ 2.,  0.,  0.],
...                     [ 0.,  1., -1.]])
...
>>> max_abs_scaler = preprocessing.MaxAbsScaler()
>>> X_train_maxabs = max_abs_scaler.fit_transform(X_train)
>>> X_train_maxabs                # doctest +NORMALIZE_WHITESPACE^
array([[ 0.5, -1. ,  1. ],
       [ 1. ,  0. ,  0. ],
       [ 0. ,  1. , -0.5]])
>>> X_test = np.array([[ -3., -1.,  4.]])
>>> X_test_maxabs = max_abs_scaler.transform(X_test)
>>> X_test_maxabs                 
array([[-1.5, -1. ,  2. ]])
>>> max_abs_scaler.scale_         
array([2.,  1.,  2.])

As with scale, the module further provides convenience functions minmax_scale and maxabs_scale if you don’t want to create an object.

4.3.1.2. Scaling sparse data

Centering sparse data would destroy the sparseness structure in the data, and thus rarely is a sensible thing to do. However, it can make sense to scale sparse inputs, especially if features are on different scales.

MaxAbsScaler and maxabs_scale were specifically designed for scaling sparse data, and are the recommended way to go about this. However, scale and StandardScaler can accept scipy.sparse matrices as input, as long as with_mean=False is explicitly passed to the constructor. Otherwise a ValueError will be raised as silently centering would break the sparsity and would often crash the execution by allocating excessive amounts of memory unintentionally.RobustScaler cannot be fitted to sparse inputs, but you can use the transform method on sparse inputs.

Note that the scalers accept both Compressed Sparse Rows and Compressed Sparse Columns format (see scipy.sparse.csr_matrix and scipy.sparse.csc_matrix). Any other sparse input will be converted to the Compressed Sparse Rows representation. To avoid unnecessary memory copies, it is recommended to choose the CSR or CSC representation upstream.

Finally, if the centered data is expected to be small enough, explicitly converting the input to an array using the toarraymethod of sparse matrices is another option.

4.3.1.3. Scaling data with outliers

If your data contains many outliers, scaling using the mean and variance of the data is likely to not work very well. In these cases, you can use robust_scale and RobustScaler as drop-in replacements instead. They use more robust estimates for the center and range of your data.

References:

Further discussion on the importance of centering and scaling data is available on this FAQ: Should I normalize/standardize/rescale the data?

Scaling vs Whitening

It is sometimes not enough to center and scale the features independently, since a downstream model can further make some assumption on the linear independence of the features.

To address this issue you can use sklearn.decomposition.PCA with whiten=True to further remove the linear correlation across features.

Scaling a 1D array

All above functions (i.e. scale, minmax_scale, maxabs_scale, and robust_scale) accept 1D array which can be useful in some specific case.

4.3.1.4. Centering kernel matrices

If you have a kernel matrix of a kernel K that computes a dot product in a feature space defined by function phi, a KernelCenterer can transform the kernel matrix so that it contains inner products in the feature space defined by phifollowed by removal of the mean in that space.

4.3.2. Non-linear transformation

4.3.2.1. Mapping to a Uniform distribution

Like scalers, QuantileTransformer puts all features into the same, known range or distribution. However, by performing a rank transformation, it smooths out unusual distributions and is less influenced by outliers than scaling methods. It does, however, distort correlations and distances within and across features.

QuantileTransformer and quantile_transform provide a non-parametric transformation based on the quantile function to map the data to a uniform distribution with values between 0 and 1:

>>>

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>>> quantile_transformer = preprocessing.QuantileTransformer(random_state=0)
>>> X_train_trans = quantile_transformer.fit_transform(X_train)
>>> X_test_trans = quantile_transformer.transform(X_test)
>>> np.percentile(X_train[:, 0], [0, 25, 50, 75, 100]) 
array([ 4.3,  5.1,  5.8,  6.5,  7.9])

This feature corresponds to the sepal length in cm. Once the quantile transformation applied, those landmarks approach closely the percentiles previously defined:

>>>

>>> np.percentile(X_train_trans[:, 0], [0, 25, 50, 75, 100])
... 
array([ 0.00... ,  0.24...,  0.49...,  0.73...,  0.99... ])

This can be confirmed on a independent testing set with similar remarks:

>>>

>>> np.percentile(X_test[:, 0], [0, 25, 50, 75, 100])
... 
array([ 4.4  ,  5.125,  5.75 ,  6.175,  7.3  ])
>>> np.percentile(X_test_trans[:, 0], [0, 25, 50, 75, 100])
... 
array([ 0.01...,  0.25...,  0.46...,  0.60... ,  0.94...])

4.3.2.2. Mapping to a Gaussian distribution

In many modeling scenarios, normality of the features in a dataset is desirable. Power transforms are a family of parametric, monotonic transformations that aim to map data from any distribution to as close to a Gaussian distribution as possible in order to stabilize variance and minimize skewness.

PowerTransformer currently provides two such power transformations, the Yeo-Johnson transform and the Box-Cox transform.

The Yeo-Johnson transform is given by:

xi(λ)={[(xi+1)λ−1]/λif λ≠0,xi≥0,ln⁡(xi)+1if λ=0,xi≥0−[(−xi+1)2−λ−1]/(2−λ)if λ≠2,xi<0,−ln⁡(−xi+1)if λ=2,xi<0

while the Box-Cox transform is given by:

xi(λ)={xiλ−1λif λ≠0,ln⁡(xi)if λ=0,

Box-Cox can only be applied to strictly positive data. In both methods, the transformation is parameterized by λ, which is determined through maximum likelihood estimation. Here is an example of using Box-Cox to map samples drawn from a lognormal distribution to a normal distribution:

>>>

>>> pt = preprocessing.PowerTransformer(method='box-cox', standardize=False)
>>> X_lognormal = np.random.RandomState(616).lognormal(size=(3, 3))
>>> X_lognormal                                         
array([[1.28..., 1.18..., 0.84...],
       [0.94..., 1.60..., 0.38...],
       [1.35..., 0.21..., 1.09...]])
>>> pt.fit_transform(X_lognormal)                   
array([[ 0.49...,  0.17..., -0.15...],
       [-0.05...,  0.58..., -0.57...],
       [ 0.69..., -0.84...,  0.10...]])

While the above example sets the standardize option to False, PowerTransformer will apply zero-mean, unit-variance normalization to the transformed output by default.

Below are examples of Box-Cox and Yeo-Johnson applied to various probability distributions. Note that when applied to certain distributions, the power transforms achieve very Gaussian-like results, but with others, they are ineffective. This highlights the importance of visualizing the data before and after transformation.

It is also possible to map data to a normal distribution using QuantileTransformer by setting output_distribution='normal'. Using the earlier example with the iris dataset:

>>>

>>> quantile_transformer = preprocessing.QuantileTransformer(
...     output_distribution='normal', random_state=0)
>>> X_trans = quantile_transformer.fit_transform(X)
>>> quantile_transformer.quantiles_ 
array([[4.3...,   2...,     1...,     0.1...],
       [4.31...,  2.02...,  1.01...,  0.1...],
       [4.32...,  2.05...,  1.02...,  0.1...],
       ...,
       [7.84...,  4.34...,  6.84...,  2.5...],
       [7.87...,  4.37...,  6.87...,  2.5...],
       [7.9...,   4.4...,   6.9...,   2.5...]])

Thus the median of the input becomes the mean of the output, centered at 0. The normal output is clipped so that the input’s minimum and maximum — corresponding to the 1e-7 and 1 - 1e-7 quantiles respectively — do not become infinite under the transformation.

4.3.3. Normalization

Normalization is the process of scaling individual samples to have unit norm. This process can be useful if you plan to use a quadratic form such as the dot-product or any other kernel to quantify the similarity of any pair of samples.

This assumption is the base of the Vector Space Model often used in text classification and clustering contexts.

The function normalize provides a quick and easy way to perform this operation on a single array-like dataset, either using the l1 or l2 norms:

>>>

>>> X = [[ 1., -1.,  2.],
...      [ 2.,  0.,  0.],
...      [ 0.,  1., -1.]]
>>> X_normalized = preprocessing.normalize(X, norm='l2')

>>> X_normalized                                      
array([[ 0.40..., -0.40...,  0.81...],
       [ 1.  ...,  0.  ...,  0.  ...],
       [ 0.  ...,  0.70..., -0.70...]])

The preprocessing module further provides a utility class Normalizer that implements the same operation using theTransformer API (even though the fit method is useless in this case: the class is stateless as this operation treats samples independently).

This class is hence suitable for use in the early steps of a sklearn.pipeline.Pipeline:

>>>

>>> normalizer = preprocessing.Normalizer().fit(X)  # fit does nothing
>>> normalizer
Normalizer(copy=True, norm='l2')

The normalizer instance can then be used on sample vectors as any transformer:

>>>

>>> normalizer.transform(X)                            
array([[ 0.40..., -0.40...,  0.81...],
       [ 1.  ...,  0.  ...,  0.  ...],
       [ 0.  ...,  0.70..., -0.70...]])

>>> normalizer.transform([[-1.,  1., 0.]])             
array([[-0.70...,  0.70...,  0.  ...]])

Sparse input

normalize and Normalizer accept both dense array-like and sparse matrices from scipy.sparse as input.

For sparse input the data is converted to the Compressed Sparse Rows representation (see scipy.sparse.csr_matrix) before being fed to efficient Cython routines. To avoid unnecessary memory copies, it is recommended to choose the CSR representation upstream.

4.3.4. Encoding categorical features

Often features are not given as continuous values but categorical. For example a person could have features ["male", "female"], ["from Europe", "from US", "from Asia"],["uses Firefox", "uses Chrome", "uses Safari", "uses Internet Explorer"]. Such features can be efficiently coded as integers, for instance ["male", "from US", "uses Internet Explorer"] could be expressed as [0, 1, 3]while ["female", "from Asia", "uses Chrome"] would be [1, 2, 1].

To convert categorical features to such integer codes, we can use the OrdinalEncoder. This estimator transforms each categorical feature to one new feature of integers (0 to n_categories - 1):

>>>

>>> enc = preprocessing.OrdinalEncoder()
>>> X = [['male', 'from US', 'uses Safari'], ['female', 'from Europe', 'uses Firefox']]
>>> enc.fit(X)  
OrdinalEncoder(categories='auto', dtype=<... 'numpy.float64'>)
>>> enc.transform([['female', 'from US', 'uses Safari']])
array([[0., 1., 1.]])

Such integer representation can, however, not be used directly with all scikit-learn estimators, as these expect continuous input, and would interpret the categories as being ordered, which is often not desired (i.e. the set of browsers was ordered arbitrarily).

Another possibility to convert categorical features to features that can be used with scikit-learn estimators is to use a one-of-K, also known as one-hot or dummy encoding. This type of encoding can be obtained with the OneHotEncoder, which transforms each categorical feature with n_categories possible values into n_categories binary features, with one of them 1, and all others 0.

Continuing the example above:

>>>

>>> enc = preprocessing.OneHotEncoder()
>>> X = [['male', 'from US', 'uses Safari'], ['female', 'from Europe', 'uses Firefox']]
>>> enc.fit(X)  
OneHotEncoder(categorical_features=None, categories=None,
       dtype=<... 'numpy.float64'>, handle_unknown='error',
       n_values=None, sparse=True)
>>> enc.transform([['female', 'from US', 'uses Safari'],
...                ['male', 'from Europe', 'uses Safari']]).toarray()
array([[1., 0., 0., 1., 0., 1.],
       [0., 1., 1., 0., 0., 1.]])

By default, the values each feature can take is inferred automatically from the dataset and can be found in the categories_ attribute:

>>>

>>> enc.categories_
[array(['female', 'male'], dtype=object), array(['from Europe', 'from US'], dtype=object), array(['uses Firefox', 'uses Safari'], dtype=object)]

It is possible to specify this explicitly using the parameter categories. There are two genders, four possible continents and four web browsers in our dataset:

>>>

>>> genders = ['female', 'male']
>>> locations = ['from Africa', 'from Asia', 'from Europe', 'from US']
>>> browsers = ['uses Chrome', 'uses Firefox', 'uses IE', 'uses Safari']
>>> enc = preprocessing.OneHotEncoder(categories=[genders, locations, browsers])
>>> # Note that for there are missing categorical values for the 2nd and 3rd
>>> # feature
>>> X = [['male', 'from US', 'uses Safari'], ['female', 'from Europe', 'uses Firefox']]
>>> enc.fit(X) 
OneHotEncoder(categorical_features=None,
       categories=[...],
       dtype=<... 'numpy.float64'>, handle_unknown='error',
       n_values=None, sparse=True)
>>> enc.transform([['female', 'from Asia', 'uses Chrome']]).toarray()
array([[1., 0., 0., 1., 0., 0., 1., 0., 0., 0.]])

If there is a possibility that the training data might have missing categorical features, it can often be better to specify handle_unknown='ignore' instead of setting the categories manually as above. When handle_unknown='ignore' is specified and unknown categories are encountered during transform, no error will be raised but the resulting one-hot encoded columns for this feature will be all zeros (handle_unknown='ignore' is only supported for one-hot encoding):

>>>

>>> enc = preprocessing.OneHotEncoder(handle_unknown='ignore')
>>> X = [['male', 'from US', 'uses Safari'], ['female', 'from Europe', 'uses Firefox']]
>>> enc.fit(X) 
OneHotEncoder(categorical_features=None, categories=None,
       dtype=<... 'numpy.float64'>, handle_unknown='ignore',
       n_values=None, sparse=True)
>>> enc.transform([['female', 'from Asia', 'uses Chrome']]).toarray()
array([[1., 0., 0., 0., 0., 0.]])

See Loading features from dicts for categorical features that are represented as a dict, not as scalars.

4.3.5. Discretization

Discretization (otherwise known as quantization or binning) provides a way to partition continuous features into discrete values. Certain datasets with continuous features may benefit from discretization, because discretization can transform the dataset of continuous attributes to one with only nominal attributes.

One-hot encoded discretized features can make a model more expressive, while maintaining interpretability. For instance, pre-processing with a discretizer can introduce nonlinearity to linear models.

4.3.5.1. K-bins discretization

KBinsDiscretizer discretizers features into k equal width bins:

>>>

>>> X = np.array([[ -3., 5., 15 ],
...               [  0., 6., 14 ],
...               [  6., 3., 11 ]])
>>> est = preprocessing.KBinsDiscretizer(n_bins=[3, 2, 2], encode='ordinal').fit(X)

By default the output is one-hot encoded into a sparse matrix (See Encoding categorical features) and this can be configured with the encode parameter. For each feature, the bin edges are computed during fit and together with the number of bins, they will define the intervals. Therefore, for the current example, these intervals are defined as:

feature 1: [−∞,−1),[−1,2),[2,∞)

feature 2: [−∞,5),[5,∞)

feature 3: [−∞,14),[14,∞)

Based on these bin intervals, X is transformed as follows:
>>>
>>> est.transform(X)                      
array([[ 0., 1., 1.],
       [ 1., 1., 1.],
       [ 2., 0., 0.]])

The resulting dataset contains ordinal attributes which can be further used in a sklearn.pipeline.Pipeline.

Discretization is similar to constructing histograms for continuous data. However, histograms focus on counting features which fall into particular bins, whereas discretization focuses on assigning feature values to these bins.

KBinsDiscretizer implements different binning strategies, which can be selected with the strategy parameter. The ‘uniform’ strategy uses constant-width bins. The ‘quantile’ strategy uses the quantiles values to have equally populated bins in each feature. The ‘kmeans’ strategy defines bins based on a k-means clustering procedure performed on each feature independently.

Examples:

Using KBinsDiscretizer to discretize continuous features
Feature discretization
Demonstrating the different strategies of KBinsDiscretizer

4.3.5.2. Feature binarization

Feature binarization is the process of thresholding numerical features to get boolean values. This can be useful for downstream probabilistic estimators that make assumption that the input data is distributed according to a multi-variate Bernoulli distribution. For instance, this is the case for the sklearn.neural_network.BernoulliRBM.

It is also common among the text processing community to use binary feature values (probably to simplify the probabilistic reasoning) even if normalized counts (a.k.a. term frequencies) or TF-IDF valued features often perform slightly better in practice.

As for the Normalizer, the utility class Binarizer is meant to be used in the early stages ofsklearn.pipeline.Pipeline. The fit method does nothing as each sample is treated independently of others:

>>>

>>> X = [[ 1., -1.,  2.],
...      [ 2.,  0.,  0.],
...      [ 0.,  1., -1.]]

>>> binarizer = preprocessing.Binarizer().fit(X)  # fit does nothing
>>> binarizer
Binarizer(copy=True, threshold=0.0)

>>> binarizer.transform(X)
array([[1., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.]])

It is possible to adjust the threshold of the binarizer:

>>>

>>> binarizer = preprocessing.Binarizer(threshold=1.1)
>>> binarizer.transform(X)
array([[0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 0.]])

As for the StandardScaler and Normalizer classes, the preprocessing module provides a companion function binarize to be used when the transformer API is not necessary.

Note that the Binarizer is similar to the KBinsDiscretizer when k = 2, and when the bin edge is at the value threshold.

Sparse input

binarize and Binarizer accept both dense array-like and sparse matrices from scipy.sparse as input.

For sparse input the data is converted to the Compressed Sparse Rows representation (see scipy.sparse.csr_matrix). To avoid unnecessary memory copies, it is recommended to choose the CSR representation upstream.

4.3.6. Imputation of missing values

Tools for imputing missing values are discussed at Imputation of missing values.

4.3.7. Generating polynomial features

Often it’s useful to add complexity to the model by considering nonlinear features of the input data. A simple and common method to use is polynomial features, which can get features’ high-order and interaction terms. It is implemented in PolynomialFeatures:

>>>

>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.arange(6).reshape(3, 2)
>>> X                                                 
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(2)
>>> poly.fit_transform(X)                             
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])

The features of X have been transformed from (X1,X2) to (1,X1,X2,X12,X1X2,X22).

In some cases, only interaction terms among features are required, and it can be gotten with the setting interaction_only=True:

>>>

>>> X = np.arange(9).reshape(3, 3)
>>> X                                                 
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> poly = PolynomialFeatures(degree=3, interaction_only=True)
>>> poly.fit_transform(X)                             
array([[  1.,   0.,   1.,   2.,   0.,   0.,   2.,   0.],
       [  1.,   3.,   4.,   5.,  12.,  15.,  20.,  60.],
       [  1.,   6.,   7.,   8.,  42.,  48.,  56., 336.]])

The features of X have been transformed from (X1,X2,X3) to (1,X1,X2,X3,X1X2,X1X3,X2X3,X1X2X3).

Note that polynomial features are used implicitly in kernel methods (e.g., sklearn.svm.SVC, sklearn.decomposition.KernelPCA) when using polynomial Kernel functions.

See Polynomial interpolation for Ridge regression using created polynomial features.

4.3.8. Custom transformers

Often, you will want to convert an existing Python function into a transformer to assist in data cleaning or processing. You can implement a transformer from an arbitrary function with FunctionTransformer. For example, to build a transformer that applies a log transformation in a pipeline, do:

>>>

>>> import numpy as np
>>> from sklearn.preprocessing import FunctionTransformer
>>> transformer = FunctionTransformer(np.log1p, validate=True)
>>> X = np.array([[0, 1], [2, 3]])
>>> transformer.transform(X)
array([[0.        , 0.69314718],
       [1.09861229, 1.38629436]])

You can ensure that func and inverse_func are the inverse of each other by setting check_inverse=True and calling fit before transform. Please note that a warning is raised and can be turned into an error with a filterwarnings:

>>>

>>> import warnings
>>> warnings.filterwarnings("error", message=".*check_inverse*.",
...                         category=UserWarning, append=False)

你可能感兴趣的:(SciKit-Learn)

Python开发常用的三方模块如下：换个网名有点难 python 开发语言
Python是一门功能强大的编程语言，拥有丰富的第三方库，这些库为开发者提供了极大的便利。以下是100个常用的Python库，涵盖了多个领域：1、NumPy，用于科学计算的基础库。2、Pandas，提供数据结构和数据分析工具。3、Matplotlib，一个绘图库。4、Scikit-learn，机器学习库。5、SciPy，用于数学、科学和工程的库。6、TensorFlow，由Google开发的开源机
【机器学习】广义线性模型（GLM）的基本概念以及广义线性模型在python中的实例（包含statsmodels和scikit-learn实现逻辑回归） Lossya 机器学习 python scikit-learn 线性回归人工智能逻辑回归
引言GLM扩展了传统的线性回归模型，使其能够处理更复杂的数据类型和分布文章目录引言一、广义线性模型1.1定义1.2广义线性模型的组成1.2.1响应变量（ResponseVariable）1.2.2链接函数（LinkFunction）1.2.3线性预测器（LinearPredictor）1.3常见的广义线性模型1.3.1线性回归1.3.2逻辑回归1.3.3泊松回归1.4GLM的特性1.5广义线性模型
Anaconda和Python的区别王摇摆 ANACONDA python 开发语言经验学习日常
0.专业英语Python巨蟒Anaconda大蟒蛇1.简单区别1.1安装包大小不同python自身缺少numpy、matplotlib、scipy、scikit-learn…等一系列包需要安装pip来导入这些包才能进行相应运算。Anaconda(开源的Python包管理器)是一个python发行版，包含了conda、Python等180多个科学包及其依赖项。包含了大量的包，使用Anaconda无需
机器学习框架巅峰对决：TensorFlow vs. PyTorch vs. Scikit-Learn实战分析 @sinner 技术选型机器学习 tensorflow pytorch scikit-learn
1.引言1.1机器学习框架的重要性在机器学习的黄金时代，框架的选择对于开发高效、可扩展的模型至关重要。合适的框架可以极大地提高开发效率，简化模型的构建和训练过程，并支持大规模的模型部署。因此，了解和选择最合适的机器学习框架对于研究人员和工程师来说是一个关键的步骤。1.2三大框架概览：TensorFlow、PyTorch、Scikit-Learn目前，最流行的机器学习框架主要有TensorFlow、
termux下pip包出现Package ‘xxx-dev‘ has no installation candidate处理拐几个弯其他 termux has no installation candidate pip
---------------------------当时在平板termux上安装scikit-learn时，总会安装失败，因此在网上看一些教程，说是要安装一些xxx-dev的依赖，但是在pip这些依赖的时候总会Package‘xxx-dev’hasnoinstallationcandidate，后来找了半天终于在一个国外网站找到了原因：最新版的pip中，已将-dev依赖合并了原包，如Python
PyTorch 基础学习（14）- 归一化花千树-010 PyTorch pytorch 学习人工智能
系列文章：《PyTorch基础学习》文章索引概述归一化是数据预处理中的重要步骤之一，它可以将数据调整到特定的范围或分布，有助于加速训练并提高模型的性能。在机器学习中，不同的归一化方法适用于不同的场景。本文将详细介绍scikit-learn中的常见归一化方法及其应用。1.Min-Max归一化MinMaxScalerMin-Max归一化将数据缩放到指定范围，通常是[0,1]。这种方法保留了数据的相对关
sklearn 评估模型常用函数小Z资本 sklearn 人工智能 python
`sklearn.metrics`是scikit-learn库中的一个模块，它提供了许多用于评估预测模型性能的指标和工具。这些指标和工具可以帮助你了解模型在训练集和测试集上的表现，以及模型是否能够很好地泛化到未见过的数据。以下是一些`sklearn.metrics`中常用的函数和指标：1.**分类指标**：-`accuracy_score`:计算分类准确率。-`classification_rep
在sklearn中如何实现参数网格搜索（GridSearch）？ 2401_85761762 sklearn 人工智能 python
深入理解Scikit-learn中的参数网格搜索（GridSearch）引言在机器学习模型的开发过程中，超参数的调整对于模型性能有着至关重要的影响。Scikit-learn（简称sklearn），作为Python中一个广泛使用的机器学习库，提供了强大的工具来帮助我们进行超参数的优化。其中，GridSearchCV是实现参数网格搜索的利器。本文将详细介绍GridSearchCV的使用方法，并探讨其在
人工智能开源库有哪些 openwin_top 人工智能人工智能开源 python
TensorFlow：由Google开发的深度学习库，提供了丰富的工具和API，支持CPU和GPU计算。PyTorch：由Facebook开发的深度学习框架，提供动态图和静态图两种模式，并且易于使用。Keras：基于TensorFlow、Theano和CNTK等深度学习库的高级API，可帮助用户快速构建神经网络。Scikit-learn：用Python编写的机器学习库，提供了许多常见的机器学习算法
Scikit-learn：用于数据挖掘和数据分析的简单而有效的工具，建立在 NumPy, SciPy 和 Matplotlib 上。 Jr_l #数据科学数据挖掘 scikit-learn 数据分析
引言Scikit-learn是一个基于Python的机器学习库，旨在为数据挖掘和数据分析提供简单而有效的工具。它建立在强大的科学计算库之上，包括NumPy、SciPy和Matplotlib，提供了丰富的机器学习算法和工具，如分类、回归、聚类、降维、模型选择和数据预处理等。Scikit-learn的API设计简洁，使用方便，且拥有高效的实现，因此在学术研究和工业界中得到了广泛应用。无论是数据科学家还
python库——sklearn的关键组件和参数设置零度° python python sklearn
文章目录模型构建线性回归逻辑回归决策树分类器随机森林支持向量机K-近邻模型评估交叉验证性能指标特征工程主成分分析标准化和归一化scikit-learn，简称sklearn，是Python中一个广泛使用的机器学习库，它建立在NumPy、SciPy和Matplotlib这些科学计算库之上。sklearn提供了简单而有效的工具来进行数据挖掘和数据分析。我们将介绍sklearn中一些关键组件的参数设置。模
Python中的自然语言处理和文本挖掘 api77 电商api api python 自然语言处理 easyui 开发语言网络前端 java
在Python中，自然语言处理（NLP）和文本挖掘通常涉及对文本数据进行清洗、转换、分析和提取有用信息的过程。Python有许多库和工具可以帮助我们完成这些任务，其中最常用的包括nltk（自然语言处理工具包）、spaCy、gensim、textblob和scikit-learn等。以下是一个简单的例子，展示了如何使用Python和nltk库进行基本的自然语言处理和文本挖掘。安装必要的库首先，确保你
吴恩达机器学习全课程笔记第二篇亿维数组 Machine Learning 机器学习笔记人工智能学习
目录前言P31-P33logistics（逻辑）回归决策边界P34-P36逻辑回归的代价函数梯度下降的实现P37-P41过拟合问题正则化代价函数正则化线性回归正则化logistics回归前言这是吴恩达机器学习笔记的第二篇，第一篇笔记请见：吴恩达机器学习全课程笔记第一篇完整的课程链接如下：吴恩达机器学习教程（bilibili）推荐网站：scikit-learn中文社区吴恩达机器学习学习资料（gith
Python | Conda常用命令 -拟墨画扇- Python python conda 开发语言
一、介绍1、Anaconda工具Anaconda是一个用于数据科学和机器学习的开源软件包管理器和环境管理器。它包含了许多流行的数据科学工具和库，如Python、JupyterNotebook、numpy、pandas、scikit-learn等，可以帮助用户轻松地管理和安装这些工具和库。Anaconda还提供了一个名为Conda的包管理工具，可以帮助用户创建和管理不同的环境，以便在不同项目中使用不
python机器学习库Scikit-learn 崔吉龙
python语言中用来处理机器学习的库最重要的就是Scikit-learn，简称sklearn。被大多数科学家所钟爱，包括了构建良好的学习算法、误差函数和测试例程。在sklearn的核心有四种类型的类覆盖了所有机器学习功能：分类回归聚类分组转换数据虽然sklearn提供的算法比较多，但是他们都符合基本的接口定义，为了是使用不同的算法时，所使用的接口时统一的。sklearn提供了四个基本对象接口。评
机器学习入门--循环神经网络原理与实践 Dr.Cup 机器学习入门机器学习 rnn 深度学习
循环神经网络循环神经网络（RNN）是一种在序列数据上表现出色的人工神经网络。相比于传统前馈神经网络，RNN更加适合处理时间序列数据，如音频信号、自然语言和股票价格等。本文将介绍RNN的基本数学原理、使用PyTorch和Scikit-Learn数据集实现的代码。数学原理RNN是一种带有循环结构的神经网络，其在处理序列数据时将前一次的输出作为当前输入的一部分。这使得RNN能够记住先前的状态和信息，并且
【机器学习笔记】 6 机器学习库Scikit-learn RIKI_1 机器学习机器学习笔记 scikit-learn
Scikit-learn概述Scikit-learn是基于NumPy、SciPy和Matplotlib的开源Python机器学习包,它封装了一系列数据预处理、机器学习算法、模型选择等工具,是数据分析师首选的机器学习工具包。自2007年发布以来，scikit-learn已经成为Python重要的机器学习库了，scikit-learn简称sklearn，支持包括分类，回归，降维和聚类四大机器学习算法。
聚类分析入门：使用Python和K-means算法进行数据聚类 Evaporator Core python
文章标题：聚类分析入门：使用Python和K-means算法进行数据聚类简介聚类分析是机器学习中的一个重要任务，它涉及将数据集中的样本分成多个类别或簇，使得同一簇内的样本相似度较高，不同簇之间的样本相似度较低。K-means算法是一种常用的聚类算法，它通过迭代优化簇的中心点来实现聚类。本文将介绍如何使用Python编程语言和Scikit-learn库实现K-means算法，以及如何对数据进行聚类分
Python机器学习：Scikit-learn库与应用数据小爬虫 api 电商api 机器学习 python scikit-learn 开发语言运维服务器
当涉及到Python机器学习时，Scikit-learn是一个非常流行且功能强大的库。它提供了广泛的算法和工具，使得机器学习变得简单而高效。下面是一个简单的Scikit-learn库与应用示例，其中包括代码。首先，确保你已经安装了Scikit-learn库。你可以使用pip命令来安装它：bash复制代码pipinstallscikit-learn接下来，我们将使用Scikit-learn来执行一个
【机器学习算法】KNN鸢尾花种类预测案例和特征预处理。全md文档笔记（已分享，附代码）机器学习python算法
本系列文章md笔记（已分享）主要讨论机器学习算法相关知识。机器学习算法文章笔记以算法、案例为驱动的学习，伴随浅显易懂的数学知识，让大家掌握机器学习常见算法原理，应用Scikit-learn实现机器学习算法的应用，结合场景解决实际问题。包括K-近邻算法，线性回归，逻辑回归，决策树算法，集成学习，聚类算法。K-近邻算法的距离公式，应用LinearRegression或SGDRegressor实现回归预
基于聚类的点云背景分离算法python代码 love6a6 算法聚类 python
点云背景分离是一个常用的计算机视觉任务，它旨在从点云数据中分离出感兴趣的物体。聚类是一种常用的方法，可以通过将相似的点聚集在一起来完成背景分离。下面是一个简单的基于K-Means聚类的点云背景分离的Python代码示例，使用的是scikit-learn库：importnumpyasnpfromsklearn.clusterimportKMeansfromsklearn.preprocessingi
【机器学习】机器学习常见算法详解第4篇：KNN算法计算过程（已分享，附代码）机器学习python算法
本系列文章md笔记（已分享）主要讨论机器学习算法相关知识。机器学习算法文章笔记以算法、案例为驱动的学习，伴随浅显易懂的数学知识，让大家掌握机器学习常见算法原理，应用Scikit-learn实现机器学习算法的应用，结合场景解决实际问题。包括K-近邻算法，线性回归，逻辑回归，决策树算法，集成学习，聚类算法。K-近邻算法的距离公式，应用LinearRegression或SGDRegressor实现回归预
21丨朴素贝叶斯分类（下）：如何对文档进行分类？张九日zx
朴素贝叶斯分类最适合的场景就是文本分类、情感分析和垃圾邮件识别。sklearn机器学习包sklearn的全称叫Scikit-learn，它给我们提供了3个朴素贝叶斯分类算法，分别是高斯朴素贝叶斯（GaussianNB）、多项式朴素贝叶斯（MultinomialNB）和伯努利朴素贝叶斯（BernoulliNB）。自然界的现象比较适合用高斯朴素贝叶斯来处理，而文本分类是使用多项式朴素贝叶斯或者伯努利朴
Python的Sklearn库中的数据集王荣胜z
一、Sklearn介绍scikit-learn是Python语言开发的机器学习库，一般简称为sklearn，目前算是通用机器学习算法库中实现得比较完善的库了。其完善之处不仅在于实现的算法多，还包括大量详尽的文档和示例。其文档写得通俗易懂，完全可以当成机器学习的教程来学习。二、Sklearn数据集种类sklearn的数据集有好多个种自带的小数据集（packageddataset）：sklearn.d
Python数据科学：Scikit-Learn机器学习偶是不器 Python python 开发语言 scikit-learn 手写数字识别鸢尾花分类
4.1Scikit-Learn机器学习Scikit-Learn使用的数据表示：二维网格数据表实例1：通过Seaborn导入数据defskLearn():'''scikitLearn基本介绍:return:'''importseabornassns#导入Iris数据集#注：一般网络访问不了iris=sns.load_dataset('iris')iris.head()实例2：通过本地导入数据defs
[韩顺平]python笔记超级用户 root Python python 笔记开发语言
AI工程师、运维工程师python排名逐年上升，为什么？python对大数据分析、人工智能中关键的机器学习、深度学习都提供有力的支持Python支持最庞大的代码库，功能超强数据分析：numpy/pandas/os机器学习：tensorflow/scikit-learn/theano爬虫：urllib/reques/bs4/scrapy网页开发：Django/falsk/web运维：saltstac
神经网络中的分位数回归和分位数损失
在使用机器学习构建预测模型时，我们不只是想知道“预测值(点预测)”，而是想知道“预测值落在某个范围内的可能性有多大(区间预测)”。例如当需要进行需求预测时，如果只储备最可能的需求预测量，那么缺货的概率非常的大。但是如果库存处于预测的第95个百分位数(需求有95%的可能性小于或等于该值)，那么缺货数量会减少到大约20分之1。获得这些百分位数值的机器学习方法有：scikit-learn:Gradien
Python三维体素化网格和点云计算亚图跨际 Python 计算 python 点云三维数据
要点Python三维点云自动生成3D网格和表面重建，创建多个细节层次Python使用4种工具体素化网格，创建点云数据可视化Python计算RGB-D图像的点云，点云地面检测算法，过滤点云以便下采样和去除异常值，scikit-learn聚类点云数据Python点云和网格计算更多示例：使用泊松盘采样在网格上生成蓝噪声样本，对体素网格上的点云进行下采样，从点云估计法线，计算每个顶点的网格法线三维点云点云
Pycharm安装sklearn后，仍然报错No module named ‘sklearn‘ CRTao pycharm sklearn ide
Pycharm安装sklearn后，仍然报错Nomodulenamed‘sklearn’1.因为sklean是缩写，真正需要下载的包为scikit-learn2.下载scikit-learn不成功，可能是因为你之前安装过sklean，需要先把它卸载，然后再下载scikit-learn。3、下载scikit-learn还是不成功，需要检查一下你的pycharm的pip和电脑的pip版本是否相同，如果
机器学习--K近邻算法，以及python中通过Scikit-learn库实现K近邻算法API使用技巧景天科技苑机器学习机器学习 python 近邻算法
文章目录1.K-近邻算法思想2.K-近邻算法(KNN)概念3.电影类型分析4.KNN算法流程总结5.k近邻算法api初步使用机器学习库scikit-learn1Scikit-learn工具介绍2.安装3.Scikit-learn包含的内容4.K-近邻算法API5.案例5.1步骤分析5.2代码过程1.K-近邻算法思想假如你有一天来到北京，你有一些朋友也在北京居住，你来到北京之后，你也不知道你在北京的
LeetCode[Math] - #66 Plus One Cwind java LeetCode 题解 Algorithm Math
原题链接：#66 Plus One 要求：给定一个用数字数组表示的非负整数，如num1 = {1, 2, 3, 9}, num2 = {9, 9}等，给这个数加上1。注意： 1. 数字的较高位存在数组的头上，即num1表示数字1239 2. 每一位（数组中的每个元素）的取值范围为0~9 难度：简单分析：题目比较简单，只须从数组
JQuery中$.ajax()方法参数详解 AILIKES JavaScript jsonp jquery Ajax json
url: 要求为String类型的参数，（默认为当前页地址）发送请求的地址。 type: 要求为String类型的参数，请求方式（post或get）默认为get。注意其他http请求方法，例如put和 delete也可以使用，但仅部分浏览器支持。 timeout: 要求为Number类型的参数，设置请求超时时间（毫秒）。此设置将覆盖$.ajaxSetup()方法的全局
JConsole & JVisualVM远程监视Webphere服务器JVM Kai_Ge JVisualVM JConsole Webphere
JConsole是JDK里自带的一个工具，可以监测Java程序运行时所有对象的申请、释放等动作，将内存管理的所有信息进行统计、分析、可视化。我们可以根据这些信息判断程序是否有内存泄漏问题。　　使用JConsole工具来分析WAS的JVM问题，需要进行相关的配置。　　首先我们看WAS服务器端的配置. 　　1、登录was控制台https://10.4.119.18
自定义annotation 120153216 annotation
Java annotation 自定义注释@interface的用法一、什么是注释说起注释，得先提一提什么是元数据(metadata)。所谓元数据就是数据的数据。也就是说，元数据是描述数据的。就象数据表中的字段一样，每个字段描述了这个字段下的数据的含义。而J2SE5.0中提供的注释就是java源代码的元数据，也就是说注释是描述java源
CentOS 5/6.X 使用 EPEL YUM源 2002wmj centos
CentOS 6.X 安装使用EPEL YUM源1. 查看操作系统版本[root@node1 ~]# uname -a Linux node1.test.com 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux [root@node1 ~]#
在SQLSERVER中查找缺失和无用的索引SQL 357029540 SQL Server
--缺失的索引 SELECT avg_total_user_cost * avg_user_impact * ( user_scans + user_seeks ) AS PossibleImprovement , last_user_seek ,
Spring3 MVC 笔记（二） —json+rest优化 7454103 Spring3 MVC
接上次的 spring mvc 注解的一些详细信息！其实也是一些个人的学习笔记呵呵！
替换“\”的时候报错Unexpected internal error near index 1 \ ^ adminjun java “\替换”
发现还是有些东西没有刻子脑子里,,过段时间就没什么概念了,所以贴出来...以免再忘... 在拆分字符串时遇到通过 \ 来拆分，可是用所以想通过转义 \\ 来拆分的时候会报异常 public class Main { /*
POJ 1035 Spell checker(哈希表) aijuans 暴力求解--哈希表
/* 题意：输入字典，然后输入单词，判断字典中是否出现过该单词，或者是否进行删除、添加、替换操作，如果是，则输出对应的字典中的单词要求按照输入时候的排名输出题解：建立两个哈希表。一个存储字典和输入字典中单词的排名，一个进行最后输出的判重 */ #include <iostream> //#define using namespace std; const int HASH =
通过原型实现javascript Array的去重、最大值和最小值 ayaoxinchao JavaScript array prototype
用原型函数（prototype）可以定义一些很方便的自定义函数，实现各种自定义功能。本次主要是实现了Array的去重、获取最大值和最小值。实现代码如下： <script type="text/javascript"> Array.prototype.unique = function() { var a = {}; var le
UIWebView实现https双向认证请求 bewithme UIWebView https Objective-C
什么是HTTPS双向认证我已在先前的博文 ASIHTTPRequest实现https双向认证请求中有讲述，不理解的读者可以先复习一下。本文是用UIWebView来实现对需要客户端证书验证的服务请求，网上有些文章中有涉及到此内容，但都只言片语，没有讲完全，更没有完整的代码，让人困扰不已。但是此知
NoSQL数据库之Redis数据库管理(Redis高级应用之事务处理、持久化操作、pub_sub、虚拟内存) bijian1013 redis 数据库 NoSQL
3.事务处理 Redis对事务的支持目前不比较简单。Redis只能保证一个client发起的事务中的命令可以连续的执行，而中间不会插入其他client的命令。当一个client在一个连接中发出multi命令时，这个连接会进入一个事务上下文，该连接后续的命令不会立即执行，而是先放到一个队列中，当执行exec命令时，redis会顺序的执行队列中
各数据库分页sql备忘 bingyingao oracle sql 分页
ORACLE 下面这个效率很低 SELECT * FROM ( SELECT A.*, ROWNUM RN FROM (SELECT * FROM IPAY_RCD_FS_RETURN order by id desc) A ) WHERE RN <20; 下面这个效率很高 SELECT A.*, ROWNUM RN FROM (SELECT * FROM IPAY_RCD_
【Scala七】Scala核心一：函数 bit1129 scala
1. 如果函数体只有一行代码，则可以不用写{},比如 def print(x: Int) = println(x) 一行上的多条语句用分号隔开，则只有第一句属于方法体，例如 def printWithValue(x: Int) : String= println(x); "ABC" 上面的代码报错，因为，printWithValue的方法
了解GHC的factorial编译过程 bookjovi haskell
GHC相对其他主流语言的编译器或解释器还是比较复杂的，一部分原因是haskell本身的设计就不易于实现compiler，如lazy特性，static typed，类型推导等。关于GHC的内部实现有篇文章说的挺好，这里，文中在RTS一节中详细说了haskell的concurrent实现，里面提到了green thread，如果熟悉Go语言的话就会发现，ghc的concurrent实现和Go有点类
Java-Collections Framework学习与总结-LinkedHashMap BrokenDreams LinkedHashMap
前面总结了java.util.HashMap，了解了其内部由散列表实现，每个桶内是一个单向链表。那有没有双向链表的实现呢？双向链表的实现会具备什么特性呢？来看一下HashMap的一个子类——java.util.LinkedHashMap。
读《研磨设计模式》-代码笔记-抽象工厂模式-Abstract Factory bylijinnan abstract
声明：本文只为方便我个人查阅和理解，详细的分析以及源代码请移步原作者的博客http://chjavach.iteye.com/ package design.pattern; /* * Abstract Factory Pattern * 抽象工厂模式的目的是： * 通过在抽象工厂里面定义一组产品接口，方便地切换“产品簇” * 这些接口是相关或者相依赖的
压暗面部高光 cherishLC PS
方法一、压暗高光&重新着色当皮肤很油又使用闪光灯时，很容易在面部形成高光区域。下面讲一下我今天处理高光区域的心得：皮肤可以分为纹理和色彩两个属性。其中纹理主要由亮度通道（Lab模式的L通道）决定，色彩则由a、b通道确定。处理思路为在保持高光区域纹理的情况下，对高光区域着色。具体步骤为：降低高光区域的整体的亮度，再进行着色。如果想简化步骤，可以只进行着色（参看下面的步骤1
Java VisualVM监控远程JVM crabdave visualvm
Java VisualVM监控远程JVM JDK1.6开始自带的VisualVM就是不错的监控工具. 这个工具就在JAVA_HOME\bin\目录下的jvisualvm.exe, 双击这个文件就能看到界面通过JMX连接远程机器, 需要经过下面的配置: 1. 修改远程机器JDK配置文件 (我这里远程机器是linux).
Saiku去掉登录模块 daizj saiku 登录 olap BI
1、修改applicationContext-saiku-webapp.xml <security:intercept-url pattern="/rest/**" access="IS_AUTHENTICATED_ANONYMOUSLY" /> <security:intercept-url pattern=&qu
浅析 Flex中的Focus dsjt html Flex Flash
关键字：focus、 setFocus、 IFocusManager、KeyboardEvent 焦点、设置焦点、获得焦点、键盘事件一、无焦点的困扰——组件监听不到键盘事件原因：只有获得焦点的组件（确切说是InteractiveObject）才能监听到键盘事件的目标阶段；键盘事件（flash.events.KeyboardEvent）参与冒泡阶段，所以焦点组件的父项（以及它爸
Yii全局函数使用 dcj3sjt126com yii
由于YII致力于完美的整合第三方库，它并没有定义任何全局函数。yii中的每一个应用都需要全类别和对象范围。例如，Yii::app()->user;Yii::app()->params['name'];等等。我们可以自行设定全局函数，使得代码看起来更加简洁易用。(原文地址) 我们可以保存在globals.php在protected目录下。然后，在入口脚本index.php的，我们包括在
设计模式之单例模式二（解决无序写入的问题） come_for_dream 单例模式 volatile 乱序执行双重检验锁
在上篇文章中我们使用了双重检验锁的方式避免懒汉式单例模式下由于多线程造成的实例被多次创建的问题，但是因为由于JVM为了使得处理器内部的运算单元能充分利用，处理器可能会对输入代码进行乱序执行（Out Of Order Execute）优化，处理器会在计算之后将乱序执行的结果进行重组，保证该
程序员从初级到高级的蜕变 gcq511120594 框架工作 PHP android html5
软件开发是一个奇怪的行业，市场远远供不应求。这是一个已经存在多年的问题，而且随着时间的流逝，愈演愈烈。我们严重缺乏能够满足需求的人才。这个行业相当年轻。大多数软件项目是失败的。几乎所有的项目都会超出预算。我们解决问题的最佳指导方针可以归结为——“用一些通用方法去解决问题，当然这些方法常常不管用，于是，唯一能做的就是不断地尝试，逐个看看是否奏效”。现在我们把淫浸代码时间超过3年的开发人员称为
Reverse Linked List hcx2013 list
Reverse a singly linked list. /** * Definition for singly-linked list. * public class ListNode { * int val; * ListNode next; * ListNode(int x) { val = x; } * } */ p
Spring4.1新特性——数据库集成测试 jinnianshilongnian spring 4.1
目录 Spring4.1新特性——综述 Spring4.1新特性——Spring核心部分及其他 Spring4.1新特性——Spring缓存框架增强 Spring4.1新特性——异步调用和事件机制的异常处理 Spring4.1新特性——数据库集成测试脚本初始化 Spring4.1新特性——Spring MVC增强 Spring4.1新特性——页面自动化测试框架Spring MVC T
C# Ajax上传图片同时生成微缩图(附Demo) liyonghui160com
1.Ajax无刷新上传图片,详情请阅我的这篇文章。（jquery + c# ashx） 2.C#位图处理 System.Drawing。 3.最新demo支持IE7,IE8,Fir
Java list三种遍历方法性能比较 pda158 java
从c/c++语言转向java开发，学习java语言list遍历的三种方法，顺便测试各种遍历方法的性能，测试方法为在ArrayList中插入1千万条记录，然后遍历ArrayList，发现了一个奇怪的现象，测试代码例如以下： package com.hisense.tiger.list; import java.util.ArrayList; import java.util.Iterator;
300个涵盖IT各方面的免费资源（上）——商业与市场篇 shoothao seo 商业与市场 IT资源免费资源
A.网站模板+logo+服务器主机+发票生成 HTML5 UP:响应式的HTML5和CSS3网站模板。 Bootswatch:免费的Bootstrap主题。 Templated:收集了845个免费的CSS和HTML5网站模板。 Wordpress.org|Wordpress.com:可免费创建你的新网站。 Strikingly:关注领域中免费无限的移动优
localStorage、sessionStorage uule localStorage
W3School 例子 HTML5 提供了两种在客户端存储数据的新方法： localStorage - 没有时间限制的数据存储 sessionStorage - 针对一个 session 的数据存储之前，这些都是由 cookie 完成的。但是 cookie 不适合大量数据的存储，因为它们由每个对服务器的请求来传递，这使得 cookie 速度很慢而且效率也不