一、回归——sklearn.linear_model.LogisticRegression官方文档

1.logistic回归

logistic回归可以用于概率预测、分类等。

2.sklearn.linear_model.LogisticRegression函数参数

LogisticRegression(penalty=’l2’dual=Falsetol=0.0001C=1.0fit_intercept=Trueintercept_scaling=1class_weight=Nonerando

m_state=Nonesolver=’liblinear’max_iter=100multi_class=’ovr’verbose=0warm_start=Falsen_jobs=1)[source]

注意:

(1)在多类别划分中,如果' multi_class '选项设置为' OvR ',训练算法将使用one-vs-rest (OvR)方案;如果' multi_class '选项设置为'多项式',则使用交叉熵损失。

(2)这个类使用‘liblinear’ library, ‘newton-cg’, ‘sag’ 和‘lbfgs’求解器实现了规范化的logistic回归。它的输入矩阵可以是密集和稀疏的矩阵;使用C-ordered arrays or CSR matrices containing 64-bit floats可以获得最佳的性能;

(3)‘newton-cg’, ‘sag’, and ‘lbfgs求解器只支持原始公式下的L2正则化;liblinear求解器同时支持L1和L2的正则化,只对L2处罚采用对偶公式。

参数说明:

 

  1. penalty : str, ‘l1’ or ‘l2’, default: ‘l2’。用来指明惩罚的标准,The ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers仅支持l2 penalties.
  2. dual : bool, default: False。Dual or primal formulation.Dual formulation只适用于 l2 penalty with liblinear solver.Prefer dual=False when n_samples > n_features.
  3. tol : float, default: 1-4。Tolerance for stopping criteria.
  4. C : float, default: 1.0。逆正则化的强度,一定要是整数,就像支持向量机一样,较小的值有职责更好的正则化。
  5. fit_intercept : bool, default: True。是否存在截距,默认存在。
  6. class_weight : dict or ‘balanced’, default: None。用于标示分类模型中各种类型的权重,可以不输入,即不考虑权重。如果输入的话可以调用balanced库计算权重,或者是手动输入各类的权重。比如对于0,1的二元模型,我们可以定义class_weight={0:0.9, 1:0.1},这样类型0的权重为90%,而类型1的权重为10%
  7. random_state随机数种子,默认为无,仅在正则化优化算法为sag,liblinear时有用。
  8. max_iter : int, default: 100。Useful only for the newton-cg, sag and lbfgs solvers,求解的最大迭代次数
  9. multi_class : str, {‘ovr’, ‘multinomial’}, default: ‘ovr’。多类别问题的处理方式。'ovo':一对一
  10. verbose 日志冗长度int:冗长度0:不输出训练过程1:偶尔输出 >1:对每个子模型都输出
  11. warm_start : bool, default: False。是否热启动,如果是,则下一次训练是以追加树的形式进行(重新使用上一次的调用作为初始化),bool:热启动False:默认值
  12. n_jobs:并行数,int:个数-1:跟CPU核数一致1:默认值
  13. solver : {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’},default: ‘liblinear’ Algorithm to use in the optimization problem.
  •    对于小的数据集,, 选择 ‘liblinear’较好,‘sag’ 和‘saga’ 对于大数据集更好;
  • 对于多级分类的问题,只有‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’,libniear只支持多元逻辑回归的OvR,不支持MvM,但MVM相对精确。
  • newton-cg’, ‘lbfgs’ and ‘sag’ only handle L2 penalty, 相反地‘liblinear’ and ‘saga’ handle L1 penalty.

3.Attibutes

 

  1. coef_ : 变量中的系数。shape (1, n_features) or (n_classes, n_features)
  2. intercept_ 截距。shape (1,) or (n_classes,)
  3. n_iter_ :所有类的实际迭代次数。shape (n_classes,) or (1, )

4.Methods

  • (1)decision_function(X):预测样本的 confidence scores
  • (2)densify():将系数矩阵转化成密集矩阵的格式
  • (3)fit(X, y[, sample_weight]):根据给出的训练数据来训练模型。用来训练LR分类器,其中X是训练样本,y是对应的标记样本。
  • (4)get_params([deep]):Get parameters for this estimator.
  • (5)predict(X):用来预测测试样本的标记,也就是分类。预测x的标签
  • (6)predict_log_proba(X):对数概率估计
  • (7)predict_proba(X):概率估计
  • (8)score(X, y[, sample_weight]):返回给定的测试数据和标签的平均精度
  • (9)set_params(**params):设置estimate的参数
  • (10)sparsify():将系数矩阵转换成稀疏矩阵格式。

5.Methods的参数

(1)fit(Xysample_weight=None)

Fit the model according to the given training data.

Parameters:

X : {array-like, sparse matrix}, shape (n_samples, n_features) 训练样本

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape (n_samples,)

Target vector relative to X.标签

sample_weight : array-like, shape (n_samples,) optional

Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.每个样本的权重,如果为空,则每个样本的权重相同

Returns:

self : object

Returns self. 拟合模型,用来训练LR分类器,其中X是训练样本,y是对应的标记向量


(2)get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.


(3)predict(X)     Predict class labels for samples in X.

Parameters:

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Samples.

Returns:

C : array, shape = [n_samples]

Predicted class label per sample.

(4)predict_log_proba(X)  Log of probability estimates.

The returned estimates for all classes are ordered by the label of classes.

Parameters:

X : array-like, shape = [n_samples, n_features]

Returns:

T : array-like, shape = [n_samples, n_classes]

Returns the log-probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

(5)predict_proba(X)  Probability estimates.

The returned estimates for all classes are ordered by the label of classes.

For a multi_class problem, if multi_class is set to be “multinomial” the softmax function is used to find the predicted probability of each class. Else use a one-vs-rest approach, i.e calculate the probability of each class assuming it to be positive using the logistic function. and normalize these values across all the classes.

Parameters:

X : array-like, shape = [n_samples, n_features]

Returns:

T : array-like, shape = [n_samples, n_classes]

Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

(6)score(Xysample_weight=None):[source]
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:

score : float

Mean accuracy of self.predict(X) wrt. y.

你可能感兴趣的:(机器学习算法——回归)