Introduction to the parameters of sklearn.neural_network.MLPRegressor

sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(100, ), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
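
A minimal usage sketch (not from the original post; the synthetic data and parameter values are illustrative assumptions):

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic regression data, purely for illustration
X, y = make_regression(n_samples=1000, n_features=10, noise=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = MLPRegressor(hidden_layer_sizes=(100,), activation='relu', solver='adam',
                   max_iter=500, random_state=0)
reg.fit(X_train, y_train)
print(reg.score(X_test, y_test))  # R^2 on the held-out split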

hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)

The ith element represents the number of neurons in the ith hidden layer.

The length of the tuple sets the number of hidden layers, and each element sets the number of neurons in that layer.
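
For example (the layer sizes below are illustrative, not from the original text):

from sklearn.neural_network import MLPRegressor

# Two hidden layers: 64 neurons in the first, 32 in the second
reg = MLPRegressor(hidden_layer_sizes=(64, 32))
# The default, hidden_layer_sizes=(100,), is a single hidden layer of 100 neurons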

activation : {'identity', 'logistic', 'tanh', 'relu'}, default 'relu'

Activation function for the hidden layer.

'identity', no-op activation, useful to implement a linear bottleneck, returns f(x) = x

'logistic', the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x))

'tanh', the hyperbolic tangent function, returns f(x) = tanh(x)

'relu', the rectified linear unit function (also nonlinear), returns f(x) = max(0, x)

Advice on choosing activation functions: https://blog.csdn.net/zchang81/article/details/70224688

The drawback of a linear activation is that a composition of linear functions is still a linear function: no matter how many layers you stack, if every layer uses a linear activation the network is ultimately equivalent to a single layer, as the check below illustrates.
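
A quick NumPy check of that point (a sketch, not part of the original post): two stacked layers with identity activation collapse into one linear layer.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
# Two stacked layers with identity activation: h = X @ W1 + b1, out = h @ W2 + b2
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
two_layers = (X @ W1 + b1) @ W2 + b2
# The same mapping as a single linear layer: W = W1 @ W2, b = b1 @ W2 + b2
one_layer = X @ (W1 @ W2) + (b1 @ W2 + b2)
print(np.allclose(two_layers, one_layer))  # True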

solver : {'lbfgs', 'sgd', 'adam'}, default 'adam'

The solver for weight optimization.

'lbfgs' is an optimizer in the family of quasi-Newton methods.

'sgd' refers to stochastic gradient descent.

'adam' refers to a stochastic gradient-based optimizer proposed by Kingma, Diederik, and Jimmy Ba.

Note: The default solver 'adam' works pretty well on relatively large datasets (with thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, 'lbfgs' can converge faster and perform better.
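
A sketch of that advice (the datasets and sizes are made up for illustration):

from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

# Small dataset: 'lbfgs' often converges faster and scores better
X_small, y_small = make_regression(n_samples=200, n_features=5, noise=0.3, random_state=0)
small_reg = MLPRegressor(solver='lbfgs', max_iter=1000, random_state=0).fit(X_small, y_small)

# Larger dataset: the default 'adam' is usually a good trade-off of training time and score
X_big, y_big = make_regression(n_samples=20000, n_features=20, noise=0.3, random_state=0)
big_reg = MLPRegressor(solver='adam', random_state=0).fit(X_big, y_big)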

alpha : float, optional, default 0.0001

L2 penalty (regularization term) parameter.
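
One way to tune the L2 penalty is a small grid search (a sketch; the alpha grid is an arbitrary example):

from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.5, random_state=0)
# Larger alpha means stronger L2 regularization (smaller weights)
grid = GridSearchCV(MLPRegressor(max_iter=500, random_state=0),
                    param_grid={'alpha': [1e-5, 1e-4, 1e-3, 1e-2]},
                    cv=3)
grid.fit(X, y)
print(grid.best_params_)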

batch_size : int, optional, default 'auto'

Size of minibatches for stochastic optimizers. If the solver is 'lbfgs', the regressor will not use minibatches. When set to 'auto', batch_size=min(200, n_samples).
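
In other words, with 'auto' the effective minibatch size is just (a one-line illustration):

n_samples = 150
effective_batch_size = min(200, n_samples)  # what batch_size='auto' resolves to
print(effective_batch_size)  # 150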

learning_rate : {'constant', 'invscaling', 'adaptive'}, default 'constant'

Learning rate schedule for weight updates.

'constant' is a constant learning rate given by 'learning_rate_init'.

'invscaling' gradually decreases the learning rate learning_rate_ at each time step 't' using an inverse scaling exponent of 'power_t': effective_learning_rate = learning_rate_init / pow(t, power_t)

'adaptive' keeps the learning rate constant at 'learning_rate_init' as long as training loss keeps decreasing. Each time two consecutive epochs fail to decrease training loss by at least tol, or fail to increase the validation score by at least tol if 'early_stopping' is on, the current learning rate is divided by 5.

Only used when solver='sgd'.
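
A sketch of the three schedules with the SGD solver (the numeric values are illustrative):

from sklearn.neural_network import MLPRegressor

# Fixed learning rate of 0.01
constant_reg = MLPRegressor(solver='sgd', learning_rate='constant', learning_rate_init=0.01)
# Learning rate decays as learning_rate_init / pow(t, power_t)
invscaling_reg = MLPRegressor(solver='sgd', learning_rate='invscaling',
                              learning_rate_init=0.01, power_t=0.5)
# Learning rate is divided by 5 whenever progress stalls for two consecutive epochs
adaptive_reg = MLPRegressor(solver='sgd', learning_rate='adaptive', learning_rate_init=0.01)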

learning_rate_init : double, optional, default 0.001

The initial learning rate used. It controls the step size in updating the weights. Only used when solver='sgd' or 'adam'.

power_t : double, optional, default 0.5

The exponent for the inverse scaling learning rate. It is used in updating the effective learning rate when learning_rate is set to 'invscaling'. Only used when solver='sgd'.

max_iter : int, optional, default 200

Maximum number of iterations. The solver iterates until convergence (determined by 'tol') or this number of iterations. For stochastic solvers ('sgd', 'adam'), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.
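
A sketch for checking whether training actually converged within max_iter (synthetic data; the fitted attributes n_iter_ and loss_curve_ are part of scikit-learn's MLP estimators):

import warnings
from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.5, random_state=0)
with warnings.catch_warnings():
    warnings.simplefilter('ignore', ConvergenceWarning)  # raised if max_iter is reached first
    reg = MLPRegressor(solver='adam', max_iter=50, tol=1e-4, random_state=0).fit(X, y)
print(reg.n_iter_)           # epochs actually run (at most max_iter)
print(len(reg.loss_curve_))  # one loss value per epoch for 'sgd'/'adam'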

shuffle : bool, optional, default True

Whether to shuffle samples in each iteration. Only used when solver='sgd' or 'adam'.

random_state : int, RandomState instance or None, optional, default None

If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

tol : float, optional, default 1e-4

Tolerance for the optimization. When the loss or score is not improving by at least tol for two consecutive iterations, unless learning_rate is set to 'adaptive', convergence is considered to be reached and training stops.

verbose : bool, optional, default False

Whether to print progress messages to stdout.

warm_start : bool, optional, default False

When set to True, reuse the solution of the previous call to fit as initialization; otherwise, just erase the previous solution.
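
A sketch of incremental training with warm_start (my own illustrative loop, not from the post):

from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.5, random_state=0)
reg = MLPRegressor(warm_start=True, max_iter=50, random_state=0)
for round_no in range(3):
    reg.fit(X, y)               # each call continues from the previous weights
    print(round_no, reg.loss_)  # training loss after this round of epochs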

momentum : float, default 0.9

Momentum for gradient descent updates. Should be between 0 and 1. Only used when solver='sgd'.

nesterovs_momentum : boolean, default True

Whether to use Nesterov's momentum. Only used when solver='sgd' and momentum > 0.
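
An illustrative SGD configuration with momentum (the values are just examples):

from sklearn.neural_network import MLPRegressor

# Classical (heavy-ball) momentum
sgd_classic = MLPRegressor(solver='sgd', momentum=0.9, nesterovs_momentum=False)
# Nesterov momentum, the default whenever momentum > 0
sgd_nesterov = MLPRegressor(solver='sgd', momentum=0.9, nesterovs_momentum=True)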

early_stopping : bool, default False

Whether to use early stopping to terminate training when the validation score is not improving. If set to True, it will automatically set aside 10% of the training data as a validation set and terminate training when the validation score is not improving by at least tol for two consecutive epochs. Only effective when solver='sgd' or 'adam'.

validation_fraction : float, optional, default 0.1

The proportion of training data to set aside as a validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.
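
A sketch combining early_stopping with validation_fraction (synthetic data and values chosen only for illustration):

from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=2000, n_features=10, noise=0.5, random_state=0)
reg = MLPRegressor(solver='adam', early_stopping=True, validation_fraction=0.1,
                   max_iter=1000, random_state=0)
reg.fit(X, y)
print(reg.n_iter_)  # training stops once the internal validation score stops improving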

beta_1 : float, optional, default 0.9

Exponential decay rate for estimates of the first moment vector in adam; should be in [0, 1). Only used when solver='adam'.

beta_2 : float, optional, default 0.999

Exponential decay rate for estimates of the second moment vector in adam; should be in [0, 1). Only used when solver='adam'.

epsilon : float, optional, default 1e-8

Value for numerical stability in adam. Only used when solver='adam'.
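
The Adam-specific settings can be written out explicitly; the values below are simply the defaults (a sketch):

from sklearn.neural_network import MLPRegressor

reg = MLPRegressor(solver='adam',
                   beta_1=0.9,    # decay rate for first-moment (mean) estimates
                   beta_2=0.999,  # decay rate for second-moment (variance) estimates
                   epsilon=1e-8)  # small constant for numerical stability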

Author: Jasmine晴天和我
Link: https://www.jianshu.com/p/619ad61bcb07
Source: Jianshu
Copyright belongs to the author. For commercial reuse, contact the author for authorization; for non-commercial reuse, please cite the source.
