Lecture 4: Scalable logistic regression

1. Probabilistic classifier

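A logistic regression model is an example of a probabilistic classifier: rather than returning only a class label, it returns the probability of each class given the input x.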

2. Logistic regression

For a binary label y in {0, 1}, we model the relationship between y and x using a Bernoulli distribution with parameter u = u(x):
P(Y = y | x) = Ber(y | u(x)) = u(x)^y * (1 - u(x))^(1 - y)

In logistic regression, the probability u(x) is given by the logistic sigmoid function applied to a linear function of x, where w is the weight vector:

u(x) = 1 / (1 + exp(-w^T x))
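
As a minimal sketch of the two formulas above, the following Python evaluates the sigmoid and the Bernoulli probability for one sample. The weights and features are made-up illustration values, and the function names are not from the lecture.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    """u(x) = sigmoid(w . x) for a weight list w and a feature list x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def bernoulli_pmf(y, u):
    """Ber(y | u) = u^y * (1 - u)^(1 - y) for y in {0, 1}."""
    return u ** y * (1.0 - u) ** (1 - y)

w = [0.5, -1.0]  # hypothetical weights
x = [2.0, 1.0]   # hypothetical features, chosen so that w . x = 0
u = predict_proba(w, x)
print("P(Y=1|x) =", bernoulli_pmf(1, u))
print("P(Y=0|x) =", bernoulli_pmf(0, u))
```

Because w . x is zero here, the sample sits exactly on the decision boundary and both classes get the same probability.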

3. Optimization routines

  • Gradient (steepest) descent: the main issue in gradient descent is how to set the step size.
  • Conjugate gradient
  • Newton's method: a faster optimisation algorithm that takes the curvature of the objective into account through the Hessian. The Newton step is well defined when the function is strictly convex, so that the Hessian is positive definite.
  • Levenberg-Marquardt algorithm: used when the function is not strictly convex; it compromises between the Newton direction and the steepest descent direction.
  • Quasi-Newton methods: avoid the explicit Hessian that the Newton direction needs, whose computation is cumbersome, error-prone, and expensive, by building an approximation from gradient information.
  • Limited-memory BFGS (L-BFGS): ignores older information and preserves only the most recent pairs of data (the latest changes in the weights and in the gradient).
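
To make the difference between the first-order and second-order routines concrete, here is a small sketch comparing fixed-step gradient descent with Newton's method on a one-weight logistic regression. The toy data, the step size 0.1, and the tolerance are assumptions chosen for illustration; with a single weight the Hessian is a scalar, so the Newton step is simply gradient divided by curvature.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 1-D data set, deliberately not separable so the optimum is finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

def grad(w):
    """Derivative of the negative log-likelihood: sum of (u_i - y_i) * x_i."""
    return sum((sigmoid(w * x) - y) * x for x, y in zip(xs, ys))

def hess(w):
    """Second derivative: sum of u_i * (1 - u_i) * x_i^2, always positive here."""
    return sum(sigmoid(w * x) * (1.0 - sigmoid(w * x)) * x * x for x in xs)

def minimise(step_fn, w=0.0, tol=1e-8, max_iter=10_000):
    """Iterate w <- w - step_fn(w, g) until the gradient is below tol."""
    for it in range(max_iter):
        g = grad(w)
        if abs(g) < tol:
            return w, it
        w -= step_fn(w, g)
    return w, max_iter

# Gradient descent: fixed step size (an assumed value, small enough to converge).
w_gd, gd_iters = minimise(lambda w, g: 0.1 * g)
# Newton's method: rescale the gradient by the inverse curvature.
w_newton, newton_iters = minimise(lambda w, g: g / hess(w))

print("gradient descent:", w_gd, "after", gd_iters, "iterations")
print("Newton's method: ", w_newton, "after", newton_iters, "iterations")
```

Both routines reach the same weight, but Newton's method needs far fewer iterations because the curvature sets the step length automatically, which is exactly the step-size problem noted for gradient descent.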

4. How to compute the gradient and the Hessian

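Batch gradient descent: the gradient is a sum over all samples, so the calculations can be distributed over multiple workers (cores).
Stochastic gradient descent (SGD), in its mini-batch form: uses a random subset of the data to compute the gradient at each step.

The step size in SGD should follow the Robbins-Monro conditions: the step sizes eta_t must sum to infinity while their squares sum to a finite value (for example, eta_t = 1/t).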
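
A minimal sketch of mini-batch SGD with a decaying step size is given below. The schedule eta_t = (tau0 + t)^(-kappa) with kappa in (0.5, 1] is one common choice that satisfies the Robbins-Monro conditions; the values of tau0 and kappa, the batch size, and the toy data are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 1-D data set (not separable, so the optimum is finite).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

def nll(w):
    """Negative log-likelihood over the full data set."""
    total = 0.0
    for x, y in zip(xs, ys):
        u = sigmoid(w * x)
        total -= y * math.log(u) + (1 - y) * math.log(1.0 - u)
    return total

def minibatch_grad(w, batch):
    """Average gradient over the sample indices in `batch`."""
    return sum((sigmoid(w * xs[i]) - ys[i]) * xs[i] for i in batch) / len(batch)

def step_size(t, tau0=1.0, kappa=0.6):
    """Decaying schedule: the steps sum to infinity, their squares do not."""
    return (tau0 + t) ** (-kappa)

rng = random.Random(0)  # fixed seed so the run is repeatable
w = 0.0
for t in range(5000):
    batch = rng.sample(range(len(xs)), 2)  # mini-batch of 2 random samples
    w -= step_size(t) * minibatch_grad(w, batch)

print("weight after SGD:", w)
print("loss at start:", nll(0.0), "loss after SGD:", nll(w))
```

Each update only touches two samples, so the gradient is noisy; the shrinking step size is what lets the iterates settle instead of bouncing around the optimum.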

5. Regularisation

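regParam: the regularisation parameter; if it is 0, no regularisation is applied
elasticNetParam: the mixing weight between l1 and l2 regularisation, in the range [0, 1]; 0 gives a pure l2 penalty and 1 gives a pure l1 penalty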
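
The interaction of the two parameters can be sketched with the elastic net penalty in the form given in the Spark MLlib documentation, regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), where alpha is elasticNetParam. The function below is a plain-Python illustration, not Spark's code, and the weight vector is a made-up example.

```python
def elastic_net_penalty(w, reg_param, elastic_net_param):
    """Penalty added to the loss: reg_param scales it, elastic_net_param mixes l1 and l2."""
    l1 = sum(abs(wi) for wi in w)
    l2_squared = sum(wi * wi for wi in w)
    return reg_param * (
        elastic_net_param * l1 + (1.0 - elastic_net_param) / 2.0 * l2_squared
    )

w = [3.0, -4.0]  # hypothetical weights: l1 norm is 7, squared l2 norm is 25

print("no regularisation:", elastic_net_penalty(w, 0.0, 0.5))
print("pure l2:          ", elastic_net_penalty(w, 0.1, 0.0))  # 0.1 * 25 / 2
print("pure l1:          ", elastic_net_penalty(w, 0.1, 1.0))  # 0.1 * 7
print("elastic net:      ", elastic_net_penalty(w, 0.1, 0.5))
```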

6. Logistic regression in PySpark

Driver/controller: initialise the weights and broadcast them to the executors
Workers/executors: compute the loss and gradient for each sample and sum them locally
Driver/controller: reduce over the executors to get the total sum of losses and gradients, handle regularisation, and use L-BFGS/OWLQN to update the weights
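
The driver/executor steps above can be imitated in a single Python process. In this sketch the partitions are ordinary lists standing in for data held by executors, the l2 term is added on the driver, and a plain gradient step stands in for L-BFGS/OWLQN; the data, the regularisation strength, and the step size are assumptions, and none of this is Spark's actual implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_loss_and_grad(partition, w):
    """Executor side: sum the loss and gradient over the local samples only."""
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in partition:
        u = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        loss -= y * math.log(u) + (1 - y) * math.log(1.0 - u)
        for j, xj in enumerate(x):
            grad[j] += (u - y) * xj
    return loss, grad

def reduce_sums(results):
    """Driver side: add up the per-executor losses and gradients."""
    total_loss = sum(loss for loss, _ in results)
    dim = len(results[0][1])
    total_grad = [sum(grad[j] for _, grad in results) for j in range(dim)]
    return total_loss, total_grad

# Hypothetical data split into three partitions; each sample is ([bias, feature], label).
partitions = [
    [([1.0, -2.0], 0), ([1.0, -1.0], 0)],
    [([1.0, -0.5], 1), ([1.0, 0.5], 0)],
    [([1.0, 1.0], 1), ([1.0, 2.0], 1)],
]

reg = 0.01          # assumed l2 strength, handled on the driver
learning_rate = 0.1  # assumed step size for the stand-in update rule
w = [0.0, 0.0]       # driver initialises the weights
history = []

for step in range(200):
    # "Broadcast" w: every partition sees the same current weights.
    results = [local_loss_and_grad(p, w) for p in partitions]
    loss, grad = reduce_sums(results)
    # Regularisation is added once, on the driver, after the reduce.
    loss += 0.5 * reg * sum(wj * wj for wj in w)
    grad = [gj + reg * wj for gj, wj in zip(grad, w)]
    history.append(loss)
    # Plain gradient step standing in for the L-BFGS/OWLQN update.
    w = [wj - learning_rate * gj for wj, gj in zip(w, grad)]

print("initial loss:", history[0], "final loss:", history[-1])
print("weights:", w)
```

Because the loss and gradient are sums over samples, only one scalar and one vector per executor travel back to the driver at each iteration, regardless of how many samples each partition holds.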

L-BFGS is used as the solver for LogisticRegression() with l2 regularisation (or with no regularisation)
OWLQN is used as the solver when l1 or elastic net regularisation is used
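
The solver rule can be written down as a small helper, shown next to a sketch of how the estimator might be configured. `expected_solver` and `make_estimator` are hypothetical helper names that simply restate the rule from these notes; `LogisticRegression` with `maxIter`, `regParam`, and `elasticNetParam` is the `pyspark.ml.classification` estimator, and constructing it assumes pyspark is installed and a SparkSession is active.

```python
def expected_solver(reg_param, elastic_net_param):
    """Rule from the notes: an active l1 term means OWLQN, otherwise L-BFGS."""
    has_l1 = reg_param > 0.0 and elastic_net_param > 0.0
    return "OWLQN" if has_l1 else "L-BFGS"

def make_estimator(reg_param=0.0, elastic_net_param=0.0, max_iter=100):
    """Build a PySpark LogisticRegression estimator.

    Imported lazily so the helper above works without pyspark;
    calling this function needs pyspark and an active SparkSession.
    """
    from pyspark.ml.classification import LogisticRegression
    return LogisticRegression(
        maxIter=max_iter,
        regParam=reg_param,
        elasticNetParam=elastic_net_param,
    )

for reg, alpha in [(0.0, 0.0), (0.1, 0.0), (0.1, 0.5), (0.1, 1.0)]:
    print("regParam =", reg, "elasticNetParam =", alpha, "->", expected_solver(reg, alpha))
```

With Spark available, a call such as `make_estimator(0.1, 0.5).fit(train_df)` would train the model, where `train_df` is a hypothetical DataFrame with the default `features` and `label` columns.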
