cs231n

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture6.pdf

sigmoid

  • Saturated neurons “kill” the gradients (see the sketch after this list)
  • Sigmoid outputs are not zero-centered
  • exp() is somewhat compute-expensive
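
A minimal NumPy sketch of the saturation problem (the helper names `sigmoid` and `sigmoid_grad` are illustrative, not from the slides): the local gradient σ(x)(1 − σ(x)) peaks at 0.25 and goes to zero for large |x|, so a saturated neuron passes almost no gradient back during backprop.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # local gradient; peaks at 0.25 when x = 0

# The gradient vanishes once |x| is large: saturated neurons "kill" it.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:+6.1f}  sigmoid={sigmoid(x):.4f}  grad={sigmoid_grad(x):.6f}")
```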

tanh

  • Squashes numbers to range [-1, 1]
  • Zero-centered (nice)
  • Saturated neurons still “kill” the gradients (see the sketch below)
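
A short NumPy sketch of both points (the symmetric test grid is an illustrative assumption): outputs stay in (-1, 1) and average to zero, but the gradient 1 − tanh²(x) still vanishes at the tails.

```python
import numpy as np

x = np.linspace(-4.0, 4.0, 9)
t = np.tanh(x)

print(t.min(), t.max())   # squashed into (-1, 1)
print(t.mean())           # ~0 on symmetric input: outputs are zero-centered

grad = 1.0 - t ** 2       # d/dx tanh(x)
print(grad[0], grad[-1])  # ~0 at the tails: saturation still kills gradients
```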

ReLU

  • Does not saturate (in the + region; see the sketch below)
  • Very computationally efficient (just a threshold, no exp())
  • Converges much faster than sigmoid/tanh in practice (e.g. 6x)
  • Actually more biologically plausible than sigmoid
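
A minimal NumPy sketch (helper names `relu` and `relu_grad` are illustrative): ReLU is a single max, and its gradient is exactly 1 for every positive input, so it never shrinks in the + region the way sigmoid/tanh gradients do.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)          # a single threshold, no exp()

def relu_grad(x):
    return (x > 0).astype(np.float64)  # exactly 1 everywhere in the + region

x = np.array([-10.0, -1.0, 0.5, 10.0, 100.0])
print(relu(x))       # [0. 0. 0.5 10. 100.]
print(relu_grad(x))  # [0. 0. 1. 1. 1.] -- never shrinks for positive inputs
```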

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture7.pdf

  • Adam is a good default choice in most cases (see the sketch below)
  • If you can afford full-batch updates, try out L-BFGS (and don’t forget to disable all sources of noise, e.g. dropout)
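
A minimal PyTorch sketch of the Adam default (the toy `nn.Linear` model and batch are illustrative; lr=1e-3 with betas=(0.9, 0.999) is the starting point the lecture suggests):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy model; the point is only the optimizer setup.
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

x = torch.randn(32, 10)           # dummy batch of 32 examples
y = torch.randint(0, 2, (32,))    # dummy class labels

optimizer.zero_grad()
loss = F.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```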
