Gradient Descent


cost function: for example, MSE (Mean Squared Error) can be expressed as

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

More generally,

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \operatorname{cost}\left(\theta;\, x^{(i)}, y^{(i)}\right)$$

Its gradient can be formulated as

$$\nabla_\theta J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \nabla_\theta \operatorname{cost}\left(\theta;\, x^{(i)}, y^{(i)}\right)$$

Computing this gradient requires iterating over all samples and summing their contributions. When the number of samples is very large, every update becomes very time-consuming.
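
As a concrete illustration, here is a minimal NumPy sketch of the full-batch MSE gradient for a linear model (the names `X`, `y`, and `theta` are illustrative, not from the original post):

```python
import numpy as np

def mse_gradient(theta, X, y):
    """Gradient of MSE(theta) = (1/m) * sum((X @ theta - y)**2) over all m samples."""
    m = len(y)
    errors = X @ theta - y              # residual for every sample
    return (2.0 / m) * (X.T @ errors)   # sums the per-sample gradients
```

Every call touches all `m` rows of `X`, which is exactly the cost that motivates the batching strategies below.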

To overcome this problem, we divide the data into smaller batches, feed them to the model one at a time, and update the weights of the neural network at the end of every step so that it fits the given data.

  • iterative: the parameters are improved step by step rather than solved for in closed form

  • learning rate: a hyperparameter that controls the size of each update step
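
Together these give the standard update rule, where $\eta$ denotes the learning rate:

$$\theta \leftarrow \theta - \eta \, \nabla_\theta J(\theta)$$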

[Figure: sgd.gif — an animation of stochastic gradient descent]
  • Batch Gradient Descent

    computes the gradient over all samples in each iteration

  • Stochastic Gradient Descent

    uses a single sample at each iteration

  • Mini-Batch Gradient Descent

    a balance between batch gradient descent and stochastic gradient descent: the samples are divided into several mini-batches, and all of the batches together comprise one epoch (see the sketch below)
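
The three variants differ only in how many samples feed each update. Here is a hedged NumPy sketch (the function and parameter names are my own, not from the original post) in which `batch_size` selects the variant:

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=100, batch_size=None):
    """batch_size=None -> batch GD; 1 -> stochastic GD; e.g. 32 -> mini-batch GD."""
    m, n = X.shape
    theta = np.zeros(n)
    batch_size = batch_size or m        # default: all samples per update (batch GD)
    for _ in range(epochs):
        idx = np.random.permutation(m)  # reshuffle the samples each epoch
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            errors = X[batch] @ theta - y[batch]
            grad = (2.0 / len(batch)) * (X[batch].T @ errors)
            theta -= lr * grad          # one weight update per batch
    return theta
```

With `batch_size=m` the inner loop runs once, giving one update per epoch; with `batch_size=1` every single sample triggers an update; intermediate values trade gradient accuracy against update frequency.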
