机器学习:Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions
AbstractAlthoughstochasticgradientdescent(SGD)methodanditsvariants(e.g.,stochasticmomentummethods,ADAGRAD)arethechoiceofalgorithmsforsolvingnon-convexproblems