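The loss functions used in deep learning are almost always highly non-convex, so plain stochastic gradient descent (SGD) on its own can easily get stuck in a local optimum. This series covers the following methods: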

1. SGD (On the Importance of Initialization and Momentum in Deep Learning)

2. Momentum

3. Nesterov accelerated gradient

4. Adagrad (Adaptive Subgradient Methods for Online Learning and Stochastic Optimization)

5. RMSprop (Generating Sequences With Recurrent Neural Networks)

6. Rprop (resilient backpropagation algorithm)

7. Adadelta (ADADELTA: An Adaptive Learning Rate Method)

8. Adam (Adam: A Method for Stochastic Optimization)

9. AMSGrad (On the Convergence of Adam and Beyond)

10. AdaBound (Adaptive Gradient Methods with Dynamic Bound of Learning Rate)
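
To make the local-optimum problem mentioned at the start concrete, here is a minimal sketch. The 1-D function `loss`, the learning rate, and the starting points are all assumptions chosen for illustration, and the update is plain full-gradient descent with no mini-batch noise, so it is only a toy stand-in for SGD on a real network.

```python
# Toy non-convex "loss" (an illustrative assumption, not from any paper):
# it has a global minimum near x = -1.30 and a worse local minimum near x = 1.13.
def loss(x):
    return x**4 - 3 * x**2 + x


def grad(x):
    # Analytic derivative of loss(x).
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=1000):
    """Plain gradient descent: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# Same algorithm, two different initializations.
x_right = gradient_descent(2.0)   # settles in the local minimum
x_left = gradient_descent(-2.0)   # settles in the global minimum

print(f"start  2.0 -> x = {x_right:.4f}, loss = {loss(x_right):.4f}")
print(f"start -2.0 -> x = {x_left:.4f}, loss = {loss(x_left):.4f}")
```

Starting from 2.0, the iterate stops in the shallower basin even though a lower loss exists on the other side, which is the kind of behavior the methods in this list try to mitigate.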