The loss functions defined in deep learning are almost always highly non-convex, and plain stochastic gradient descent (SGD) can easily get trapped in poor local optima. This series plans to cover the following methods (a minimal SGD baseline is sketched after the list):
1. SGD (On the Importance of Initialization and Momentum in Deep Learning)
2. Momentum
3. Nesterov accelerated gradient
4. Adagrad (Adaptive Subgradient Methods for Online Learning and Stochastic Optimization)
5. RMSprop (Generating Sequences with Recurrent Neural Networks)
6. Rprop (resilient backpropagation algorithm)
7. Adadelta (Adadelta: An Adaptive Learning Rate Method)
8. Adam (Adam: A Method for Stochastic Optimization)
9. AMSGrad (On the Convergence of Adam and Beyond)
10. AdaBound (Adaptive Gradient Methods with Dynamic Bound of Learning Rate)
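All of the methods listed above are refinements of the same basic update rule, θ ← θ − η∇L(θ). As a reference point for the later posts, here is a minimal sketch of that plain SGD step; the function name `sgd_step`, the toy quadratic objective, and the learning rate of 0.1 are illustrative assumptions, not code from this series.

```python
import numpy as np

def sgd_step(params, grads, lr=0.01):
    """Plain SGD: move each parameter a small step against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

# Toy example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
for _ in range(100):
    grad = 2 * (w - 3.0)
    (w,) = sgd_step([w], [grad], lr=0.1)
print(w)  # converges toward 3.0
```

The later posts in this series modify this update in different ways, for example by accumulating a velocity term (momentum, Nesterov) or by adapting the learning rate per parameter (Adagrad, RMSprop, Adam, and their variants).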