Methods for tuning the loss in Caffe

April 7, 2023, 11:53 PM • Caffe


what should I do if...
...my loss diverges? (increases by an order of magnitude, or goes to inf or NaN)
lower the learning rate
raise momentum (with a corresponding drop in the learning rate)
raise weight decay
raise batch size
use gradient clipping (limit the L2 norm of the gradient to a particular value at each iteration; shrink it to that norm if greater)
try another solver: momentum SGD, Adam, RMSProp, ...
try a smaller initialization (e.g., for a Gaussian initialization, lower the standard deviation)
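
The gradient clipping item above can be sketched in a few lines. This is plain Python rather than Caffe code: the function name `clip_gradient` and the list-of-floats gradient are assumptions made for illustration. In Caffe itself, the corresponding solver setting is `clip_gradients`.

```python
import math

def clip_gradient(grad, clip_norm):
    """Return grad rescaled so that its L2 norm is at most clip_norm.

    grad is a flat list of floats (a hypothetical stand-in for all the
    parameter gradients of a network at one iteration).
    """
    l2_norm = math.sqrt(sum(g * g for g in grad))
    if l2_norm > clip_norm:
        # Shrink every component by the same factor, so the direction
        # is preserved and only the length changes.
        scale = clip_norm / l2_norm
        return [g * scale for g in grad]
    # Gradients already within the limit are left untouched.
    return list(grad)

# A gradient of norm 5 is shrunk to norm 1; a small one is unchanged.
clipped = clip_gradient([3.0, 4.0], clip_norm=1.0)
unchanged = clip_gradient([0.3, 0.4], clip_norm=1.0)
```

Because the whole vector is scaled by one factor, clipping limits the step size without changing the update direction.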

what should I do if...
...my loss doesn’t improve / gets stuck / drops slowly?

raise the learning rate
(maybe) lower momentum, weight decay, and/or batch size
try another solver: momentum SGD, Adam, RMSProp, ...
initialize from a pre-trained model (e.g., one trained on ImageNet), if possible
use a larger initialization (in particular, make sure you didn’t zero-initialize any multiplicative weights in intermediate layers)
use a “smarter” initialization (e.g., for linear layers followed by ReLUs, try the msra initialization in Caffe)
remove some layers to make the network shallower, at least to start
a strategy for model design: begin with a simple, trainable network, then “deepen” it by adding new layers one by one
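
As a rough illustration of the “smarter” initialization mentioned above, the sketch below draws weights from a zero-mean Gaussian with standard deviation sqrt(2 / fan_in). To my understanding, this is what the msra filler does with its default settings. The function name `msra_init` and the nested-list weight layout are assumptions made for this example, not Caffe's API.

```python
import math
import random

def msra_init(fan_in, fan_out, seed=0):
    """Sample a fan_out x fan_in weight matrix with std = sqrt(2 / fan_in).

    The factor 2 compensates for a ReLU zeroing roughly half of its
    inputs, so that activation variance stays about constant per layer.
    """
    rng = random.Random(seed)
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]

weights = msra_init(fan_in=256, fan_out=128)
```

Note that the scale depends only on the number of inputs per unit, so wider layers get smaller initial weights.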

modify the architecture to improve gradient flow:
batch normalization
residual learning [ResNet]
intermediate losses [GoogLeNet]
other tricks

be patient! (go outside?)
deep learning can take a long time
training AlexNet in 2012: 12 days (although this is down to 1 day in 2015!)
with 1000 classes, the loss hovers around the chance value of ln(1000) ≈ 6.908 for the first 1000+ iterations (~1 hour on a 2012 GPU)
training ResNet-152 in 2015: 1-2 months (on 8 GPUs!)
the best configurations (net architectures, solvers) at convergence are often not the ones that train fastest early on
some tricks to speed up learning can be “greedy” rather than ultimately beneficial
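
The chance value quoted above is the cross-entropy loss of a classifier that assigns equal probability to every class. A small sketch of the arithmetic, where the helper name `chance_loss` is made up for this example:

```python
import math

def chance_loss(num_classes):
    """Cross-entropy loss of a uniform prediction over num_classes classes."""
    # A uniform classifier gives the true class probability 1 / num_classes,
    # so the loss is -ln(1 / num_classes) = ln(num_classes).
    return -math.log(1.0 / num_classes)

print(round(chance_loss(1000), 3))  # 6.908
```

If the training loss stays near this value, the network has not yet learned anything beyond guessing. A loss well above it usually points to divergence rather than slow learning.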

One addition: if GPU memory is insufficient, consider setting iter_size to increase the effective batch size (batch_size × iter_size).
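
The idea behind iter_size is gradient accumulation: gradients from several small batches are combined before a single weight update. The sketch below uses a toy one-parameter model, y = w * x with mean squared error, chosen only for illustration. It shows that averaging the gradients of iter_size equal-sized batches reproduces the gradient of one large batch. The function names are hypothetical and this is not Caffe code.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, batch_size, iter_size):
    """Average the gradients of iter_size consecutive small batches."""
    total = 0.0
    for i in range(iter_size):
        lo = i * batch_size
        total += grad_mse(w, xs[lo:lo + batch_size], ys[lo:lo + batch_size])
    # Dividing by iter_size makes the result match one large batch.
    return total / iter_size

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

full = grad_mse(0.5, xs, ys)                                   # one batch of 8
small = accumulated_grad(0.5, xs, ys, batch_size=2, iter_size=4)  # 4 batches of 2
```

Only one small batch has to fit in memory at a time, at the cost of iter_size forward and backward passes per update.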

reference

https://docs.google.com/presentation/d/1HxGdeq8MPktHaPb-rlmYYQ723iWzq9ur6Gjo71YiG0Y/edit#slide=id.g8629ab2c8_0_60

