base_lr: the initial learning rate

momentum: the weight given to the previous gradient update (the momentum coefficient)

weight_decay: the coefficient of the regularization term

These three parameters are the core of SGD. On base_lr and momentum, see: http://caffe.berkeleyvision.org/tutorial/solver.html

On weight_decay, see: http://stats.stackexchange.com/questions/29130/difference-between-neural-net-weight-decay-and-learning-rate
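Taken together, the three drive the momentum SGD update. A sketch of the standard formulation, where W is the weights, V the accumulated update, and dL/dW the gradient (the notation here is mine, not Caffe's):

V(t+1) = momentum * V(t) - base_lr * ( dL/dW(t) + weight_decay * W(t) )
W(t+1) = W(t) + V(t+1)

So momentum controls how much of the previous update is carried over, and weight_decay pulls the weights toward zero, penalizing large values.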

lr_policy (together with gamma, power, and stepsize): the learning-rate update rule; see the Caffe code:

// Return the current learning rate. The currently implemented learning rate
// policies are as follows:
//    - fixed: always return base_lr.
//    - step: return base_lr * gamma ^ (floor(iter / step))
//    - exp: return base_lr * gamma ^ iter
//    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
//    - multistep: similar to step but it allows non uniform steps defined by
//      stepvalue
//    - poly: the effective learning rate follows a polynomial decay, to be
//      zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power)
//    - sigmoid: the effective learning rate follows a sigmoid decay
//      return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
//
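For example, a minimal solver.prototxt sketch using the step policy (all values here are illustrative, not from the original):

base_lr: 0.01        # starting learning rate
lr_policy: "step"    # drop the rate in discrete steps
gamma: 0.1           # multiply the rate by 0.1 at each step
stepsize: 10000      # step every 10000 iterations
momentum: 0.9
weight_decay: 0.0005

With this configuration the effective rate is base_lr * gamma ^ floor(iter / stepsize): 0.01 for the first 10000 iterations, then 0.001, and so on.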


lr_mult: each layer typically carries two lr_mult values that scale its learning rate; the effective rate is base_lr * lr_mult. The first applies to the layer's weights, the second to its bias.
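A sketch of how this looks in a layer definition (the layer itself and the 1/2 values follow the common convention and are not taken from the original):

layer {
  name: "conv1"
  type: "Convolution"
  param { lr_mult: 1 }     # weights: effective lr = base_lr * 1
  param { lr_mult: 2 }     # bias:    effective lr = base_lr * 2
  convolution_param {
    num_output: 20
    kernel_size: 5
  }
}

Setting lr_mult: 0 freezes the corresponding parameters, a common trick when fine-tuning.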

xavier: a weight-initialization scheme (a useful trick); see Glorot & Bengio, "Understanding the difficulty of training deep feedforward neural networks".
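In Caffe it is selected through a layer's weight_filler; XavierFiller draws weights uniformly from [-sqrt(3/n), +sqrt(3/n)], with n = fan_in by default. A sketch (num_output and kernel_size are illustrative):

  convolution_param {
    num_output: 20
    kernel_size: 5
    weight_filler { type: "xavier" }   # uniform in [-sqrt(3/fan_in), +sqrt(3/fan_in)]
  }

The point, per the paper, is to keep the variance of activations roughly constant from layer to layer.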