Network parameters

# number of test batches: test set size / test batch size
test_iter: 100
# run a test pass every test_interval training iterations
test_interval: 500
# initial learning rate
base_lr: 0.01
# momentum, accelerates convergence: v(t+1) = momentum*v(t) - lr*grad ; w(t+1) = w(t) + v(t+1)
momentum: 0.9
# weight decay (regularization penalty)
weight_decay: 0.0005
# learning-rate decay policy; here lr = base_lr * (1 + gamma * iter) ^ (-power)
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# print the loss every `display` iterations
display: 100
# maximum number of training iterations
max_iter: 10000
# save a snapshot every `snapshot` iterations
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# run on CPU or GPU
solver_mode: GPU
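
The momentum and "inv" decay formulas in the comments above can be written out directly. A minimal Python sketch of both, using this solver's values (my illustration, not Caffe code):

base_lr, gamma, power, momentum = 0.01, 0.0001, 0.75, 0.9

def lr_at(iteration):
    # lr_policy "inv": lr = base_lr * (1 + gamma * iter) ^ (-power)
    return base_lr * (1 + gamma * iteration) ** (-power)

print(lr_at(0))      # 0.01
print(lr_at(10000))  # ~0.0059: the rate has decayed by max_iter

# Momentum update for one scalar weight w with gradient grad:
v, w, grad = 0.0, 0.5, 0.2
for it in range(3):
    v = momentum * v - lr_at(it) * grad   # v(t+1) = momentum*v(t) - lr*grad
    w = w + v                             # w(t+1) = w(t) + v(t+1)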


name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}

name: the layer's name.

type: the layer type; "Data" means the input comes from LevelDB or LMDB.

top/bottom: outputs / inputs. The (data, label) pair serves as the input for classification.

include: whether the layer belongs to the TRAIN phase, the TEST phase, or both.

transform_param: scales the data into a defined range; 0.00390625 is 1/256.

source: where the data comes from.

batch_size: the number of examples processed at a time.

backend: LevelDB or LMDB.
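
As a quick check of what the scale does, here is the same transform in NumPy (my sketch, not Caffe internals): 8-bit pixel values in [0, 255] are mapped into [0, 1):

import numpy as np

# transform_param's scale multiplies every input pixel; 0.00390625 = 1/256
pixels = np.array([0, 128, 255], dtype=np.float32)
print(pixels * 0.00390625)  # [0.0, 0.5, 0.99609375]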

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

lr_mult: learning-rate multiplier, applied on top of base_lr from solver.prototxt. The two lr_mult entries correspond to the layer's two parameter blobs: the first to the weights, the second to the biases.

num_output: the number of convolution kernels (output feature maps).

kernel_size: the size of each convolution kernel.

stride: the convolution stride, default 1.

pad: edge padding, default 0.

weight_filler: weight initialization. The default is "constant" (all zeros); in practice the "xavier" algorithm is commonly used, and "gaussian" is also available.

bias_filler: bias initialization, usually "constant" with value 0.
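
These fields combine in the usual output-size formula for convolution, output = (input - kernel_size + 2*pad) / stride + 1. A small sketch with a hypothetical helper:

def conv_output_size(input_size, kernel_size, stride=1, pad=0):
    # output = (input - kernel + 2*pad) / stride + 1
    return (input_size - kernel_size + 2 * pad) // stride + 1

# conv1 above: 28x28 MNIST input, kernel_size=5, stride=1, pad=0
print(conv_output_size(28, 5))  # 24 -> conv1 emits 20 feature maps of 24x24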

layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

pool: the pooling method.

pad: edge padding, default 0.

kernel_size: the pooling window size.

stride: the pooling stride, default 1. Setting it equal to kernel_size makes the windows non-overlapping.
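
Pooling output sizes follow the same formula (Caffe actually rounds up for pooling, but with these even sizes that makes no difference). With kernel_size = stride = 2, the 24x24 conv1 maps shrink to 12x12; a sketch with a hypothetical helper:

def pool_output_size(input_size, kernel_size, stride):
    # same as the convolution formula with pad=0
    return (input_size - kernel_size) // stride + 1

print(pool_output_size(24, 2, 2))  # 12: non-overlapping 2x2 windows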
 
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

A fully connected layer can also be viewed as a convolution layer whose kernel is the same size as its input.

num_output: the number of outputs (equivalently, the number of kernels).
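
The equivalence claimed above is easy to check numerically: one fully connected neuron on a 4x4 input computes the same value as a convolution whose 4x4 kernel covers the whole input. A NumPy sketch (my illustration):

import numpy as np

x = np.random.rand(4, 4)   # input "image"
W = np.random.rand(4, 4)   # one neuron's weights == one full-size kernel
b = 0.1

fc_out = np.dot(W.ravel(), x.ravel()) + b   # fully connected view
conv_out = np.sum(W * x) + b                # convolution with kernel == input size
assert np.isclose(fc_out, conv_out)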


Custom network

A previous post, "Deep convolutional neural networks with Theano", defined a network.

Here the same network is trained with Caffe.

Its main parameters are:

net = Network([
    ConvPoolLayer(image_shape=(mini_batch_size, 1, 28, 28),
                  filter_shape=(20, 1, 5, 5),
                  poolsize=(2, 2), activation_fn=ReLU),
    ConvPoolLayer(image_shape=(mini_batch_size, 20, 12, 12),
                  filter_shape=(40, 20, 5, 5),
                  poolsize=(2, 2), activation_fn=ReLU),
    FullyConnectedLayer(n_in=40*4*4, n_out=100, activation_fn=sigmoid),
    SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size=10)
net.SGD(training_data, 30, 10, 0.1, validation_data, test_data)
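
The n_in=40*4*4 in the FullyConnectedLayer follows from tracing shapes through the two stages: each 5x5 valid convolution shrinks a side by 4, and each 2x2 pooling halves it. A quick check in plain Python:

size = 28
for _ in range(2):
    size = size - 5 + 1   # 5x5 valid convolution: 28 -> 24, then 12 -> 8
    size = size // 2      # 2x2 max pooling:       24 -> 12, then  8 -> 4
print(40 * size * size)   # 640 == 40*4*4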

The Theano run keeps the learning rate fixed at 0.1. In the Caffe tests, however, training only converged at 0.01; 0.02 and above did not converge.

There is no momentum and no weight_decay.

Removing transform_param, or changing the scale to 1.0, prevents convergence.

Gaussian initialization, with a different std per layer (see the sketch after this list).

Plus a few changes to the network's inputs and outputs.
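
Caffe's "gaussian" filler draws each value from N(0, std^2); the per-layer std values used here are taken from the prototxt below. The NumPy equivalent for the two convolution layers (my sketch, not Caffe code):

import numpy as np

# weight blobs are shaped (num_output, channels, kernel, kernel)
conv1_w = np.random.normal(0.0, 0.09, size=(20, 1, 5, 5))
conv2_w = np.random.normal(0.0, 0.06, size=(40, 20, 5, 5))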

net: "examples/mnist/lenet_train_test2.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.01
momentum: 0
weight_decay: 0
lr_policy: "inv"
gamma: 0
power: 0
display: 100
max_iter: 30000
snapshot: 30000
snapshot_prefix: "examples/mnist/lenet"
solver_mode: GPU

The modified net:

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 10
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"
    batch_size: 10
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 1
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.09
    }
    bias_filler {
      type: "gaussian"
      std: 1.0
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 40
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.06
    }
    bias_filler {
      type: "gaussian"
      std: 1.0
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 100
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "gaussian"
      std: 1.0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "gaussian"
      std: 0.33
    }
    bias_filler {
      type: "gaussian"
      std: 1.0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
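
To actually run this configuration, one option is the pycaffe interface. A minimal sketch, assuming the standard Caffe Python bindings and that the solver above is saved as examples/mnist/lenet_solver2.prototxt (hypothetical filename):

import caffe

caffe.set_mode_gpu()   # matches solver_mode: GPU above
solver = caffe.SGDSolver('examples/mnist/lenet_solver2.prototxt')  # hypothetical path
solver.solve()         # runs the full max_iter training schedule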


The results are close to Theano's:

I0105 17:29:22.523669  2836 solver.cpp:317] Iteration 30000, loss = 0.00268317
I0105 17:29:22.523669  2836 solver.cpp:337] Iteration 30000, Testing net (#0)
I0105 17:29:22.648680  2836 solver.cpp:404]     Test net output #0: accuracy = 0.985
I0105 17:29:22.648680  2836 solver.cpp:404]     Test net output #1: loss = 0.0472795 (* 1 = 0.0472795 loss)