LeNet-5 contains 7 layers (the input layer is not counted), as shown in the figure below. The input is a 32*32*1 grayscale image; the network was designed for training and prediction on grayscale images. The original paper, "Gradient-Based Learning Applied to Document Recognition", can be downloaded from http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf.

[Figure: LeNet-5 architecture diagram]

The first layer is a convolutional layer with six 5*5 filters, stride 1, padding 0. The output is 28*28*6 (6 feature maps), with (5*5*1)*6+6 = 156 trainable parameters (weights + biases).

The second layer is an average-pooling (subsampling) layer with a 2*2 filter, stride 2, padding 0. The output is 14*14*6 (6 feature maps). Each map has a single trainable multiplicative coefficient and a single bias, so there are 1*6+6 = 12 trainable parameters (coefficients + biases).

The third layer is a convolutional layer with sixteen 5*5 filters, stride 1, padding 0. The output is 10*10*16 (16 feature maps). Using the connection table given in the paper, the number of trainable parameters is (5*5*3+1)*6 + (5*5*4+1)*6 + (5*5*4+1)*3 + (5*5*6+1)*1 = 1516 (weights + biases).

The fourth layer is another average-pooling layer with a 2*2 filter, stride 2, padding 0. The output is 5*5*16 (16 feature maps), with 1*16+16 = 32 trainable parameters (coefficients + biases).

The fifth layer is a convolutional layer with 120 5*5 filters, stride 1. The output is 1*1*120 (120 feature maps), with (5*5*16)*120+120 = 48120 trainable parameters (weights + biases).

The sixth layer is a fully connected layer with 84 neurons, giving 120*84+84 = 10164 trainable parameters (weights + biases); the activation function of this layer is tanh.

The seventh layer produces the final prediction y'. y' takes one of 10 values, corresponding to the digits 0-9; in modern versions of the network a softmax function outputs the 10 classification results, i.e., the seventh layer is a softmax layer.
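The output sizes and parameter counts above are easy to verify programmatically. The following standalone sketch (my own check, not part of the original post) applies the standard output-size formula (in - kernel + 2*pad) / stride + 1 and the connection-table grouping quoted above:

#include <cstdio>

// output side length of a square convolution/pooling layer
static int out_size(int in, int kernel, int stride, int pad)
{
	return (in - kernel + 2 * pad) / stride + 1;
}

int main()
{
	// C1: six 5*5 filters on a 1-channel 32*32 input -> 28*28*6, 156 params
	fprintf(stdout, "C1 size: %d, params: %d\n", out_size(32, 5, 1, 0), (5 * 5 * 1 + 1) * 6);
	// S2: 2*2 average pooling, stride 2 -> 14*14*6; one coefficient + one bias per map
	fprintf(stdout, "S2 size: %d, params: %d\n", out_size(28, 2, 2, 0), 1 * 6 + 6);
	// C3: sixteen 5*5 filters -> 10*10*16; grouping from the paper's connection table
	fprintf(stdout, "C3 size: %d, params: %d\n", out_size(14, 5, 1, 0),
		(5 * 5 * 3 + 1) * 6 + (5 * 5 * 4 + 1) * 6 + (5 * 5 * 4 + 1) * 3 + (5 * 5 * 6 + 1) * 1); // 1516
	// S4: 2*2 average pooling, stride 2 -> 5*5*16
	fprintf(stdout, "S4 size: %d, params: %d\n", out_size(10, 2, 2, 0), 1 * 16 + 16);
	// C5: 120 maps, each fully connected to all 16 S4 maps -> 1*1*120, 48120 params
	fprintf(stdout, "C5 size: %d, params: %d\n", out_size(5, 5, 1, 0), (5 * 5 * 16 + 1) * 120);
	// F6: fully connected, 84 neurons -> 10164 params
	fprintf(stdout, "F6 params: %d\n", (120 + 1) * 84);
	return 0;
}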

Why the 32*32 input is larger than the 28*28 images in the training set: so that potential distinctive features such as stroke end-points or corners can appear in the center of the receptive fields of the highest-level feature detectors.

Why each feature map in layer 2 is not connected to every feature map in layer 3: (1) the incomplete connection scheme keeps the number of connections within reasonable bounds; (2) more importantly, it forces a break of symmetry in the network: because different feature maps receive different inputs, they are forced to extract different features. (The main reason is to break the symmetry in the network and keep the number of connections within reasonable bounds.)

The "5" in LeNet-5 refers to its 5 hidden layers: convolution, pooling, convolution, pooling, convolution.

Different filters extract different features, such as edges, lines, and corners.
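As an illustration (this example is mine, not from the original post, and the file names are placeholders), the sketch below applies a hand-crafted vertical-edge kernel with OpenCV's cv::filter2D. A trained convolutional layer works the same way, except that the kernel weights are learned from data rather than designed by hand:

#include <opencv2/opencv.hpp>

int main()
{
	cv::Mat src = cv::imread("test.png", 0); // hypothetical grayscale input image
	if (!src.data) return -1;

	// Sobel-like kernel: responds strongly to vertical edges
	const cv::Mat kernel = (cv::Mat_<float>(3, 3) <<
		-1.f, 0.f, 1.f,
		-2.f, 0.f, 2.f,
		-1.f, 0.f, 1.f);

	cv::Mat edges, edges8u;
	cv::filter2D(src, edges, CV_32F, kernel); // convolve the image with the kernel
	cv::convertScaleAbs(edges, edges8u);      // back to 8-bit for saving
	cv::imwrite("edges.png", edges8u);        // hypothetical output path
	return 0;
}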

For an introduction to the basics of convolutional neural networks, see these earlier blog posts:

https://blog.csdn.net/fengbingchun/article/details/50529500

https://blog.csdn.net/fengbingchun/article/details/80262495

https://blog.csdn.net/fengbingchun/article/details/68065338

https://blog.csdn.net/fengbingchun/article/details/69001433

Below is code that tests the LeNet-5 network on MNIST, adapted from the test code in Caffe. It differs from the paper in the following ways (the resulting blob shapes and parameter counts are traced in the sketch after this list):

(1). The paper uses a 32*32 input image; here it is 28*28;

(2). In the paper the first convolutional layer outputs 6 feature maps; here it outputs 20;

(3). The paper uses average pooling; here max pooling is used;

(4). In the paper the third convolutional layer outputs 16 feature maps; here it outputs 50, and here every feature map of the second layer is connected to every feature map of the third layer;

(5). In the paper the fifth layer is a convolutional layer; here it is a fully connected layer with 500 output neurons and a ReLU activation function;

(6). The paper's seventh layer is an RBF (Euclidean Radial Basis Function) layer; here softmax is used.
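Putting these differences together, the blob shapes and parameter counts of the modified network can be traced with the same output-size formula; the following sketch (again my own check, not from the original code) walks through them:

#include <cstdio>

static int out_size(int in, int kernel, int stride) { return (in - kernel) / stride + 1; }

int main()
{
	int s = out_size(28, 5, 1); // conv1: 24*24*20
	fprintf(stdout, "conv1: %d*%d*20, params: %d\n", s, s, (5 * 5 * 1 + 1) * 20);  // 520
	s = out_size(s, 2, 2);      // pool1 (MAX, no trainable params): 12*12*20
	s = out_size(s, 5, 1);      // conv2: 8*8*50
	fprintf(stdout, "conv2: %d*%d*50, params: %d\n", s, s, (5 * 5 * 20 + 1) * 50); // 25050
	s = out_size(s, 2, 2);      // pool2: 4*4*50 -> 800 inputs to ip1
	fprintf(stdout, "ip1: 500 outputs, params: %d\n", (s * s * 50 + 1) * 500);     // 400500
	fprintf(stdout, "ip2: 10 outputs, params: %d\n", (500 + 1) * 10);              // 5010
	return 0;
}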

The test code (lenet-5.cpp) is as follows:

#include "funset.hpp"
#include "common.hpp"

int lenet_5_mnist_train()
{
#ifdef CPU_ONLY
	caffe::Caffe::set_mode(caffe::Caffe::CPU);
#else
	caffe::Caffe::set_mode(caffe::Caffe::GPU);
#endif

#ifdef _MSC_VER
	const std::string filename{ "E:/GitCode/Caffe_Test/test_data/Net/lenet-5_mnist_windows_solver.prototxt" };
#else
	const std::string filename{ "test_data/Net/lenet-5_mnist_linux_solver.prototxt" };
#endif
	caffe::SolverParameter solver_param;
	if (!caffe::ReadProtoFromTextFile(filename.c_str(), &solver_param)) {
		fprintf(stderr, "parse solver.prototxt fail\n");
		return -1;
	}

	mnist_convert(); // convert MNIST to LMDB

	caffe::SGDSolver<float> solver(solver_param);
	solver.Solve();

	fprintf(stdout, "train finish\n");

	return 0;
}

int lenet_5_mnist_test()
{
#ifdef CPU_ONLY
	caffe::Caffe::set_mode(caffe::Caffe::CPU);
#else
	caffe::Caffe::set_mode(caffe::Caffe::GPU);
#endif

#ifdef _MSC_VER
	const std::string param_file{ "E:/GitCode/Caffe_Test/test_data/Net/lenet-5_mnist_windows_test.prototxt" };
	const std::string trained_filename{ "E:/GitCode/Caffe_Test/test_data/Net/lenet-5_mnist_iter_10000.caffemodel" };
	const std::string image_path{ "E:/GitCode/Caffe_Test/test_data/images/handwritten_digits/" };
#else
	const std::string param_file{ "test_data/Net/lenet-5_mnist_linux_test.prototxt" };
	const std::string trained_filename{ "test_data/Net/lenet-5_mnist_iter_10000.caffemodel" };
	const std::string image_path{ "test_data/images/handwritten_digits/" };
#endif

	caffe::Net<float> caffe_net(param_file, caffe::TEST);
	caffe_net.CopyTrainedLayersFrom(trained_filename);

	const boost::shared_ptr<caffe::Blob<float> > blob_data_layer = caffe_net.blob_by_name("data");
	int image_channel_data_layer = blob_data_layer->channels();
	int image_height_data_layer = blob_data_layer->height();
	int image_width_data_layer = blob_data_layer->width();

	const std::vector<caffe::Blob<float>*> output_blobs = caffe_net.output_blobs();
	int require_blob_index{ -1 };
	const int digit_category_num{ 10 };
	for (int i = 0; i < output_blobs.size(); ++i) {
		if (output_blobs[i]->count() == digit_category_num)
			require_blob_index = i;
	}
	if (require_blob_index == -1) {
		fprintf(stderr, "ouput blob don't match\n");
		return -1;
	}

	std::vector<int> target{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
	std::vector<int> result;

	for (auto num : target) {
		std::string str = std::to_string(num);
		str += ".png";
		str = image_path + str;

		cv::Mat mat = cv::imread(str.c_str(), 1);
		if (!mat.data) {
			fprintf(stderr, "load image error: %s\n", str.c_str());
			return -1;
		}

		if (image_channel_data_layer == 1)
			cv::cvtColor(mat, mat, CV_BGR2GRAY);
		else if (image_channel_data_layer == 4)
			cv::cvtColor(mat, mat, CV_BGR2BGRA);

		cv::resize(mat, mat, cv::Size(image_width_data_layer, image_height_data_layer));
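		// invert the image: MNIST digits are white strokes on a black background,
		// so a typical black-on-white input image needs to be inverted first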
		cv::bitwise_not(mat, mat);

		boost::shared_ptr<caffe::MemoryDataLayer<float> > memory_data_layer =
			boost::static_pointer_cast<caffe::MemoryDataLayer<float>>(caffe_net.layer_by_name("data"));
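		// scale pixel values to [0, 1): 0.00390625 = 1/256, the same scale as
		// the transform_param used at training time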
		mat.convertTo(mat, CV_32FC1, 0.00390625);
		float dummy_label[1] {0};
		memory_data_layer->Reset((float*)(mat.data), dummy_label, 1);

		float loss{ 0.0 };
		const std::vector<caffe::Blob<float>*>& results = caffe_net.ForwardPrefilled(&loss);
		const float* output = results[require_blob_index]->cpu_data();

		float tmp{ -1 };
		int pos{ -1 };

		for (int j = 0; j < 10; j++) {
			//fprintf(stdout, "Probability to be Number %d is: %.3f\n", j, output[j]);
			if (tmp < output[j]) {
				pos = j;
				tmp = output[j];
			}
		}

		result.push_back(pos);
	}

	for (auto i = 0; i < 10; i++)
		fprintf(stdout, "actual digit is: %d, result digit is: %d\n", target[i], result[i]);

	fprintf(stdout, "predict finish\n");
	return 0;
}
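A minimal driver for these two functions might look as follows (a sketch, assuming funset.hpp declares both of them, as in the repository linked at the end):

#include "funset.hpp"

int main()
{
	if (lenet_5_mnist_train() != 0) return -1; // train for max_iter iterations and snapshot the model
	if (lenet_5_mnist_test() != 0) return -1;  // load the snapshot and classify 0.png ... 9.png
	return 0;
}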

The contents of solver.prototxt are as follows:

# solver.prototxt is a configuration file that tells Caffe how to train the network.
# Every field name in it must exist in message SolverParameter in caffe.proto,
# otherwise parsing will fail.

net: "test_data/Net/lenet-5_mnist_linux_train.prototxt" # file name of the training network
test_iter: 100 # test_iter * test batch_size = total number of test images (100 * 100 = 10000 for MNIST)
test_interval: 500 # run the test network once every 500 training iterations
base_lr: 0.01 # base learning rate
lr_policy: "inv" # learning-rate policy: return base_lr * (1 + gamma * iter) ^ (- power)
momentum: 0.9 # momentum
weight_decay: 0.0005 # weight decay
gamma: 0.0001 # parameter of the learning-rate formula
power: 0.75 # parameter of the learning-rate formula
display: 100 # print progress (e.g. the loss) every 100 training iterations
max_iter: 10000 # maximum number of training iterations
snapshot: 5000 # save an intermediate snapshot every 5000 iterations
snapshot_prefix: "test_data/Net/lenet-5_mnist" # prefix of the snapshot files
solver_type: SGD # stochastic gradient descent
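To see how the "inv" policy decays the learning rate over training, the formula above can be evaluated directly; a small sketch using this solver's values:

#include <cmath>
#include <cstdio>
#include <initializer_list>

int main()
{
	const double base_lr = 0.01, gamma = 0.0001, power = 0.75;
	// lr = base_lr * (1 + gamma * iter) ^ (-power)
	for (int iter : { 0, 1000, 5000, 10000 })
		fprintf(stdout, "iter %5d: lr = %f\n", iter, base_lr * std::pow(1.0 + gamma * iter, -power));
	// iter 0 -> 0.010000, iter 10000 -> about 0.005946
	return 0;
}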

The train prototxt is as follows:

name: "LeNet-5"

layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "test_data/MNIST/train"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "test_data/MNIST/test"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

The visualization of train.prototxt is as follows:

[Figure: train.prototxt network visualization]

The test.prototxt used at test time is as follows (it replaces the LMDB Data layer with a MemoryData layer, and the SoftmaxWithLoss/Accuracy layers with a plain Softmax layer):

name: "LeNet-5"

layer {
  name: "data"
  type: "MemoryData"
  top: "data" #
  top: "label"
  memory_data_param {
    batch_size: 1
    channels: 1
    height: 28
    width: 28
  }
  transform_param {
    scale: 0.00390625
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}

The visualization of test.prototxt is as follows:

[Figure: test.prototxt network visualization]

The output of running the training code is as follows:

[Figure: training output screenshot]

The output of running the test code is as follows:

[Figure: prediction results screenshot]

GitHub: https://github.com/fengbingchun/Caffe_Test