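Earlier, while timing the individual layers of a neural network, I ran into a very strange result: switching between Caffe's own GPU implementation and the cuDNN implementation made a large difference for convolution, but almost none for the pooling layer. When I found time to read the code, I traced the cause to how layer_factory, which follows the factory pattern, selects layers. This post covers the following: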

1. The factory pattern

2. layer_factory in detail

3. The pitfall in layer_factory

4. Impact analysis

 

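1. The factory pattern

The factory pattern is a design pattern for situations where you cannot foresee at coding time which class will need to be instantiated, and where the system should not depend on the details of how product classes are created, composed, and represented. Its drawback is that it fits best in projects that rarely need to be extended.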

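The factory pattern has three roles:

Factory role: produces a concrete product according to some logic

Abstract product role: the parent of the concrete products, usually implemented as an interface in Java or an abstract class in C++

Concrete product role: the product instances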
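
The three roles can be sketched in a few lines of C++. This is only an illustrative sketch: `Operator`, `CpuOperator`, `GpuOperator`, and `MakeOperator` are hypothetical names chosen for this example and do not appear in Caffe.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Abstract product role: the interface shared by all concrete products.
struct Operator {
  virtual ~Operator() {}
  virtual std::string Name() const = 0;
};

// Concrete product roles (hypothetical backends for illustration).
struct CpuOperator : Operator {
  std::string Name() const override { return "cpu"; }
};
struct GpuOperator : Operator {
  std::string Name() const override { return "gpu"; }
};

// Factory role: decides at run time which concrete product to build.
// Returns a null pointer for an unknown backend.
std::unique_ptr<Operator> MakeOperator(const std::string& backend) {
  if (backend == "cpu") return std::unique_ptr<Operator>(new CpuOperator());
  if (backend == "gpu") return std::unique_ptr<Operator>(new GpuOperator());
  return std::unique_ptr<Operator>();
}

// Usage: the caller only ever sees the abstract Operator interface.
void FactoryUsageExample() {
  std::unique_ptr<Operator> op = MakeOperator("gpu");
  assert(op && op->Name() == "gpu");
}
```

The caller never names a concrete class, which is the property layer_factory relies on when it picks between backends.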

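2. layer_factory in detail

As is well known, Caffe 1.0 currently has three families of operators: CPU versions, Caffe's own CUDA versions, and cuDNN versions. The layer_factory file is responsible for assembling Caffe's operators. The factory pattern here means that, at run time, the operator version matching the user's settings is selected and executed.

The following is based on http://zhuanlan.zhihu.com/hacker-and-painter/20456649

layer_factory.hpp is the header file of layer_factory: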

/**
 * @brief A layer factory that allows one to register layers.
 * During runtime, registered layers could be called by passing a LayerParameter
 * protobuffer to the CreateLayer function:
 *
 *     LayerRegistry<Dtype>::CreateLayer(param);
 *
 * There are two ways to register a layer. Assuming that we have a layer like:
 *
 *   template <typename Dtype>
 *   class MyAwesomeLayer : public Layer<Dtype> {
 *     // your implementations
 *   };
 *
 * and its type is its C++ class name, but without the "Layer" at the end
 * ("MyAwesomeLayer" -> "MyAwesome").
 *
 * If the layer is going to be created simply by its constructor, in your c++
 * file, add the following line:
 *
 *    REGISTER_LAYER_CLASS(MyAwesome);
 *
 * Or, if the layer is going to be created by another creator function, in the
 * format of:
 *
 *    template <typename Dtype>
 *    Layer<Dtype*> GetMyAwesomeLayer(const LayerParameter& param) {
 *      // your implementation
 *    }
 *
 * (for example, when your layer has multiple backends, see GetConvolutionLayer
 * for a use case), then you can register the creator function instead, like
 *
 * REGISTER_LAYER_CREATOR(MyAwesome, GetMyAwesomeLayer)
 *
 * Note that each layer type should only be registered once.
 */
 
#ifndef CAFFE_LAYER_FACTORY_H_
#define CAFFE_LAYER_FACTORY_H_
 
#include <map>
#include <string>
 
#include "caffe/common.hpp"
#include "caffe/proto/caffe.pb.h"
 
namespace caffe {
 
template <typename Dtype>
class Layer;
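// LayerRegistry does one simple thing: it stores each layer's type string and
// its creator in a map so that layers can be looked up flexibly. Its main job
// is registering classes.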
template <typename Dtype>
class LayerRegistry {
 public:
  // Creator is a function pointer type; the function returns a (shared)
  // pointer to Layer<Dtype>.
  typedef shared_ptr<Layer<Dtype> > (*Creator)(const LayerParameter&);
  // CreatorRegistry maps a type string to its Creator.
  typedef std::map<string, Creator> CreatorRegistry;
 
  static CreatorRegistry& Registry() {
    static CreatorRegistry* g_registry_ = new CreatorRegistry();
    return *g_registry_;
  }
 
  // Adds a creator.
  // Inserts the (type, function pointer) pair into the table.
  static void AddCreator(const string& type, Creator creator) {
    CreatorRegistry& registry = Registry();
    CHECK_EQ(registry.count(type), 0)
        << "Layer type " << type << " already registered.";
    registry[type] = creator;
  }
 
  // Get a layer using a LayerParameter.
  // Creates a layer given its type.
  static shared_ptr<Layer<Dtype> > CreateLayer(const LayerParameter& param) {
    LOG(INFO) << "Creating layer " << param.name();
    // Read the type string from the parameter.
    const string& type = param.type();
    // Check that a Creator is registered for this type.
    CreatorRegistry& registry = Registry();
    CHECK_EQ(registry.count(type), 1) << "Unknown layer type: " << type
        << " (known types: " << LayerTypeList() << ")";
    // Call the Creator function of the matching layer.
    return registry[type](param);
  }
 
 private:
  // Layer registry should never be instantiated - everything is done with its
  // static variables.
  // The constructor is private to forbid instantiation, since the class only
  // has static functions.
  LayerRegistry() {}
  // Returns the list of registered layer types.
  static string LayerTypeList() {
    // Get the registry.
    CreatorRegistry& registry = Registry();
    string layer_types;
    // Walk the registry and append each type name to the layer_types string.
    for (typename CreatorRegistry::iterator iter = registry.begin();
         iter != registry.end(); ++iter) {
      if (iter != registry.begin()) {
        layer_types += ", ";
      }
      layer_types += iter->first;
    }
    return layer_types;
  }
};
 
// LayerRegisterer
// Registerer for user-defined layers,
// used by the macros below.
template <typename Dtype>
class LayerRegisterer {
 public:
  // Constructor of the layer registerer.
  LayerRegisterer(const string& type,
                  shared_ptr<Layer<Dtype> > (*creator)(const LayerParameter&)) {
    // LOG(INFO) << "Registering layer type: " << type;
    // This simply calls the registry's AddCreator to add the entry.
    LayerRegistry<Dtype>::AddCreator(type, creator);
  }
};
// For convenience, the author also provides macros for registering your own
// layer classes. REGISTER_LAYER_CREATOR defines two static LayerRegisterer
// objects, g_creator_f_<type> and g_creator_d_<type> (float and double).
#define REGISTER_LAYER_CREATOR(type, creator)                                  \
  static LayerRegisterer<float> g_creator_f_##type(#type, creator<float>);     \
  static LayerRegisterer<double> g_creator_d_##type(#type, creator<double>)

/* REGISTER_LAYER_CLASS registers a class you define, whose type name is type.
 For example, with type=bias the macro generates the following code.
 A creator function that simply calls your class's constructor and returns
 the new instance:
   Creator_biasLayer(const LayerParameter& param)
 A static variable g_creator_f_bias of type LayerRegisterer<float>, which
 binds the type string "bias" to that creator in the float registry:
   static LayerRegisterer<float> g_creator_f_bias("bias", Creator_biasLayer<float>)
 A static variable g_creator_d_bias of type LayerRegisterer<double>, which
 does the same in the double registry:
   static LayerRegisterer<double> g_creator_d_bias("bias", Creator_biasLayer<double>)
*/
#define REGISTER_LAYER_CLASS(type)                                             \
  template <typename Dtype>                                                    \
  shared_ptr<Layer<Dtype> > Creator_##type##Layer(const LayerParameter& param) \
  {                                                                            \
    return shared_ptr<Layer<Dtype> >(new type##Layer<Dtype>(param));           \
  }                                                                            \
  REGISTER_LAYER_CREATOR(type, Creator_##type##Layer)
 
}  // namespace caffe
 
#endif  // CAFFE_LAYER_FACTORY_H_
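
To make the registration mechanism easier to see in isolation, here is a stripped-down sketch of the same idea without templates or protobuf. It is not Caffe code: `ToyLayer`, `ToyRegistry`, `ToyRegisterer`, `REGISTER_TOY_LAYER`, and `CreateToyLayer` are hypothetical names, and the creator takes no parameter for brevity. The key point it illustrates is that a file-scope static object runs its constructor before `main`, which is what fills the map.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Abstract product (hypothetical stand-in for Layer<Dtype>).
struct ToyLayer {
  virtual ~ToyLayer() {}
  virtual std::string type() const = 0;
};

typedef std::shared_ptr<ToyLayer> (*ToyCreator)();

// A function-local static keeps the map valid no matter in which order
// the registerer objects in different files are constructed.
std::map<std::string, ToyCreator>& ToyRegistry() {
  static std::map<std::string, ToyCreator> registry;
  return registry;
}

// Constructing a ToyRegisterer adds one (type, creator) entry.
struct ToyRegisterer {
  ToyRegisterer(const std::string& type, ToyCreator creator) {
    assert(ToyRegistry().count(type) == 0 && "type already registered");
    ToyRegistry()[type] = creator;
  }
};

// Generates a creator function plus the static object that registers it.
#define REGISTER_TOY_LAYER(type)                                  \
  std::shared_ptr<ToyLayer> Creator_##type##Layer() {             \
    return std::shared_ptr<ToyLayer>(new type##Layer());          \
  }                                                               \
  static ToyRegisterer g_creator_##type(#type, Creator_##type##Layer)

struct ReLULayer : ToyLayer {
  std::string type() const override { return "ReLU"; }
};
REGISTER_TOY_LAYER(ReLU);

// Looks up the creator by type string; null if the type is unknown.
std::shared_ptr<ToyLayer> CreateToyLayer(const std::string& type) {
  std::map<std::string, ToyCreator>::iterator it = ToyRegistry().find(type);
  if (it == ToyRegistry().end()) return std::shared_ptr<ToyLayer>();
  return it->second();
}
```

Calling `CreateToyLayer("ReLU")` then returns a `ReLULayer` even though the caller never mentions that class, which mirrors how `CreateLayer` dispatches on `param.type()`.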

With the header covered, here is the implementation (this version differs from 1.0 in places, but not in any way that matters here).

layer_factory.cpp:

 

// Make sure we include Python.h before any system header
// to avoid _POSIX_C_SOURCE redefinition
#ifdef WITH_PYTHON_LAYER
#include <boost/python.hpp>
#endif
#include <string>

#include "caffe/layer.hpp"
#include "caffe/layer_factory.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/vision_layers.hpp"

#ifdef WITH_PYTHON_LAYER
#include "caffe/python_layer.hpp"
#endif

namespace caffe {

// A function that returns a convolution layer instance.
// Get convolution layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetConvolutionLayer(
    const LayerParameter& param) {
  // Read from the parameters which engine to compute with:
  // CUDNN, CAFFE or DEFAULT.
  // engine is an enum type, as can be seen in caffe.proto.
  ConvolutionParameter_Engine engine = param.convolution_param().engine();
  if (engine == ConvolutionParameter_Engine_DEFAULT) {
    engine = ConvolutionParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = ConvolutionParameter_Engine_CUDNN;
#endif
  }
  if (engine == ConvolutionParameter_Engine_CAFFE) {
    // Construct Caffe's own convolution layer directly.
    return shared_ptr<Layer<Dtype> >(new ConvolutionLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == ConvolutionParameter_Engine_CUDNN) {
    // Construct the cuDNN convolution layer.
    return shared_ptr<Layer<Dtype> >(new CuDNNConvolutionLayer<Dtype>(param));
#endif
  } else {  // Anything else is an error.
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}
// Register the convolution layer: the type name is Convolution and the
// creator is the GetConvolutionLayer function.
REGISTER_LAYER_CREATOR(Convolution, GetConvolutionLayer);

// Returns a pooling layer instance; same logic as the convolution layer.
// Get pooling layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetPoolingLayer(const LayerParameter& param) {
  PoolingParameter_Engine engine = param.pooling_param().engine();
  if (engine == PoolingParameter_Engine_DEFAULT) {
    engine = PoolingParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = PoolingParameter_Engine_CUDNN;
#endif
  }
  if (engine == PoolingParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == PoolingParameter_Engine_CUDNN) {
    PoolingParameter p_param = param.pooling_param();
    if (p_param.pad() || p_param.pad_h() || p_param.pad_w() ||
        param.top_size() > 1) {
      LOG(INFO) << "CUDNN does not support padding or multiple tops. "
                << "Using Caffe's own pooling layer.";
      return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
    }
    return shared_ptr<Layer<Dtype> >(new CuDNNPoolingLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

// Register the pooling layer.
REGISTER_LAYER_CREATOR(Pooling, GetPoolingLayer);

// Create and register the ReLU layer.
// Get relu layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetReLULayer(const LayerParameter& param) {
  ReLUParameter_Engine engine = param.relu_param().engine();
  if (engine == ReLUParameter_Engine_DEFAULT) {
    engine = ReLUParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = ReLUParameter_Engine_CUDNN;
#endif
  }
  if (engine == ReLUParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new ReLULayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == ReLUParameter_Engine_CUDNN) {
    return shared_ptr<Layer<Dtype> >(new CuDNNReLULayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

REGISTER_LAYER_CREATOR(ReLU, GetReLULayer);

// Create and register the sigmoid layer.
// Get sigmoid layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetSigmoidLayer(const LayerParameter& param) {
  SigmoidParameter_Engine engine = param.sigmoid_param().engine();
  if (engine == SigmoidParameter_Engine_DEFAULT) {
    engine = SigmoidParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = SigmoidParameter_Engine_CUDNN;
#endif
  }
  if (engine == SigmoidParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new SigmoidLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == SigmoidParameter_Engine_CUDNN) {
    return shared_ptr<Layer<Dtype> >(new CuDNNSigmoidLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

REGISTER_LAYER_CREATOR(Sigmoid, GetSigmoidLayer);

// Create and register the softmax layer.
// Get softmax layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetSoftmaxLayer(const LayerParameter& param) {
  SoftmaxParameter_Engine engine = param.softmax_param().engine();
  if (engine == SoftmaxParameter_Engine_DEFAULT) {
    engine = SoftmaxParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = SoftmaxParameter_Engine_CUDNN;
#endif
  }
  if (engine == SoftmaxParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new SoftmaxLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == SoftmaxParameter_Engine_CUDNN) {
    return shared_ptr<Layer<Dtype> >(new CuDNNSoftmaxLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

REGISTER_LAYER_CREATOR(Softmax, GetSoftmaxLayer);

// Create and register the tanh layer.
// Get tanh layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetTanHLayer(const LayerParameter& param) {
  TanHParameter_Engine engine = param.tanh_param().engine();
  if (engine == TanHParameter_Engine_DEFAULT) {
    engine = TanHParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = TanHParameter_Engine_CUDNN;
#endif
  }
  if (engine == TanHParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new TanHLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == TanHParameter_Engine_CUDNN) {
    return shared_ptr<Layer<Dtype> >(new CuDNNTanHLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

REGISTER_LAYER_CREATOR(TanH, GetTanHLayer);

// Create and register the Python layer.
#ifdef WITH_PYTHON_LAYER
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetPythonLayer(const LayerParameter& param) {
  Py_Initialize();
  try {
    bp::object module = bp::import(param.python_param().module().c_str());
    bp::object layer = module.attr(param.python_param().layer().c_str())(param);
    return bp::extract<shared_ptr<PythonLayer<Dtype> > >(layer)();
  } catch (bp::error_already_set) {
    PyErr_Print();
    throw;
  }
}

REGISTER_LAYER_CREATOR(Python, GetPythonLayer);
#endif

// Layers that use their constructor as their default creator should be
// registered in their corresponding cpp files. Do not register them here.
}  // namespace caffe
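
Every creator above repeats the same engine-resolution steps, which can be summarized in a small sketch. This is a simplified model, not Caffe code: the `Engine` enum and `ResolveBackend` are hypothetical, and the `built_with_cudnn` argument stands in for the compile-time `USE_CUDNN` flag so the logic can be exercised at run time.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the per-layer *_Engine enums in caffe.proto.
enum class Engine { kDefault, kCaffe, kCudnn };

// Returns the backend the creator would pick.
// built_with_cudnn models whether USE_CUDNN was defined at build time.
std::string ResolveBackend(Engine requested, bool built_with_cudnn) {
  Engine engine = requested;
  if (engine == Engine::kDefault) {
    // DEFAULT prefers cuDNN whenever it was compiled in.
    engine = built_with_cudnn ? Engine::kCudnn : Engine::kCaffe;
  }
  if (engine == Engine::kCaffe) return "caffe";
  // The cuDNN branch only exists when cuDNN was compiled in.
  if (engine == Engine::kCudnn && built_with_cudnn) return "cudnn";
  return "unknown engine";  // the real code reaches LOG(FATAL) here
}

// Usage: an unset engine silently becomes cuDNN in a cuDNN build.
void ResolveBackendUsageExample() {
  assert(ResolveBackend(Engine::kDefault, true) == "cudnn");
  assert(ResolveBackend(Engine::kCaffe, true) == "caffe");
}
```

Note that explicitly requesting cuDNN in a build without it falls through to the error branch, because the `#ifdef` removes the cuDNN case entirely.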

 

3. The pitfall in layer_factory

In the current code, the creator function for the Pooling layer contains this snippet:

    // CuDNN assumes layers are not being modified in place, thus
    // breaking our index tracking for updates in some cases in Caffe.
    // Until there is a workaround in Caffe (index management) or
    // cuDNN, use Caffe layer to max pooling, or don't use in place
    // layers after max pooling layers
    if (param.pooling_param().pool() == PoolingParameter_PoolMethod_MAX) {
      return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
    } else {
      return shared_ptr<Layer<Dtype> >(new CuDNNPoolingLayer<Dtype>(param));
    }

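As a direct result, whenever you use MaxPool, the layer you get is always Caffe's own .cu implementation and never the cuDNN one. That explains why our MaxPool timings never changed in the earlier tests.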
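
One way to confirm which implementation a factory actually returned is to test the dynamic type of the result. The sketch below models this with toy classes; `ToyPoolingLayer`, `ToyCuDNNPoolingLayer`, `GetPoolingLayerSketch`, and `IsCuDNN` are hypothetical names, and the derived-from-base relationship between the two layer classes is an assumption of this example rather than something shown in the code above.

```cpp
#include <cassert>
#include <memory>

enum class PoolMethod { kMax, kAve };

// Toy stand-ins for the native and cuDNN pooling layers.
struct ToyPoolingLayer {
  virtual ~ToyPoolingLayer() {}
};
struct ToyCuDNNPoolingLayer : ToyPoolingLayer {};

// Mirrors the branch quoted above: even with the cuDNN engine selected,
// max pooling is routed to the native layer.
std::shared_ptr<ToyPoolingLayer> GetPoolingLayerSketch(PoolMethod method,
                                                       bool use_cudnn) {
  if (!use_cudnn || method == PoolMethod::kMax) {
    return std::shared_ptr<ToyPoolingLayer>(new ToyPoolingLayer());
  }
  return std::shared_ptr<ToyPoolingLayer>(new ToyCuDNNPoolingLayer());
}

// True only if the object behind the base pointer is the cuDNN variant.
bool IsCuDNN(const std::shared_ptr<ToyPoolingLayer>& layer) {
  return dynamic_cast<ToyCuDNNPoolingLayer*>(layer.get()) != nullptr;
}

// Usage: max pooling never reports cuDNN, average pooling does.
void PoolingFallbackUsageExample() {
  assert(!IsCuDNN(GetPoolingLayerSketch(PoolMethod::kMax, true)));
  assert(IsCuDNN(GetPoolingLayerSketch(PoolMethod::kAve, true)));
}
```

A check like this, or simply a log line in the creator, would have shown the fallback immediately instead of leaving it to be inferred from timings.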

4. Impact analysis

But why doesn't Caffe's author use cuDNN's MaxPool? Looking through NVIDIA's cuDNN User Manual, we find:

4.144. cudnnPoolingForward

cudnnStatus_t cudnnPoolingForward(
    cudnnHandle_t                    handle,
    const cudnnPoolingDescriptor_t   poolingDesc,
    const void                      *alpha,
    const cudnnTensorDescriptor_t    xDesc,
    const void                      *x,
    const void                      *beta,
    const cudnnTensorDescriptor_t    yDesc,
    void                            *y)

This function computes pooling of input values (i.e., the maximum or average of several adjacent values) to produce an output with smaller height and/or width.

Note: All tensor formats are supported, best performance is expected when using HW-packed tensors. Only 2 and 3 spatial dimensions are allowed.
Note: The dimensions of the output tensor yDesc can be smaller or bigger than the dimensions advised by the routine cudnnGetPooling2dForwardOutputDim or cudnnGetPoolingNdForwardOutputDim.

Parameters

handle

Input. Handle to a previously created cuDNN context.

poolingDesc

Input. Handle to a previously initialized pooling descriptor.

alpha, beta

Input. Pointers to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Refer to this section for additional details.

xDesc

Input. Handle to the previously initialized input tensor descriptor. Must be of type FLOAT, or DOUBLE, or HALF, or INT8. See cudnnDataType_t.

x

Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.

yDesc

Input. Handle to the previously initialized output tensor descriptor. Must be of type FLOAT, or DOUBLE, or HALF, or INT8. See cudnnDataType_t.

y

Output. Data pointer to GPU memory associated with the output tensor descriptor yDesc.

The possible error values returned by this function and their meanings are listed below.

Returns

CUDNN_STATUS_SUCCESS

The function launched successfully.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • The dimensions n, c of the input tensor and output tensors differ.
  • The datatype of the input tensor and output tensors differs.

CUDNN_STATUS_NOT_SUPPORTED

The function does not support the provided configuration. See the following for some examples of non-supported configurations:

  • The wStride of input tensor or output tensor is not 1.

CUDNN_STATUS_EXECUTION_FAILED

The function failed to launch on the GPU

 

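What is odd here is that only two tensors can be passed in, the input x and the output y, so there is no way to update the mask. I don't quite understand the cuDNN designers' thinking. As things stand, if correctness is to be preserved, cuDNN's PoolingForward cannot be used here for now.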
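
To illustrate why the mask matters, here is a minimal CPU sketch of 1-D max pooling that records, for each output, the index of the input element that won. This is an assumption-laden toy, not Caffe's or cuDNN's implementation: `PoolResult`, `MaxPoolForward`, and `MaxPoolBackward` are hypothetical names, and the windows are non-overlapping (stride equal to kernel size).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Forward output plus the mask: mask[i] is the input index that
// produced top[i].
struct PoolResult {
  std::vector<float> top;
  std::vector<std::size_t> mask;
};

// 1-D max pooling over non-overlapping windows of size kernel.
PoolResult MaxPoolForward(const std::vector<float>& bottom,
                          std::size_t kernel) {
  assert(kernel > 0);
  PoolResult result;
  for (std::size_t start = 0; start < bottom.size(); start += kernel) {
    const std::size_t end = std::min(start + kernel, bottom.size());
    std::size_t best = start;
    for (std::size_t i = start + 1; i < end; ++i) {
      if (bottom[i] > bottom[best]) best = i;
    }
    result.top.push_back(bottom[best]);
    result.mask.push_back(best);  // remembered for the backward pass
  }
  return result;
}

// Backward pass: each output gradient flows only to the input element
// recorded in the mask; all other inputs receive zero.
std::vector<float> MaxPoolBackward(const std::vector<float>& top_diff,
                                   const std::vector<std::size_t>& mask,
                                   std::size_t bottom_size) {
  assert(top_diff.size() == mask.size());
  std::vector<float> bottom_diff(bottom_size, 0.0f);
  for (std::size_t i = 0; i < top_diff.size(); ++i) {
    bottom_diff[mask[i]] += top_diff[i];
  }
  return bottom_diff;
}
```

In this sketch the backward pass depends entirely on the mask written during the forward pass, which is the piece that a forward call with only an input and an output buffer has no place to store.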