The form of the activation function in Caffe directly affects training speed and how SGD solves the optimization.
Different activation functions lead to different gradient computations during SGD, so the activation layer's type must be specified in the network configuration file. Currently the most widely used activation function in Caffe is ReLU.
The activation functions currently implemented in Caffe are:
absval, bnll, power, relu, sigmoid, and tanh, each implemented as its own layer. Their formulas are given below.
Rather than re-deriving them here, the following is adapted from the Caffe tutorial.
ReLU / Rectified-Linear and Leaky-ReLU
- LayerType: RELU
- CPU implementation: ./src/caffe/layers/relu_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/relu_layer.cu
- Parameters (ReLUParameter relu_param)
  - Optional
    - negative_slope [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.
- Sample (as seen in ./examples/imagenet/imagenet_train_val.prototxt)

  layers { name: "relu1" type: RELU bottom: "conv1" top: "conv1" }
Given an input value x, the RELU layer computes the output as x if x > 0 and negative_slope * x if x <= 0. When the negative_slope parameter is not set, it is equivalent to the standard ReLU function, max(x, 0). It also supports in-place computation, meaning that the bottom and the top blob can be the same, which saves memory.
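To connect this back to the SGD discussion at the top, here is a minimal NumPy sketch (an illustration, not Caffe's C++ code) of the forward pass and the gradient that backprop would use; negative_slope mirrors the relu_param field above.

```python
import numpy as np

def relu_forward(x, negative_slope=0.0):
    # x if x > 0, negative_slope * x otherwise (standard ReLU when the slope is 0)
    return np.where(x > 0, x, negative_slope * x)

def relu_backward(x, top_diff, negative_slope=0.0):
    # local gradient: 1 for positive inputs, negative_slope for the rest
    return top_diff * np.where(x > 0, 1.0, negative_slope)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu_forward(x))                      # negative part clamped to 0
print(relu_forward(x, negative_slope=0.1))  # leaky variant keeps a scaled negative part
```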
Sigmoid
- LayerType: SIGMOID
- CPU implementation: ./src/caffe/layers/sigmoid_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/sigmoid_layer.cu
- Sample (as seen in ./examples/imagenet/mnist_autoencoder.prototxt)

  layers { name: "encode1neuron" bottom: "encode1" top: "encode1neuron" type: SIGMOID }
The SIGMOID layer computes the output as sigmoid(x) = 1 / (1 + exp(-x)) for each input element x.
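A hedged NumPy sketch of the same element-wise computation (not the layer's source); the split into positive and negative inputs keeps exp from overflowing, and the backward pass uses the well-known identity dy/dx = y * (1 - y).

```python
import numpy as np

def sigmoid_forward(x):
    # sigmoid(x) = 1 / (1 + exp(-x)), computed in a numerically stable way
    out = np.empty_like(x, dtype=np.float64)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

def sigmoid_backward(y, top_diff):
    # gradient is expressed through the forward output y: dy/dx = y * (1 - y)
    return top_diff * y * (1.0 - y)
```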
TanH / Hyperbolic Tangent
- LayerType: TANH
- CPU implementation: ./src/caffe/layers/tanh_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/tanh_layer.cu
- Sample

  layers { name: "layer" bottom: "in" top: "out" type: TANH }
The TANH layer computes the output as tanh(x) for each input element x.
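A minimal sketch under the same framing: the forward pass is just np.tanh, and the gradient 1 - tanh(x)^2 is what SGD would propagate (again, an illustration rather than Caffe's implementation).

```python
import numpy as np

def tanh_forward(x):
    return np.tanh(x)

def tanh_backward(y, top_diff):
    # y is the forward output; d tanh(x)/dx = 1 - tanh(x)^2
    return top_diff * (1.0 - y * y)
```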
Absolute Value
- LayerType: ABSVAL
- CPU implementation: ./src/caffe/layers/absval_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/absval_layer.cu
- Sample

  layers { name: "layer" bottom: "in" top: "out" type: ABSVAL }
The ABSVAL layer computes the output as abs(x) for each input element x.
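A short sketch of the element-wise math; the gradient of |x| is sign(x), and the convention of using 0 at x = 0 is an assumption of this illustration.

```python
import numpy as np

def absval_forward(x):
    return np.abs(x)

def absval_backward(x, top_diff):
    # gradient of |x| is sign(x); the subgradient at 0 is taken as 0 here
    return top_diff * np.sign(x)
```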
Power
- LayerType: POWER
- CPU implementation: ./src/caffe/layers/power_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/power_layer.cu
- Parameters (PowerParameter power_param)
  - Optional
    - power [default 1]
    - scale [default 1]
    - shift [default 0]
- Sample

  layers { name: "layer" bottom: "in" top: "out" type: POWER power_param { power: 1 scale: 1 shift: 0 } }
The POWER layer computes the output as (shift + scale * x) ^ power for each input element x.
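A minimal NumPy sketch of this formula and its chain-rule gradient (an illustration, not the layer's code); for example, power: 2 with the default scale and shift simply squares the input.

```python
import numpy as np

def power_forward(x, power=1.0, scale=1.0, shift=0.0):
    # (shift + scale * x) ^ power, matching the defaults listed above
    return np.power(shift + scale * x, power)

def power_backward(x, top_diff, power=1.0, scale=1.0, shift=0.0):
    # chain rule: d/dx (shift + scale*x)^power = power * scale * (shift + scale*x)^(power - 1)
    return top_diff * power * scale * np.power(shift + scale * x, power - 1.0)

x = np.array([1.0, 2.0, 3.0])
print(power_forward(x, power=2.0))  # squares each element: [1. 4. 9.]
```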
BNLL
- LayerType: BNLL
- CPU implementation: ./src/caffe/layers/bnll_layer.cpp
- CUDA GPU implementation: ./src/caffe/layers/bnll_layer.cu
- Sample

  layers { name: "layer" bottom: "in" top: "out" type: BNLL }
The BNLL (binomial normal log likelihood) layer computes the output as log(1 + exp(x)) for each input element x.
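A final sketch of this softplus-style function (again an illustration, not Caffe's code): the identity log(1 + exp(x)) = max(x, 0) + log(1 + exp(-|x|)) avoids overflow for large x, and the gradient is simply sigmoid(x).

```python
import numpy as np

def bnll_forward(x):
    # log(1 + exp(x)) computed stably as max(x, 0) + log(1 + exp(-|x|))
    return np.maximum(x, 0) + np.log1p(np.exp(-np.abs(x)))

def bnll_backward(x, top_diff):
    # d/dx log(1 + exp(x)) = sigmoid(x)
    return top_diff / (1.0 + np.exp(-x))
```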