Recurrent Neural Networks (RNN)
Forward Propagation
many-to-many structure:

$$
\begin{array}{ccccccccc}
& \hat{y}^{<1>} && \hat{y}^{<2>} && \hat{y}^{<3>} && \hat{y}^{<T>} \\
& \uparrow && \uparrow && \uparrow && \uparrow \\
a^{<0>}\rightarrow & \boxed{\begin{matrix} \bigcirc \\ \bigcirc \\ \bigcirc \\ \bigcirc \end{matrix}} & \xrightarrow{a^{<1>}} & \boxed{\begin{matrix} \bigcirc \\ \bigcirc \\ \bigcirc \\ \bigcirc \end{matrix}} & \xrightarrow{a^{<2>}} & \boxed{\begin{matrix} \bigcirc \\ \bigcirc \\ \bigcirc \\ \bigcirc \end{matrix}} & \rightarrow \cdots \rightarrow & \boxed{\begin{matrix} \bigcirc \\ \bigcirc \\ \bigcirc \\ \bigcirc \end{matrix}} \\
& \uparrow && \uparrow && \uparrow && \uparrow \\
& x^{<1>} && x^{<2>} && x^{<3>} && x^{<T>}
\end{array}
$$
Or, more compactly:
$$
\begin{array}{ccccccc}
& \hat{y}^{<1>} && \hat{y}^{<2>} && \hat{y}^{<T>} \\
& \uparrow && \uparrow && \uparrow \\
a^{<0>}\rightarrow & \boxed{a^{<1>}} & \rightarrow & \boxed{a^{<2>}} & \rightarrow \cdots \rightarrow & \boxed{a^{<T>}} \\
& \uparrow && \uparrow && \uparrow \\
& x^{<1>} && x^{<2>} && x^{<T>}
\end{array}
$$
Mathematically:

$$
\begin{aligned}
a^{<0>} &= \vec{0} \\
a^{<1>} &= g_{1}(W_{aa}a^{<0>} + W_{ax}x^{<1>} + b_{a}) = g_{1}(W_{a}[a^{<0>},\, x^{<1>}] + b_{a}) \\
\hat{y}^{<1>} &= g_{2}(W_{y}a^{<1>} + b_{y}) \\
&\ \ \vdots \\
a^{<T>} &= g_{1}(W_{aa}a^{<T-1>} + W_{ax}x^{<T>} + b_{a}) = g_{1}(W_{a}[a^{<T-1>},\, x^{<T>}] + b_{a}) \\
\hat{y}^{<T>} &= g_{2}(W_{y}a^{<T>} + b_{y})
\end{aligned}
$$

where the activation $g_{1}$ is usually tanh or ReLU, and $g_{2}$ is usually sigmoid.
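As a concrete illustration, here is a minimal NumPy sketch of this forward pass; the function names, shape conventions ($x$ of shape $(n_x, T)$, $a$ of shape $(n_a, 1)$), and the choice of sigmoid for $g_{2}$ are only illustrative assumptions.

```python
import numpy as np

def rnn_cell_forward(x_t, a_prev, Waa, Wax, Wy, ba, by):
    """One step: a<t> = tanh(Waa a<t-1> + Wax x<t> + ba), y_hat<t> = sigmoid(Wy a<t> + by)."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)      # g1 = tanh
    y_t = 1.0 / (1.0 + np.exp(-(Wy @ a_t + by)))      # g2 = sigmoid
    return a_t, y_t

def rnn_forward(x, a0, Waa, Wax, Wy, ba, by):
    """Unroll the cell over T steps; x has shape (n_x, T), a0 has shape (n_a, 1)."""
    T = x.shape[1]
    a_t, y_hat, a_cache = a0, [], [a0]
    for t in range(T):
        a_t, y_t = rnn_cell_forward(x[:, t:t+1], a_t, Waa, Wax, Wy, ba, by)
        y_hat.append(y_t)
        a_cache.append(a_t)                           # keep a<0>..a<T> for backpropagation
    return np.concatenate(y_hat, axis=1), a_cache
```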
many-to-one structure. Example: input a movie review and perform user sentiment analysis (like / dislike).

$$
\begin{array}{ccccccc}
&&&&& \hat{y}^{<T>} \\
&&&&& \uparrow \\
a^{<0>}\rightarrow & \boxed{a^{<1>}} & \rightarrow & \boxed{a^{<2>}} & \rightarrow \cdots \rightarrow & \boxed{a^{<T>}} \\
& \uparrow && \uparrow && \uparrow \\
& x^{<1>} && x^{<2>} && x^{<T>}
\end{array}
$$
one-to-one structure:

$$
\begin{array}{cc}
& \hat{y} \\
& \uparrow \\
a^{<0>}\rightarrow & \boxed{a^{<1>}} \\
& \uparrow \\
& x
\end{array}
$$
one-to-many structure, e.g. a music generator:

$$
\begin{array}{cccccc}
& \hat{y}^{<1>} && \hat{y}^{<2>} && \hat{y}^{<T>} \\
& \uparrow && \uparrow && \uparrow \\
a^{<0>}\rightarrow & \boxed{a^{<1>}} & \rightarrow & \boxed{a^{<2>}} & \rightarrow \cdots \rightarrow & \boxed{a^{<T>}} \\
& \uparrow \\
& x \text{ or } \phi
\end{array}
$$
Other many-to-many structures ($T_{x} \neq T_{y}$):

$$
\begin{array}{cccccccc}
&&&&& \hat{y}^{<1>} && \hat{y}^{<T_{y}>} \\
&&&&& \uparrow && \uparrow \\
a^{<0>}\rightarrow & \boxed{a^{<1>}} & \rightarrow \cdots \rightarrow & \boxed{a^{<T_{x}>}} & \rightarrow & \boxed{a^{<T_{x}+1>}} & \rightarrow \cdots \rightarrow & \boxed{a^{<T_{x}+T_{y}>}} \\
& \uparrow && \uparrow \\
& x^{<1>} && x^{<T_{x}>}
\end{array}
$$
Cost Function
$$
L(\hat{y}, y) = \sum_{t=1}^{T} L^{<t>}(\hat{y}^{<t>}, y^{<t>})
$$

where $L^{<t>}(\hat{y}^{<t>}, y^{<t>}) = -y^{<t>}\log\hat{y}^{<t>} - (1-y^{<t>})\log(1-\hat{y}^{<t>})$.
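For reference, a minimal sketch of this cost for binary outputs, assuming `y` and `y_hat` are arrays of shape $(1, T)$; the small `eps` term is only there to guard against $\log 0$.

```python
import numpy as np

def rnn_cost(y_hat, y, eps=1e-12):
    """L = sum over t of the per-step cross-entropy L<t>(y_hat<t>, y<t>)."""
    per_step = -(y * np.log(y_hat + eps) + (1.0 - y) * np.log(1.0 - y_hat + eps))
    return per_step.sum()
```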
Backpropagation
Taking only the many-to-many structure as an example: the per-step losses are summed, and the gradients are propagated backward through the unrolled network over time (backpropagation through time), as sketched below.
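A minimal backpropagation-through-time sketch for the basic many-to-many RNN above, assuming sigmoid outputs with the cross-entropy cost (so $\partial L^{<t>}/\partial(W_{y}a^{<t>}+b_{y}) = \hat{y}^{<t>} - y^{<t>}$); here `a` is the cached list $[a^{<0>}, \dots, a^{<T>}]$ from the forward-pass sketch, and all names are illustrative.

```python
import numpy as np

def rnn_backward(x, y, y_hat, a, Waa, Wax, Wy):
    """Return gradients of the summed cost w.r.t. Waa, Wax, Wy, ba, by."""
    T = x.shape[1]
    dWaa, dWax, dWy = np.zeros_like(Waa), np.zeros_like(Wax), np.zeros_like(Wy)
    dba, dby = np.zeros((Waa.shape[0], 1)), np.zeros((Wy.shape[0], 1))
    da_next = np.zeros((Waa.shape[0], 1))            # gradient flowing back from step t+1
    for t in reversed(range(T)):
        dy = y_hat[:, t:t+1] - y[:, t:t+1]           # dL<t>/d(Wy a<t> + by)
        dWy += dy @ a[t + 1].T
        dby += dy
        da = Wy.T @ dy + da_next                     # from the output and from step t+1
        dz = da * (1.0 - a[t + 1] ** 2)              # through tanh: a<t> = tanh(z<t>)
        dWaa += dz @ a[t].T                          # a[t] in the list is a<t-1>
        dWax += dz @ x[:, t:t+1].T
        dba += dz
        da_next = Waa.T @ dz                         # hand the gradient to step t-1
    return dWaa, dWax, dWy, dba, dby
```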
Gated Recurrent Unit (GRU)
GRUs help address the vanishing gradient problem. Here $c$ denotes the memory cell.

$$
\begin{aligned}
\tilde c^{<t>} &= \tanh(W_{c}[\Gamma_{r} \times c^{<t-1>},\, x^{<t>}] + b_{c}) \\
\text{relevance gate: } \Gamma_{r} &= \sigma(W_{r}[c^{<t-1>},\, x^{<t>}] + b_{r}) \\
\text{update gate: } \Gamma_{u} &= \sigma(W_{u}[c^{<t-1>},\, x^{<t>}] + b_{u}) \\
c^{<t>} &= \Gamma_{u} \times \tilde c^{<t>} + (1-\Gamma_{u}) \times c^{<t-1>} \\
a^{<t>} &= c^{<t>}
\end{aligned}
$$

When $\Gamma_{u} \approx 0$, $c^{<t>} \approx c^{<t-1>}$, so the memory cell can preserve information across many time steps.
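A minimal NumPy sketch of one GRU step following the equations above; the concatenation $[\cdot\,,\,\cdot]$ is implemented with `np.vstack`, and names and shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell_forward(x_t, c_prev, Wc, Wr, Wu, bc, br, bu):
    concat = np.vstack([c_prev, x_t])                     # [c<t-1>, x<t>]
    gamma_r = sigmoid(Wr @ concat + br)                   # relevance gate
    gamma_u = sigmoid(Wu @ concat + bu)                   # update gate
    c_tilde = np.tanh(Wc @ np.vstack([gamma_r * c_prev, x_t]) + bc)
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev    # gamma_u ~ 0 keeps the old memory
    return c_t                                            # a<t> = c<t>
```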
Long Short-Term Memory (LSTM)
$$
\begin{aligned}
\tilde c^{<t>} &= \tanh(W_{c}[a^{<t-1>},\, x^{<t>}] + b_{c}) \\
\text{update gate: } \Gamma_{u} &= \sigma(W_{u}[a^{<t-1>},\, x^{<t>}] + b_{u}) \\
\text{forget gate: } \Gamma_{f} &= \sigma(W_{f}[a^{<t-1>},\, x^{<t>}] + b_{f}) \\
\text{output gate: } \Gamma_{o} &= \sigma(W_{o}[a^{<t-1>},\, x^{<t>}] + b_{o}) \\
c^{<t>} &= \Gamma_{u} \times \tilde c^{<t>} + \Gamma_{f} \times c^{<t-1>} \\
a^{<t>} &= \Gamma_{o} \times c^{<t>}
\end{aligned}
$$

GRU or LSTM? The GRU has only two gates; it is simpler and can be viewed as a simplification of the LSTM. The LSTM has three gates and is more powerful and flexible.
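A matching sketch of one LSTM step; again, all names are illustrative, and note that many formulations use $\Gamma_{o} \times \tanh(c^{<t>})$ for the output, whereas the equations above omit the $\tanh$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_forward(x_t, a_prev, c_prev, Wc, Wu, Wf, Wo, bc, bu, bf, bo):
    concat = np.vstack([a_prev, x_t])       # [a<t-1>, x<t>]
    c_tilde = np.tanh(Wc @ concat + bc)     # candidate memory
    gamma_u = sigmoid(Wu @ concat + bu)     # update gate
    gamma_f = sigmoid(Wf @ concat + bf)     # forget gate
    gamma_o = sigmoid(Wo @ concat + bo)     # output gate
    c_t = gamma_u * c_tilde + gamma_f * c_prev
    a_t = gamma_o * c_t                     # as in the equations above
    return a_t, c_t
```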
Bidirectional RNN
For example, the output $\hat{y}^{<3>}$ receives information from the past ($x^{<1>}, x^{<2>}$), from the present ($x^{<3>}$), and from the future ($x^{<4>}$). In NLP, bidirectional RNNs with LSTM cells are very commonly used.
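For reference, a standard way to write the output at each step (not written out above) combines the forward activation $\overrightarrow{a}^{<t>}$ and the backward activation $\overleftarrow{a}^{<t>}$:

$$
\hat{y}^{<t>} = g\big(W_{y}[\overrightarrow{a}^{<t>},\, \overleftarrow{a}^{<t>}] + b_{y}\big)
$$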
Deep RNN
$$
\begin{array}{ccccccc}
& \hat{y}^{<1>} && \hat{y}^{<2>} && \hat{y}^{<T>} \\
& \uparrow && \uparrow && \uparrow \\
a^{[3]<0>}\rightarrow & \boxed{a^{[3]<1>}} & \rightarrow & \boxed{a^{[3]<2>}} & \rightarrow \cdots \rightarrow & \boxed{a^{[3]<T>}} \\
& \uparrow && \uparrow && \uparrow \\
a^{[2]<0>}\rightarrow & \boxed{a^{[2]<1>}} & \rightarrow & \boxed{a^{[2]<2>}} & \rightarrow \cdots \rightarrow & \boxed{a^{[2]<T>}} \\
& \uparrow && \uparrow && \uparrow \\
a^{[1]<0>}\rightarrow & \boxed{a^{[1]<1>}} & \rightarrow & \boxed{a^{[1]<2>}} & \rightarrow \cdots \rightarrow & \boxed{a^{[1]<T>}} \\
& \uparrow && \uparrow && \uparrow \\
& x^{<1>} && x^{<2>} && x^{<T>}
\end{array}
$$
For example, $a^{[2]<2>} = g(W_{a}^{[2]}[a^{[2]<1>},\, a^{[1]<2>}] + b^{[2]})$. Of course, some of the arrows can be removed; each block need not be a standard RNN cell and can be an LSTM or GRU; and the layers can also be made bidirectional.
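A minimal sketch of a deep RNN forward pass with plain tanh cells, where layer $l$ at step $t$ takes its own previous state $a^{[l]<t-1>}$ and the current state of the layer below $a^{[l-1]<t>}$; the per-layer weight lists `Wa`, `ba` and all other names are illustrative assumptions.

```python
import numpy as np

def deep_rnn_forward(x, a0_layers, Wa, ba):
    """x: (n_x, T); a0_layers: one initial state per layer; Wa[l]: (n_a, n_a + n_below)."""
    T = x.shape[1]
    a = [a0.copy() for a0 in a0_layers]          # current state of each layer
    top = []
    for t in range(T):
        below = x[:, t:t+1]                      # layer 1 reads x<t>
        for l in range(len(a)):
            concat = np.vstack([a[l], below])    # [a[l]<t-1>, a[l-1]<t>]
            a[l] = np.tanh(Wa[l] @ concat + ba[l])
            below = a[l]                         # feed this layer's output upward
        top.append(below)                        # a[L]<t>, which feeds y_hat<t>
    return np.concatenate(top, axis=1)
```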