Study notes written while following the "Dive into Deep Learning" (《动手学深度学习》) course on the Boyu (伯禹) learning platform. Original link: https://www.boyuai.com/elites/course/cZu18YmweLv10OeV/video/qC-4p--OiYRK9l3eHKAju Thanks to the Boyu platform, Datawhale, 和鲸 (Heywhale), and AWS for providing this free learning opportunity!

Overall impression: the Boyu courses are very well made and systematic. Each higher-level course starts by covering the prerequisite knowledge it builds on, which makes it well suited to learners with a weaker background like me. If your fundamentals are shaky, consider Boyu's other courses:
Mathematical foundations: https://www.boyuai.com/elites/course/D91JM0bv72Zop1D3
Machine learning foundations: https://www.boyuai.com/elites/course/5ICEBwpbHVwwnK3C
GRU
Problem with vanilla RNNs: gradients tend to vanish or explode under backpropagation through time (BPTT). Gated recurrent networks are designed to capture dependencies between time steps that are far apart in the sequence.

RNN:
$$H_t = \phi(X_t W_{xh} + H_{t-1} W_{hh} + b_h)$$
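As a quick sanity check on shapes, here is a minimal sketch of this RNN update for a single time step in PyTorch (all sizes are illustrative assumptions, not from the course):

```python
import torch

batch_size, num_inputs, num_hiddens = 2, 5, 4   # illustrative sizes
X_t = torch.randn(batch_size, num_inputs)       # input at time step t
H_prev = torch.zeros(batch_size, num_hiddens)   # H_{t-1}
W_xh = torch.randn(num_inputs, num_hiddens)
W_hh = torch.randn(num_hiddens, num_hiddens)
b_h = torch.zeros(num_hiddens)

# H_t = phi(X_t W_xh + H_{t-1} W_hh + b_h), taking phi = tanh
H_t = torch.tanh(X_t @ W_xh + H_prev @ W_hh + b_h)
print(H_t.shape)  # torch.Size([2, 4])
```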
GRU:

$$
\begin{aligned}
R_t &= \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)\\
Z_t &= \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)\\
\widetilde{H}_t &= \tanh(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h)\\
H_t &= Z_t \odot H_{t-1} + (1 - Z_t) \odot \widetilde{H}_t
\end{aligned}
$$

• The reset gate helps capture short-term dependencies in the time series (both gates have the same size as the hidden state, h);
• **The update gate helps capture long-term dependencies in the time series.**
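To make the gate computations concrete, here is a minimal sketch of a single GRU time step in PyTorch that follows the four equations above. The function name `gru_step` and the parameter dict are my own packaging for illustration, not the course code; in practice `torch.nn.GRU` implements the same update.

```python
import torch

def gru_step(X_t, H_prev, params):
    """One GRU time step; parameter names mirror the formula subscripts."""
    R_t = torch.sigmoid(X_t @ params['W_xr'] + H_prev @ params['W_hr'] + params['b_r'])  # reset gate
    Z_t = torch.sigmoid(X_t @ params['W_xz'] + H_prev @ params['W_hz'] + params['b_z'])  # update gate
    # Candidate hidden state: the reset gate scales how much of H_{t-1} is used
    H_tilde = torch.tanh(X_t @ params['W_xh'] + (R_t * H_prev) @ params['W_hh'] + params['b_h'])
    # The update gate interpolates between the old state and the candidate
    H_t = Z_t * H_prev + (1 - Z_t) * H_tilde
    return H_t
```

Note that `R_t` and `Z_t` come out of a sigmoid with the same shape as `H_prev`, which is what "both gates have size h" refers to.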
LSTM
**Long short-term memory (LSTM)**:
• Forget gate: controls the memory cell from the previous time step
• Input gate: controls the input at the current time step
• Output gate: controls the flow from the memory cell to the hidden state
• Memory cell: a special kind of hidden state that records the flow of information
$$
\begin{aligned}
I_t &= \sigma(X_t W_{xi} + H_{t-1} W_{hi} + b_i)\\
F_t &= \sigma(X_t W_{xf} + H_{t-1} W_{hf} + b_f)\\
O_t &= \sigma(X_t W_{xo} + H_{t-1} W_{ho} + b_o)\\
\widetilde{C}_t &= \tanh(X_t W_{xc} + H_{t-1} W_{hc} + b_c)\\
C_t &= F_t \odot C_{t-1} + I_t \odot \widetilde{C}_t\\
H_t &= O_t \odot \tanh(C_t)
\end{aligned}
$$
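Below is a minimal sketch of one LSTM time step in PyTorch following the six equations above. The function name `lstm_step` and the parameter packaging are my own illustration; in practice `torch.nn.LSTM` implements this update.

```python
import torch

def lstm_step(X_t, H_prev, C_prev, params):
    """One LSTM time step; parameter names mirror the formula subscripts."""
    I_t = torch.sigmoid(X_t @ params['W_xi'] + H_prev @ params['W_hi'] + params['b_i'])  # input gate
    F_t = torch.sigmoid(X_t @ params['W_xf'] + H_prev @ params['W_hf'] + params['b_f'])  # forget gate
    O_t = torch.sigmoid(X_t @ params['W_xo'] + H_prev @ params['W_ho'] + params['b_o'])  # output gate
    C_tilde = torch.tanh(X_t @ params['W_xc'] + H_prev @ params['W_hc'] + params['b_c'])  # candidate cell
    C_t = F_t * C_prev + I_t * C_tilde  # forget part of the old cell, write part of the new
    H_t = O_t * torch.tanh(C_t)         # output gate controls cell -> hidden state
    return H_t, C_t
```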
Deep RNN

In a deep (multi-layer) RNN, the first layer reads the input, each higher layer ℓ reads the hidden states of the layer below it, and the output layer reads the top layer's hidden states:

$$
\begin{aligned}
H_t^{(1)} &= \phi(X_t W_{xh}^{(1)} + H_{t-1}^{(1)} W_{hh}^{(1)} + b_h^{(1)})\\
H_t^{(\ell)} &= \phi(H_t^{(\ell-1)} W_{xh}^{(\ell)} + H_{t-1}^{(\ell)} W_{hh}^{(\ell)} + b_h^{(\ell)})\\
O_t &= H_t^{(L)} W_{hq} + b_q
\end{aligned}
$$
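A minimal sketch of a two-layer deep RNN using PyTorch's built-in `nn.RNN`; the sizes here are illustrative assumptions:

```python
import torch
from torch import nn

num_steps, batch_size, num_inputs, num_hiddens = 35, 2, 28, 256  # illustrative sizes
# num_layers=2 stacks two recurrent layers: layer 1 sees X_t, and
# layer 2 sees the hidden states H_t^(1) of the layer below it.
rnn = nn.RNN(input_size=num_inputs, hidden_size=num_hiddens, num_layers=2)
X = torch.randn(num_steps, batch_size, num_inputs)
H_top, H_final = rnn(X)
print(H_top.shape)    # (35, 2, 256): top-layer H_t^(L) for every time step t
print(H_final.shape)  # (2, 2, 256): last hidden state of each of the 2 layers
# The output layer O_t = H_t^(L) W_hq + b_q would be an nn.Linear applied to H_top.
```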
Bidirectional RNN
$$
\begin{aligned}
\overrightarrow{H}_t &= \phi(X_t W_{xh}^{(f)} + \overrightarrow{H}_{t-1} W_{hh}^{(f)} + b_h^{(f)})\\
\overleftarrow{H}_t &= \phi(X_t W_{xh}^{(b)} + \overleftarrow{H}_{t+1} W_{hh}^{(b)} + b_h^{(b)})
\end{aligned}
$$

$$H_t = (\overrightarrow{H}_t, \overleftarrow{H}_t)$$

$$O_t = H_t W_{hq} + b_q$$
By looking at the words both before and after the current position, a bidirectional RNN can estimate the current word more accurately.
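A minimal sketch of a bidirectional layer using PyTorch's `nn.GRU` with `bidirectional=True`; the sizes and the 10-way output layer are illustrative assumptions:

```python
import torch
from torch import nn

num_steps, batch_size, num_inputs, num_hiddens = 35, 2, 28, 128  # illustrative sizes
birnn = nn.GRU(input_size=num_inputs, hidden_size=num_hiddens, bidirectional=True)
X = torch.randn(num_steps, batch_size, num_inputs)
H_all, _ = birnn(X)
# Forward and backward hidden states are concatenated on the feature axis,
# matching H_t = (forward H_t, backward H_t): 2 * num_hiddens features.
print(H_all.shape)  # (35, 2, 256)
output_layer = nn.Linear(2 * num_hiddens, 10)  # O_t = H_t W_hq + b_q with q = 10 (assumed)
O_t = output_layer(H_all)
print(O_t.shape)    # (35, 2, 10)
```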