1.AFM模型pytorch实现。

AFM模型 pytorch示例代码

$hat{y}_{AFM}=w_{0} + sum_{i=1}^{n}w_{i}x_{i}+p^{T}sum_{i=1}^{n-1}sum_{j=i+1}^{n}a_{ij}(v_{i}v_{j})x_{i}x_{j}$

$a_{ij}^{'}=h^{T}Relu(W(v_{i}v_{j})x_{i}x_{j}+b)$

$a_{ij}=frac{exp(a_{ij}^{'})}{sum_{i,j}exp(a_{ij}^{'})}$

(实际数据使用的是Dataloader,需要设置batch_size等参数。)

设原来的数据有num_fields =3个特征,one-hot编码过后对应有30维度,嵌入维度设为ebd_size=4。所以嵌入层定义为

ebd_size = 4
ebd = nn.Embedding(30,ebd_size)

自定义一个batch_size的数据

x_ = [[1, 13, 22], [0, 18,29],[2, 13,27], [0, 11,22],[1, 14,26]]  #shape=batch_size*num_fields 
x_ =  Variable(torch.LongTensor([[1, 13, 22], [0, 18,29],[2, 13,27], [0, 11,22],[1, 14,26]]))

得到对应的嵌入向量

x=ebd(x_)

计算交叉特征:

$(v_{i}v_{j})x_{i}x_{j}$

交叉特征数目=num_fields*(num_fields - 1)/2  

inner_product的shape为batch_size*交叉特征数目*嵌入维度

num_fields = x.shape[1]
row, col = list(), list()
for i in range(num_fields - 1):
    for j in range(i + 1, num_fields):
        row.append(i), col.append(j)
p, q = x[:, row], x[:, col]
inner_product = p * q

接下来求得 

$Relu(W(v_{i}v_{j})x_{i}x_{j}+b)$

用一个nn.Linear层,在经过一个Relu激活函数可以完成

attention(inner_product))结果的shape为 batch_size*交叉特征*嵌入维度
attention = torch.nn.Linear(ebd_size, ebd_size)
print(attention(inner_product))  # batch_size*交叉特征*嵌入维度
attn_scores = F.relu(attention(inner_product))
print("attn_scores", attn_scores)  # batch_size*交叉特征*嵌入维度

接下来在经过一个linear得到$a_{ij}^{'}$

$a_{ij}^{'}=h^{T}Relu(W(v_{i}v_{j})x_{i}x_{j}+b)$

projection = torch.nn.Linear(ebd_size, 1)
print("projection(attn_scores)", projection(attn_scores))  # batch_size*交叉特征*1

在经过一个softmax得到

$a_{ij}=frac{exp(a_{ij}^{'})}{sum_{i,j}exp(a_{ij}^{'})}$

attn_scores = F.softmax(projection(attn_scores), dim=1)
print("attn_scores", attn_scores)  # batch_size*交叉特征*1

接下来把交叉特征$(v_{i}v_{j})x_{i}x_{j}$与注意力权重$a_{ij}$相乘

print("attn_scores * inner_product", attn_scores * inner_product)  # batch_size*交叉特征*嵌入维度
attn_output = torch.sum(attn_scores * inner_product, dim=1)
print("attn_output", attn_output)  # batch_size*嵌入维度

最后经过一个输出大小为1的全连接层

fc = torch.nn.Linear(ebd_size, 1)
fc_out = fc(attn_output)
print("fc_out", fc_out)  # batch_size*1

这样就把$p^{T}sum_{i=1}^{n-1}sum_{j=i+1}^{n}a_{ij}(v_{i}v_{j})x_{i}x_{j}$求出来了,前面一阶部分使用一个Linear层就可以求得到

参考代码:

import torch
import numpy as np
from torch.autograd import Variable
import torch.nn.functional as F
import torch.nn as nn
ebd_size = 4
ebd = nn.Embedding(30,ebd_size)
x_ =  Variable(torch.LongTensor([[1, 13, 22], [0, 18,29],[2, 13,27], [0, 11,22],[1, 14,26]]))
x=ebd(x_)
num_fields = x.shape[1]
row, col = list(), list()
for i in range(num_fields - 1):
    for j in range(i + 1, num_fields):
        row.append(i), col.append(j)
p, q = x[:, row], x[:, col]
inner_product = p * q
print("inner_product", inner_product)  # batch_size*交叉特征*嵌入维度
attention = torch.nn.Linear(ebd_size, ebd_size)
print(attention(inner_product))  # batch_size*交叉特征*嵌入维度
attn_scores = F.relu(attention(inner_product))
print("attn_scores", attn_scores)  # batch_size*交叉特征*嵌入维度
projection = torch.nn.Linear(ebd_size, 1)
print("projection(attn_scores)", projection(attn_scores))  # batch_size*交叉特征*1
attn_scores = F.softmax(projection(attn_scores), dim=1)
print("attn_scores", attn_scores)  # batch_size*交叉特征*1
print("attn_scores * inner_product", attn_scores * inner_product)  # batch_size*交叉特征*嵌入维度
attn_output = torch.sum(attn_scores * inner_product, dim=1)
print("attn_output", attn_output)  # batch_size*嵌入维度
fc = torch.nn.Linear(ebd_size, 1)
fc_out = fc(attn_output)
print("fc_out", fc_out)  # batch_size*1
exit()