【Question Title】: Is it possible to translate this Python code to Cython?
【Posted】: 2023-04-06 01:23:01
【Question Description】:

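I want to speed up part #2 of this code as much as possible, so I thought trying Cython might be useful. However, I am not sure how to implement a sparse matrix in Cython. Can someone show how, or whether, this can be wrapped in Cython or Julia to make it faster?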

#1) This part builds the u_dict dictionary, which maps each unique string to an integer index.

import scipy.sparse as sp
import numpy as np
from scipy.sparse import csr_matrix

full_dict = set(train1.values.ravel().tolist() + test1.values.ravel().tolist() + train2.values.ravel().tolist() + test2.values.ravel().tolist())
print(len(full_dict))
# Map each unique string to its integer index.
u_dict = {q: i for i, q in enumerate(full_dict)}


shape = (len(full_dict), len(full_dict))
H = sp.lil_matrix(shape, dtype=np.int8)


def load_sparse_csr(filename):
    loader = np.load(filename)
    return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                      shape=loader['shape'])

#2) This is the part I need to speed up.
# train_full is a pandas DataFrame with two string columns, w1 and w2.

H = load_sparse_csr('matrix.npz')

correlation_train = []
for idx, row in train_full.iterrows():
    if idx % 1000 == 0:
        print(idx)  # progress indicator
    id_1 = u_dict[row['w1']]
    id_2 = u_dict[row['w2']]
    # Each row is densified to an array with fewer than 3 million entries.
    a_vec = H[id_1].toarray()
    b_vec = H[id_2].toarray()
    correlation_train.append(np.corrcoef(a_vec, b_vec)[0, 1])
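
Before reaching for Cython, it may be worth checking whether the loop in #2 can avoid `toarray()` entirely. The following is only a sketch, not a tested replacement for the code above: it assumes `H` is a SciPy CSR matrix and uses the identity r = (n·Σab − Σa·Σb) / sqrt((n·Σa² − (Σa)²)(n·Σb² − (Σb)²)), where every sum can be computed from the stored non-zero entries alone. The function name `sparse_row_corr` and its parameters are hypothetical and do not appear in the original question.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_row_corr(H, ids_1, ids_2):
    """Pearson correlation between rows H[ids_1[k]] and H[ids_2[k]].

    Hypothetical helper: works on the sparse rows directly instead of
    converting each row to a dense array.
    """
    # Cast to float64 so that sums of int8 values cannot overflow.
    H = H.tocsr().astype(np.float64)
    n = H.shape[1]  # length of every row vector

    # Select all row pairs at once (fancy row indexing on a CSR matrix).
    A = H[np.asarray(ids_1)]
    B = H[np.asarray(ids_2)]

    # Row-wise sums, sums of squares, and dot products.
    sum_a = np.asarray(A.sum(axis=1)).ravel()
    sum_b = np.asarray(B.sum(axis=1)).ravel()
    sq_a = np.asarray(A.multiply(A).sum(axis=1)).ravel()
    sq_b = np.asarray(B.multiply(B).sum(axis=1)).ravel()
    dot_ab = np.asarray(A.multiply(B).sum(axis=1)).ravel()

    cov = n * dot_ab - sum_a * sum_b
    var_a = n * sq_a - sum_a ** 2
    var_b = n * sq_b - sum_b ** 2

    # Constant rows have zero variance and yield nan, as np.corrcoef does.
    with np.errstate(divide='ignore', invalid='ignore'):
        return cov / np.sqrt(var_a * var_b)
```

Under these assumptions, the whole loop could be replaced by something like `sparse_row_corr(H, train_full['w1'].map(u_dict).values, train_full['w2'].map(u_dict).values)`. For a very large `train_full`, the index arrays may need to be processed in chunks to limit memory use. Whether this is fast enough to make Cython unnecessary depends on the data and would have to be measured.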

【Discussion】:

Tags:
python
pandas
numpy
scipy
cython