When I submitted a job that uses the GPU on a compute cluster, it failed with the following error:

The NVIDIA driver on your system is too old (found version 9000).
Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

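In other words, the installed PyTorch build and the CUDA driver do not match. There are two ways to fix this: upgrade the driver, or switch to a different PyTorch version. In my case only the second was feasible.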

Check the CUDA version

cat /usr/local/cuda/version.txt

The output is:

CUDA Version 9.1.85
CUDA Patch Version 9.1.85.1
CUDA Patch Version 9.1.85.2
CUDA Patch Version 9.1.85.3
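
If you would rather do this check from a script, here is a minimal Python sketch. It assumes the toolkit lives at /usr/local/cuda and ships a version.txt in the format shown above, which may not hold for every CUDA release. The function names parse_cuda_version and read_cuda_version are made up for this example.

```python
import re
from pathlib import Path


def parse_cuda_version(text):
    """Return (major, minor) from the first 'CUDA Version X.Y.Z' line, or None."""
    # Anchoring at line start skips the 'CUDA Patch Version ...' lines.
    match = re.search(r"^CUDA Version (\d+)\.(\d+)", text, flags=re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def read_cuda_version(path="/usr/local/cuda/version.txt"):
    """Read the version file if it exists; return None when it is missing."""
    version_file = Path(path)  # assumed install location, adjust as needed
    if not version_file.is_file():
        return None
    return parse_cuda_version(version_file.read_text())
```

Calling read_cuda_version() on the cluster above should give the major and minor version as a tuple of integers, or None if the file is not there.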

Install PyTorch

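On https://pytorch.org/get-started/previous-versions/, look for the PyTorch version that matches CUDA 9.1. There is no build for 9.1, but the 9.0 build is compatible. I found the commands below; for convenience I pasted the ones for the other variants as well.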

# CUDA 9.0
conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch

# CUDA 10.0
conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0 -c pytorch

# CPU Only
conda install pytorch-cpu==1.1.0 torchvision-cpu==0.3.0 cpuonly -c pytorch

So we need three specific packages: pytorch==1.1.0, torchvision==0.3.0, and cudatoolkit=9.0. Find each of them in the Anaconda repository, download them, and install them one by one.

conda install cudatoolkit-9.0-h13b8566_0.tar.bz2 
conda install pytorch-1.1.0-py3.7_cuda9.0.176_cudnn7.5.1_0.tar.bz2 
conda install torchvision-0.3.0-py37_cu9.0.176_1.tar.bz2

Test

import torch
torch.__version__

# '1.1.0'
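
The version string only confirms that the package was installed. Since the original error was about the driver, it may be worth also checking which CUDA version the build targets and whether PyTorch can actually see the GPU. A small sketch, where describe_torch_install is a made-up helper name for this example:

```python
def describe_torch_install():
    """Collect basic facts about the PyTorch install; None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        # CUDA version this build was compiled against (None for CPU-only builds)
        "built_for_cuda": torch.version.cuda,
        # True only if PyTorch can initialize CUDA with the current driver
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch_install())
```

Run this on a GPU node rather than the login node. If "cuda_available" is still False there, the driver mismatch has probably not been resolved.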