Official guide: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
1. Create a new conda environment (optional)
conda create -n geo1 python=3.6
conda activate geo1
conda deactivate
nvcc --version
10.0
nvcc reports CUDA 10.0, so I wanted a torch build for cu100. However, PyTorch Geometric seemed to support only torch 1.4 and torch 1.5 at the time.
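The step from the nvcc output to the wheel suffix (10.0 becomes cu100) can be sketched in Python. This is only a minimal sketch: it assumes nvcc prints a line containing "release X.Y", the sample line is an assumed example of that format, and the function name cuda_tag_from_nvcc is made up for illustration.

```python
import re

def cuda_tag_from_nvcc(nvcc_output: str) -> str:
    """Turn the text printed by `nvcc --version` into a wheel tag such as 'cu100'."""
    # Assumption: the output contains "release <major>.<minor>".
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    major, minor = match.groups()
    return f"cu{major}{minor}"

# Assumed sample line; paste your own nvcc output instead.
sample = "Cuda compilation tools, release 10.0, V10.0.130"
print(cuda_tag_from_nvcc(sample))  # prints cu100
```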
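2. Install PyTorch with the matching CUDA build
From the official PyTorch website:
pip install torch==1.4.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html
Note: the cu100 suffix is critical. Get it wrong and you download a large wheel for nothing, and the later steps will not install.
After it finishes, check the versions: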
python -c "import torch; print(torch.version.cuda)"
python -c "import torch; print(torch.__version__)"
torch.version.cuda 10.0
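torch.__version__ 1.4.0+cu100
At this point the most important step is done: nvcc --version and torch.version.cuda both report 10.0. These two must match. Every problem recorded later in this post came from them being different.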
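The comparison between the two version numbers can be written down as a small check. A minimal sketch, assuming both values are passed in as "major.minor" strings (for example the release number from nvcc --version and the value of torch.version.cuda); the helper names are hypothetical.

```python
def major_minor(version: str) -> tuple:
    """Keep only the first two numeric components, e.g. '10.0.130' -> (10, 0)."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])

def cuda_versions_match(nvcc_release: str, torch_cuda: str) -> bool:
    """True when the toolkit and the torch build agree on major and minor version."""
    return major_minor(nvcc_release) == major_minor(torch_cuda)

print(cuda_versions_match("10.0", "10.0"))  # True: both report 10.0
print(cuda_versions_match("10.0", "10.1"))  # False: the versions differ
```

In practice the second argument would be torch.version.cuda read from the installed torch.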
3. Clear the pip cache (recommended)
pip's output says "Stored in directory: /root/.cache/pip", so:
rm -rf ~/.cache/pip
Another method I found online:
pip install --verbose --no-cache-dir torch-scatter torch-sparse torch-cluster torch-spline-conv
This takes a very long time. It succeeded for me on Windows but still failed on Linux.
4. Install the companion libraries
pip install torch-scatter==latest+cu100 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-sparse==latest+cu100 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-cluster==latest+cu100 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-spline-conv==latest+cu100 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-geometric
Again, pay attention to cu100 and to torch-1.4.0.html at the end of the URL. Both must correspond to the installed PyTorch build.
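To avoid mistyping the suffix or the URL, the five commands can be generated from the two values that matter. This is a sketch that only reproduces the URL pattern used in this post; the function name is hypothetical, and I have not checked that a wheel page exists for every torch version and CUDA tag you might pass in.

```python
PACKAGES = ["torch-scatter", "torch-sparse", "torch-cluster", "torch-spline-conv"]

def companion_install_commands(torch_version: str, cuda_tag: str) -> list:
    """Build the pip commands for the companion libraries plus torch-geometric."""
    # Assumption: the wheel index follows the pattern seen in this post.
    index = f"https://pytorch-geometric.com/whl/torch-{torch_version}.html"
    commands = [f"pip install {name}==latest+{cuda_tag} -f {index}" for name in PACKAGES]
    commands.append("pip install torch-geometric")
    return commands

for command in companion_install_commands("1.4.0", "cu100"):
    print(command)
```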
5. Test
python -c "import torch_geometric"
If there is no error, the installation succeeded.
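When the import does fail, it helps to know which package is the culprit. The sketch below imports each library separately and reports the error text; check_imports is a hypothetical helper, and the module list is simply the import names of the packages installed in step 4.

```python
import importlib

def check_imports(module_names):
    """Return {module_name: None or error text} for each import attempt."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except (ImportError, OSError) as exc:
            # A missing CUDA library typically surfaces as ImportError or OSError.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results

modules = ["torch", "torch_scatter", "torch_sparse", "torch_cluster",
           "torch_spline_conv", "torch_geometric"]
for name, error in check_imports(modules).items():
    print(name, "OK" if error is None else error)
```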
The rest of this post is notes and references I collected while troubleshooting. Feel free to skip it.
My environment:
nvcc --version 10.0
torch.version.cuda 10.1
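torch.__version__ 1.4.0
This mismatch is the headache. The best fix is to make the two CUDA versions the same.
The problem I hit:
After installing torch-geometric, importing it raises the following error: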
OSError: libcusparse.so.10: cannot open shared object file: No such file or directory
I looked around for information.
The official documentation recommends setting LD_LIBRARY_PATH for CUDA:
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
$ echo $LD_LIBRARY_PATH
/usr/local/cuda/lib64:…
I tried the commands above, and also:
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
echo $LD_LIBRARY_PATH
Neither helped.
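Before changing LD_LIBRARY_PATH again, it is worth checking whether the library the error names exists in any of the listed directories at all. A minimal sketch with a hypothetical helper, find_shared_library; note that the dynamic loader also looks in places other than LD_LIBRARY_PATH, so an empty result here does not prove the library is missing from the system.

```python
import glob
import os

def find_shared_library(name_pattern, search_path):
    """List files matching name_pattern in each directory of a path-separated string."""
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory:  # skip empty entries such as a trailing separator
            hits.extend(sorted(glob.glob(os.path.join(directory, name_pattern))))
    return hits

path = os.environ.get("LD_LIBRARY_PATH", "")
print(find_shared_library("libcusparse.so*", path))
```

If the list contains libcusparse.so.10.0 or similar but nothing ending in exactly .so.10, the file name the loader wants is not among the listed directories.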
Most of the reports I found concern TensorFlow and very few are about torch-geometric, so I could only read them for hints.
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
The poster has CUDA 9, and installing TensorFlow fails because libcublas.so.9.0 is missing.
I think this is due to the fact that you have CUDA 9.1 and not 9.0, I am facing exactly the same issue.
(In short: you have CUDA 9.1, but this TensorFlow build needs CUDA 9.0.)
I think you should use a symbolic link from "cuda/" to "cuda/9.1", or your cuda version is too new for the tensorflow master branch
(In short: try a symlink; your CUDA is too new.)
Have you solved it? This problem is caused by tensorflow-gpu 1.5 requiring CUDA 9.0, so you should install tensorflow-gpu 1.4. And remember to uninstall tensorflow-gpu 1.5. Please use this: pip install --upgrade tensorflow-gpu==1.4
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
Latest TensorFlow supports cuda 8-10. cudnn 6-7. ( 28 Feb 2019)
Each TensorFlow binary has to work with the version of cuda and cudnn it was built with. If they don't match, you have to change either the TensorFlow binary or the Nvidia software.
(In short: every TensorFlow build only works with its matching CUDA version. Change either TensorFlow or the NVIDIA software.)
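The error message itself says which library version the binary was built against, which is the version that has to be present. A small sketch that pulls the library name and version out of messages like the ones quoted in this post; parse_missing_library is a hypothetical helper and the regular expression only covers this message format.

```python
import re

def parse_missing_library(message):
    """Extract (library, version) from a 'cannot open shared object file' message."""
    match = re.search(r"lib(\w+)\.so\.([\d.]+): cannot open shared object file", message)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(parse_missing_library(
    "OSError: libcusparse.so.10: cannot open shared object file: No such file or directory"))
print(parse_missing_library(
    "ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory"))
```

The first call yields ('cusparse', '10') and the second ('cublas', '9.0'), so the binaries in question expect CUDA 10 and CUDA 9.0 libraries respectively.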
I just found this out myself, not sure if it’s common knowledge, but got around this by doing
conda install cudatoolkit
conda install cudnn
I have cuda-10.1 installed on my box, this installed a local conda-only cuda-10.0. Obviously this is to just keep tensorflow working while waiting for better support.
Another solution: Don’t install anything from conda, just install from pip
Steps:
Create a fresh environment
pip install tensorflow==1.12.0
pip install tensorflow-gpu==1.12.0
pip install keras==2.1.3
If you have anything that you want to install from conda, check if it is available on the pip version. If it is not then,
Let’s say that your env name is my_env_1
after activating that environment, type which conda,
if this gives the path to your created environment (…\my_env_1…), then you can install other essential environments. If this gives (…), then type pip install conda, then install other essential environments. (be sure to check again by typing which conda)
Resolving the incompatibility between tensorflow-gpu 2.0 and CUDA 10.1: dlerror: libcudart.so.10.0: cannot open shared object file
https://github.com/tensorflow/tensorflow/issues/26289
sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcublas.so.10.1.0.105 /usr/lib64/libcublas.so.10.0
sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcusolver.so.10.1.105 /usr/lib64/libcusolver.so.10.0
sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudart.so.10.1.105 /usr/lib64/libcudart.so.10.0
It seems mainly because the SONAME is incompatible for some library in cuda
I just modify the third_party/gpus/cuda_configure.bzl at line 871, 878, 885, 892, 907
replace cuda_config.cuda_version, with “10”,
then using symlinks it works, I hope it will help you
I would also like to point out (in addition to everything that is already said) that PyTorch officially only displays CUDA 10.1 support. For someone like me who often end up having TF and PyTorch in same environment, this makes it very difficult to have them install side by side. Fortunately PyTorch builds have been superbly flexible with CUDA and I eventually figured out that merely including cudatoolkit=10.0 in conda install makes it work with CUDA 10.0 as well! I hope something like this might be possible for TF.
In addition to what @sytelus said, the exact statement is:
conda install pytorch torchvision cudatoolkit=10.0 python==X -c pytorch
where X is your current python version if you don’t want to upgrade it.
ImportError: libcusparse.so.10.0: cannot open shared object file: No such file or directory
CUDA: 10.0
PyTorch: 1.1.0
Installing torch-geometric fails with a libcusparse.so.10.0 error
echo $PATH
/ENV/anaconda3/envs/TORCH1.1/bin:/usr/local/cuda/bin:/usr/local/cuda/bin:/ENV/anaconda3/bin:/home/titian/bin:/home/titian/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
echo $CPATH
/usr/local/cuda/include:/usr/local/cuda/include:
It was caused by my conda virtual env.
I used /usr/local/cuda to compile torch-sparse.
However, my code depends on the virtual env when it runs.
So I just moved the libcusparse.so.10.0 files from /usr/local/cuda/lib64 to my virtual env, and then it was solved!
(In short: torch-sparse was compiled against /usr/local/cuda, but the code runs inside the virtual env. Copying libcusparse.so.10.0 from /usr/local/cuda/lib64 into the virtual env fixed it.)
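The copy described in that answer could look like the sketch below. The poster does not say which directory inside the virtual env received the file, so the lib subdirectory of the env prefix is purely my assumption, and copy_library_into_env is a hypothetical helper. I have not verified that this fixes the error on my own machine.

```python
import os
import shutil
import sys

def copy_library_into_env(library_path, env_prefix=sys.prefix):
    """Copy one shared library into <env_prefix>/lib and return the new path."""
    # Assumption: the environment loads libraries from <env_prefix>/lib.
    target_dir = os.path.join(env_prefix, "lib")
    os.makedirs(target_dir, exist_ok=True)
    # copy2 follows symlinks, so the real file contents are copied.
    return shutil.copy2(library_path, target_dir)

# Example call, to be run inside the activated env (left commented out on purpose):
# copy_library_into_env("/usr/local/cuda/lib64/libcusparse.so.10.0")
```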
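[CUDA] Fix for nvcc -V still showing the old version after switching CUDA versions
[Closing remarks]
This was brutal. Getting the environment working took two days, and the worst part was working on with no sign of hope; I was close to tears. You only know what that feels like once you have been through it yourself.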