1. 2019-5-57: conda command not found after installing anaconda3:
echo 'export PATH="/code/anaconda3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
conda --version
2. 2019-5-31: ImportError: No module named 'nets'
vim ~/.bashrc
export PYTHONPATH=$PYTHONPATH:/code/anaconda3/lib/python3.5/site-packages/tensorflow/models/research:/code/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/slim
source ~/.bashrc
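To confirm that the PYTHONPATH entries exported above actually point at real directories, a small check like the following can help. This is only a minimal sketch: the helper name missing_pythonpath_dirs is made up for illustration, and it checks nothing beyond whether each entry exists as a directory.

```python
import os


def missing_pythonpath_dirs(pythonpath):
    """Return the PYTHONPATH entries that do not exist as directories."""
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return [p for p in entries if not os.path.isdir(p)]


if __name__ == "__main__":
    # Read the variable exported in ~/.bashrc (empty string if it is unset).
    current = os.environ.get("PYTHONPATH", "")
    for path in missing_pythonpath_dirs(current):
        print("not a directory:", path)
```

If this prints nothing and the import still fails, the shell that launches Python probably has not sourced ~/.bashrc yet.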
3. Downloading the dataset yourself:
Comment out the related download steps in the script, then run it directly with bash.
4. File-not-found errors: set the file path to an absolute path. This should resolve most of these problems.
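One way to apply the absolute-path advice in Python is to resolve every path against a known base directory before opening it. The sketch below assumes a hypothetical helper to_absolute; the example paths in the usage note are illustrative only.

```python
from pathlib import Path


def to_absolute(path, base_dir=None):
    """Resolve `path` against `base_dir` (default: current working directory)."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    # If `path` is already absolute, joining keeps it and ignores `base`.
    return (base / Path(path).expanduser()).resolve()
```

For example, to_absolute("data/train.txt", "/code") yields a path under /code regardless of which directory the script was started from.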
5. Jupyter password changed for no apparent reason: run jupyter notebook password to forcibly overwrite the password stored in the config file.
6. ImportError: libSM.so.6: cannot open shared object file: No such file or directory
ImportError: libXrender.so.1: cannot open shared object file: No such file or directory
ImportError: libXext.so.6: cannot open shared object file: No such file or directory
Fix: apt-get install libsm6
apt-get install libxrender1
apt-get install libxext-dev
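The three fixes above follow one pattern: read the missing library name out of the ImportError and install the matching apt package. A minimal sketch of that lookup follows. The mapping contains only the packages listed in these notes, and the function name suggest_apt_package is hypothetical.

```python
import re

# Mapping taken from the fixes listed above: missing library -> apt package.
APT_PACKAGE_FOR = {
    "libSM.so.6": "libsm6",
    "libXrender.so.1": "libxrender1",
    "libXext.so.6": "libxext-dev",
}


def suggest_apt_package(error_message):
    """Return the apt package for a 'cannot open shared object file' error, or None."""
    match = re.search(
        r"(\S+\.so(?:\.\d+)*): cannot open shared object file", error_message
    )
    if match is None:
        return None
    # Libraries that are not in the table above simply return None.
    return APT_PACKAGE_FOR.get(match.group(1))
```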
1. Installing h5py: https://stackoverflow.com/questions/29831052/error-importing-h5py
sudo pip install cython
sudo apt-get install libhdf5-dev
sudo pip install h5py
2.ImportError: No module named pywt
pip install PyWavelets
3. Opening the server's GUI remotely
Xming + PuTTY (with X11 forwarding enabled in PuTTY)
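4. from pip import main ImportError: cannot import name 'main'
vi /usr/bin/pip3
from pip import __main__  # this line must be changed too (was: from pip import main)
if __name__ == '__main__':
    sys.exit(__main__._main())  # was: sys.exit(main()); add the __main__._ prefix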
5. Speeding up installs (use the Tsinghua PyPI mirror)
python -m pip install torch==0.4.0 torchvision -i https://pypi.tuna.tsinghua.edu.cn/simple
6. Restarting the docker service
sudo systemctl daemon-reload
sudo systemctl restart docker
7. docker: Error response from daemon: create nvidia_driver_430.14: error looking up volume plugin nvidia-docker: plugin "nvidia-docker" not found.
Fix: sudo service nvidia-docker start
8. Viewing disk information
sudo hdparm -I /dev/sda
9. Exposing container network ports (share the host network with --network host): nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=0,1,2,3 -it --rm -v /home/cjy:/code --ipc=host --network host docker.local/2018140256/dockerfile/road_extraction:torch1.0-cuda10-cudnn7-tensorboard
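11. The errors below are a series of TensorFlow / cuDNN version-compatibility problems. Anyone who has hit them will recognize them, and these are only rough notes. If this is unfamiliar territory, just use torch instead.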
ERROR: tensorflow-gpu 1.14.0 has requirement tensorboard<1.15.0,>=1.14.0, but you’ll have tensorboard 1.13.1 which is incompatible.
ERROR: tensorflow-gpu 1.14.0 has requirement tensorflow-estimator<1.15.0rc0,>=1.14.0rc0, but you’ll have tensorflow-estimator 1.13.0 which is incompatible.
Check the installed cuDNN version:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory
https://github.com/tensorflow/tensorflow/issues/20271
Error : Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
https://github.com/tensorflow/tensorflow/issues/24828
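Cause: the TensorFlow version and the cuDNN version do not match. Fix: switch directly to a TensorFlow / cuDNN pair that is compatible.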
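The grep on cudnn.h shown above can also be done from Python, which is convenient when comparing the installed cuDNN against what a given TensorFlow build expects. The following is a rough sketch: parse_cudnn_version is a hypothetical helper, and it assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integers (newer cuDNN releases keep these macros in cudnn_version.h rather than cudnn.h).

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h, or None if absent."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7".
        match = re.search(
            r"^\s*#define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)
```

To use it, read /usr/local/cuda/include/cudnn.h into a string and pass that string to parse_cudnn_version.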
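12. Error: bool value of Tensor with more than one value is ambiguous
Fix: nn.MSELoss is a class and must be instantiated before it is called on (prediction, target).
loss_function=nn.MSELoss  # wrong
loss_function=nn.MSELoss()  # correct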
13. Error: SyntaxError: non-default argument follows default argument
Fix: parameters with default values must come after the parameters without defaults. Swapping their positions fixes it.
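A short illustration of the ordering rule. The function train and its parameters are made up for the example, and the invalid definition is kept in a string so that the snippet itself still parses.

```python
# Correct: parameters without defaults come first, defaulted ones after.
def train(model_name, epochs=10, lr=0.01):
    return model_name, epochs, lr


# Incorrect ordering, kept as a string so this file itself still parses.
BAD_SOURCE = "def train(epochs=10, model_name):\n    pass\n"


def is_syntax_error(source):
    """Return True if compiling `source` raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False
```

Calling is_syntax_error(BAD_SOURCE) reports the same class of error as above, although the exact message wording differs between Python versions.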
14. Error: RuntimeError: copy_if failed to synchronize: device-side assert triggered
Fix: make sure the label values lie in the range 0-1.
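The notes do not say which loss triggered the assert. A common situation, assumed here, is a binary loss that expects targets in [0, 1] while the masks are stored as 0/255. The sketch below uses plain Python lists and hypothetical helper names; it does not handle negative labels.

```python
def rescale_labels(labels):
    """Map non-negative labels such as {0, 255} onto the range [0, 1]."""
    highest = max(labels)
    if highest <= 1:
        # Already in range: only convert to float.
        return [float(v) for v in labels]
    return [v / highest for v in labels]


def labels_in_unit_range(labels):
    """Return True if every label lies in [0, 1]."""
    return all(0 <= v <= 1 for v in labels)
```

Running labels_in_unit_range on a batch before it reaches the GPU turns the opaque device-side assert into an ordinary check that is easy to debug on the CPU.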
That is everything I can remember. I will add the rest when I run into it again.
====================Q: How can I keep myself in a better mood?=