Daily log of my silly mistakes

1. 2019-5-57: anaconda3 is installed but the conda command can't be found:
echo 'export PATH="/code/anaconda3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
conda --version

2. 2019-5-31: ImportError: No module named 'nets'
vim ~/.bashrc and add:
export PYTHONPATH=$PYTHONPATH:/code/anaconda3/lib/python3.5/site-packages/tensorflow/models/research:/code/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/slim
source ~/.bashrc
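If editing ~/.bashrc isn't convenient (for example inside a throwaway container), the same effect can be had at runtime by appending the directories to sys.path before the import. A minimal sketch, assuming the same /code/anaconda3/... research and slim directories as the PYTHONPATH line above; the helper name add_research_paths is hypothetical.

```python
import os
import sys

# Same directories as the PYTHONPATH line above (adjust to your own install).
RESEARCH = "/code/anaconda3/lib/python3.5/site-packages/tensorflow/models/research"
SLIM = os.path.join(RESEARCH, "slim")


def add_research_paths(paths=(RESEARCH, SLIM)):
    """Hypothetical helper: append each directory to sys.path once.

    Runtime equivalent of `export PYTHONPATH=$PYTHONPATH:...`; it only
    affects the current interpreter, not the shell.
    """
    for p in paths:
        if p not in sys.path:
            sys.path.append(p)
    return list(paths)


add_research_paths()
# Afterwards `from nets import ...` can resolve, assuming nets/ lives under slim/.
```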

3. Downloading the dataset yourself:
Comment out the download-related steps in the script, then run it directly with bash.
4. Some file can't be found: just set the file path to an absolute path; that should fix most of these problems.
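A less brittle variant of the absolute-path fix is to build paths relative to the script's own location instead of the current working directory, so the script works no matter where it is launched from. A minimal sketch with pathlib; the helper name abs_path and the data/ layout are hypothetical.

```python
from pathlib import Path


def abs_path(relative, base=None):
    """Hypothetical helper: turn a relative path into an absolute one.

    By default the base is the directory containing this script, so the
    result does not depend on where `python` was launched from.
    """
    p = Path(relative)
    if p.is_absolute():
        return p
    if base is None:
        base = Path(__file__).resolve().parent
    return (Path(base) / p).resolve()


# Usage (hypothetical layout): a data/train folder next to this script.
# train_dir = abs_path("data/train")
```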
5. The Jupyter password was mysteriously changed: run jupyter notebook password, which forcibly overwrites the password stored in the config file.
6. ImportError: libSM.so.6: cannot open shared object file: No such file or directory
ImportError: libXrender.so.1: cannot open shared object file: No such file or directory
ImportError: libXext.so.6: cannot open shared object file: No such file or directory
Fix: apt-get install libsm6
apt-get install libxrender1
apt-get install libxext-dev

7. ImportError: libgthread-2.0.so.0: cannot open shared object file: No such file or directory
apt-get update
apt-get install libglib2.0-dev

8. ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Start the container with --ipc=host so it uses the host's shared memory (raising --shm-size also works):
nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=0,1 -p 7000:7000 -it --rm -v /mnt/cjy/code:/code --ipc=host docker.local/2018140256/dockerfile/road_extraction:torch1.1.0-1

9. THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument
A fix found online: "I actually just solved it compiling pytorch from source. Might be useful for someone else having the same problem"

10. RuntimeError: cannot join current thread
Install version 4.19.

11. ValueError: num_samples should be a positive integer value, but got num_samples=0
No images were read in (the dataset is empty).

12. Check disk partitions:
lsblk
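======== Saw my junior labmate's blog... and remembered I still have this thing, so let me write a few lines.
Good grief... I have no idea when this post was even written.
Here goes, jotting things down as they come: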
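About the num_samples=0 error (item 11 in the first list): PyTorch's RandomSampler, which DataLoader uses with shuffle=True, raises it when the dataset has length 0, which usually means the image search matched nothing. A cheap guard is to count the files before building the dataset. A minimal stdlib sketch; the extension list, the example path, and the function name list_images are assumptions.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}  # assumed image formats


def list_images(root, exts=IMAGE_EXTS):
    """Hypothetical helper: return sorted image paths under `root` (recursively).

    Fails loudly with the searched path instead of letting an empty dataset
    reach DataLoader and surface later as num_samples=0.
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset dir does not exist: {root.resolve()}")
    files = sorted(p for p in root.rglob("*")
                   if p.is_file() and p.suffix.lower() in exts)
    if not files:
        raise RuntimeError(f"no {sorted(exts)} files under {root.resolve()}")
    return files


# Usage (hypothetical path): check before building the Dataset/DataLoader.
# train_files = list_images("/code/data/train")
```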

1. Installing h5py: https://stackoverflow.com/questions/29831052/error-importing-h5py
sudo pip install cython
sudo apt-get install libhdf5-dev
sudo pip install h5py

2.ImportError: No module named pywt
pip install PyWavelets

3. Opening the server's GUI remotely
Xming + PuTTY (with X11 forwarding enabled in PuTTY)

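4. from pip import main ImportError: cannot import name 'main'
vi /usr/bin/pip3 and change both the import line and the call to:
from pip import __main__
if __name__ == '__main__':
    sys.exit(__main__._main())
(This is the pip 10-era workaround; running python3 -m pip avoids the pip3 wrapper altogether.)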

5. Faster installs (use the Tsinghua PyPI mirror)
python -m pip install torch==0.4.0 torchvision -i https://pypi.tuna.tsinghua.edu.cn/simple
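Calling pip as python -m pip (as above) rather than through the pip3 wrapper also sidesteps the `from pip import main` breakage, because it runs the pip that belongs to that interpreter. From inside a Python script, pip's documentation recommends the same thing via subprocess with sys.executable. A minimal sketch; the helper names are hypothetical and the default index is the Tsinghua mirror used above.

```python
import subprocess
import sys

TUNA = "https://pypi.tuna.tsinghua.edu.cn/simple"  # mirror used above


def pip_install_cmd(*packages, index_url=TUNA):
    """Hypothetical helper: build a `python -m pip install` command list.

    sys.executable pins pip to the interpreter running this script, so the
    packages land in the same environment (and /usr/bin/pip3 is never used).
    """
    cmd = [sys.executable, "-m", "pip", "install", *packages]
    if index_url:
        cmd += ["-i", index_url]
    return cmd


def pip_install(*packages, index_url=TUNA):
    # check_call raises CalledProcessError if pip exits with a non-zero status.
    subprocess.check_call(pip_install_cmd(*packages, index_url=index_url))


# Usage: pip_install("torch==0.4.0", "torchvision")
```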

6. Restarting the docker service
sudo systemctl daemon-reload
sudo systemctl restart docker

7. docker: Error response from daemon: create nvidia_driver_430.14: error looking up volume plugin nvidia-docker: plugin "nvidia-docker" not found.
Fix: sudo service nvidia-docker start

8. Viewing disk information
sudo hdparm -I /dev/sda

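9. Exposing container network ports (--network host makes the container share the host's network stack, so no -p mapping is needed): nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=0,1,2,3 -it --rm -v /home/cjy:/code --ipc=host --network host docker.local/2018140256/dockerfile/road_extraction:torch1.0-cuda10-cudnn7-tensorboard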

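10. RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Workaround I used:
from torch.autograd import Variable as V
loss = StableBCELoss()(logits, V(labels.float(), requires_grad=True))
This only silences the error: if logits itself has no grad_fn (e.g. it was computed under torch.no_grad() or detached), the model parameters still get no gradients, so check that first.

AttributeError: module 'yaml' has no attribute 'FullLoader'
Fix: install or upgrade PyYAML (FullLoader was added in PyYAML 5.1): pip install -U pyyaml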

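11. The following are a series of TensorFlow/cuDNN version-compatibility errors. If you know, you know; I'm just jotting them down. If you don't, just use torch.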
ERROR: tensorflow-gpu 1.14.0 has requirement tensorboard<1.15.0,>=1.14.0, but you’ll have tensorboard 1.13.1 which is incompatible.
ERROR: tensorflow-gpu 1.14.0 has requirement tensorflow-estimator<1.15.0rc0,>=1.14.0rc0, but you’ll have tensorflow-estimator 1.13.0 which is incompatible.
Check the cuDNN version:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
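The grep above prints the CUDNN_MAJOR/MINOR/PATCHLEVEL #define lines. The same check can be scripted by parsing those macros; from cuDNN 8 on they live in cudnn_version.h rather than cudnn.h, so the sketch tries both. The default /usr/local/cuda/include path is the one from the command above; the function name cudnn_version is hypothetical.

```python
import re
from pathlib import Path

# cuDNN < 8 defines its version in cudnn.h; cuDNN >= 8 moved it to cudnn_version.h.
HEADERS = ("cudnn_version.h", "cudnn.h")
PATTERN = re.compile(r"#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)")
KEYS = ("MAJOR", "MINOR", "PATCHLEVEL")


def cudnn_version(include_dir="/usr/local/cuda/include"):
    """Hypothetical helper: return (major, minor, patch), or None if no header matches."""
    for name in HEADERS:
        header = Path(include_dir) / name
        if not header.is_file():
            continue
        # findall yields (name, number) pairs such as ("MAJOR", "7").
        found = dict(PATTERN.findall(header.read_text(errors="ignore")))
        if all(k in found for k in KEYS):
            return tuple(int(found[k]) for k in KEYS)
    return None


print(cudnn_version())  # None if the headers are not at the default path
```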

ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory
https://github.com/tensorflow/tensorflow/issues/20271
Error : Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
https://github.com/tensorflow/tensorflow/issues/24828
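The root cause is that the TensorFlow version doesn't match the cuDNN version; switching to a matching TensorFlow (or cuDNN) version fixes it.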

12. Error: bool value of Tensor with more than one value is ambiguous
Fix:
loss_function = nn.MSELoss    # wrong: this is the class itself
loss_function = nn.MSELoss()  # correct: an instance of the loss
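Why the missing parentheses fail: nn.MSELoss is a class, so calling loss_function(pred, target) on the class runs the constructor instead of computing a loss, and the two tensors land in the legacy size_average/reduce arguments. PyTorch then evaluates size_average and reduce as booleans, and bool() of a multi-element tensor raises the "ambiguous" error. The stdlib-only toy below is hypothetical (not torch); it mirrors the constructor's shape to show the class-versus-instance mix-up.

```python
class ToyMSELoss:
    """Hypothetical stand-in for nn.MSELoss: configure in __init__, compute in __call__."""

    def __init__(self, size_average=None, reduce=None, reduction="mean"):
        # Same shape as nn.MSELoss's constructor, legacy arguments included.
        self.size_average = size_average
        self.reduce = reduce
        self.reduction = reduction

    def __call__(self, pred, target):
        sq = [(p - t) ** 2 for p, t in zip(pred, target)]
        return sum(sq) / len(sq) if self.reduction == "mean" else sum(sq)


pred, target = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]

loss_function = ToyMSELoss()            # correct: an instance
print(loss_function(pred, target))      # mean of squared errors 0, 0, 4

loss_function = ToyMSELoss              # wrong: the class itself
result = loss_function(pred, target)    # runs __init__(size_average=pred, reduce=target)
print(type(result).__name__)            # ToyMSELoss: an object, not a loss value
```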

13. Error: SyntaxError: non-default argument follows default argument
Fix: put the parameters without default values before the ones with defaults; just swap their order.
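A minimal illustration of the rule: parameters without defaults must come before parameters with defaults. compile() is used so the bad ordering can be shown without breaking the file itself; the function names are hypothetical.

```python
BAD = "def train(lr=0.01, epochs):\n    pass\n"   # default before non-default
GOOD = "def train(epochs, lr=0.01):\n    pass\n"  # non-default first


def compiles(src):
    """Return True if `src` is valid Python, printing the SyntaxError otherwise."""
    try:
        compile(src, "<example>", "exec")
        return True
    except SyntaxError as e:
        # The exact message wording differs across Python versions.
        print("SyntaxError:", e.msg)
        return False


print(compiles(BAD))   # prints the SyntaxError message, then False
print(compiles(GOOD))  # True


def train(epochs, lr=0.01):
    """The fixed signature: `epochs` is required, `lr` is optional."""
    return epochs, lr
```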

14. Error: RuntimeError: copy_if failed to synchronize: device-side assert triggered
Fix: make sure the label values are in the 0-1 range.
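Device-side asserts like this are commonly raised when label values fall outside what the loss expects (class indices at or above the number of classes, or binary targets outside 0-1); a frequent culprit in segmentation is masks saved as 0/255. A stdlib-only sketch that checks and binarizes a mask; in practice the mask would be a NumPy array or tensor, and the helper names and the threshold of 127 are assumptions.

```python
def check_labels(values, allowed=(0, 1)):
    """Hypothetical helper: raise if any label is outside `allowed`."""
    bad = sorted(set(values) - set(allowed))
    if bad:
        raise ValueError(f"unexpected label values: {bad}")


def binarize(mask, threshold=127):
    """Map a 0/255 mask (a list of rows) to 0/1; the threshold is an assumption."""
    return [[1 if v > threshold else 0 for v in row] for row in mask]


mask = [[0, 255], [255, 0]]          # e.g. as read from a PNG mask
try:
    check_labels([v for row in mask for v in row])
except ValueError as e:
    print(e)                          # unexpected label values: [255]

fixed = binarize(mask)
check_labels([v for row in fixed for v in row])  # passes: only 0 and 1 remain
```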

That's all I can remember; I'll deal with the rest when I run into them again.

====================Q: How can I keep a better mindset?=
