DRRG Code: Pitfalls and Fixes

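1. Training hangs:

While training an epoch, the run stalls and never continues; the only way out is to kill it with Ctrl+C. Running watch -n 1 nvidia-smi at the same time shows no GPU usage at all. After more than two hours of searching and trial and error, the culprit turned out to be num_workers.

Before training, make sure num_workers is set to 0: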

CUDA_LAUNCH_BLOCKING=1 python train_TextGraph.py --exp_name Ctw1500 --max_epoch 600 --batch_size 6 --gpu 0 --input_size 640 --optim SGD --lr 0.001 --start_epoch 0 --viz --net vgg --num_workers 0

Although config.py lets you set num_workers, if you do not pass --num_workers explicitly when launching training, num_workers ends up as 8.
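One plausible explanation for this behaviour is that a command-line default silently replaces the value from config.py. The sketch below only illustrates that pattern with the standard argparse module; CONFIG_NUM_WORKERS, build_parser and resolve_num_workers are hypothetical names, and the default of 8 is assumed from the behaviour described above, not taken from the DRRG source.

```python
import argparse

# Hypothetical stand-in for a value set in config.py.
CONFIG_NUM_WORKERS = 0


def build_parser():
    """Build a parser whose --num_workers default is 8 (assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_workers", type=int, default=8)
    return parser


def resolve_num_workers(argv):
    """Return the value that wins once command-line args are parsed.

    The parsed argument always overrides CONFIG_NUM_WORKERS here,
    so omitting the flag yields the parser default, not the config value.
    """
    args = build_parser().parse_args(argv)
    return args.num_workers


print(resolve_num_workers([]))                      # flag omitted
print(resolve_num_workers(["--num_workers", "0"]))  # flag given
```

If the real code follows a similar pattern, editing config.py alone has no effect, which would explain why the flag has to be passed on the command line.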


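2. "FileNotFoundError: [Errno 2] No such file or directory: './vis/Ctw1500_train'":

This is the directory where the training results are saved. It is not created automatically, so create it by hand as an empty folder before training.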
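Instead of creating the folder by hand, the directory could also be created from Python before anything is written to it. This is a minimal sketch using os.makedirs from the standard library; ensure_dir is a hypothetical helper, not a function from the DRRG code.

```python
import os


def ensure_dir(path):
    """Create `path` (and any missing parents) if it does not exist yet.

    exist_ok=True makes the call a no-op when the directory is already there.
    """
    os.makedirs(path, exist_ok=True)
    return path
```

Calling ensure_dir("./vis/Ctw1500_train") once before training starts should avoid the FileNotFoundError above, assuming the path in the error message is the only one missing.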

3. Makefile build error:

Running make on the Makefile under lanms fails with the following error:

g++ -o adaptor.so -I include -std=c++11 -O3 -I/home/zhangmingzhou1/anaconda3/envs/pytorch/include/python3.7m -I/home/zhangmingzhou1/anaconda3/envs/pytorch/include/python3.7m -Wno-unused-result -Wsign-compare -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -fdebug-prefix-map==/usr/local/src/conda/- -fdebug-prefix-map==/usr/local/src/conda-prefix -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -flto -flto -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -DNDEBUG -fwrapv -O3 -Wall -L/home/zhangmingzhou1/anaconda3/envs/pytorch/lib/python3.7/config-3.7m-x86_64-linux-gnu -L/home/zhangmingzhou1/anaconda3/envs/pytorch/lib -lpython3.7m -lpthread -ldl -lutil -lrt -lm -Xlinker -export-dynamic adaptor.cpp include/clipper/clipper.cpp --shared -fPIC

g++: error: unrecognized command line option ‘-fno-plt’

Makefile:10: recipe for target 'adaptor.so' failed

make: *** [adaptor.so] Error 1

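The -fno-plt flag comes from the output of python3-config --cflags (the build flags of the conda environment), and the g++ in use does not recognize it. To work around this, hard-code the Python include path instead of calling python3-config. Change the original Makefile from: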

CXXFLAGS = -I include -std=c++11 -O3 $(shell python3-config --cflags)
LDFLAGS = $(shell python3-config --ldflags)

DEPS = lanms.h $(shell find include -xtype f)
CXX_SOURCES = adaptor.cpp include/clipper/clipper.cpp

LIB_SO = adaptor.so

$(LIB_SO): $(CXX_SOURCES) $(DEPS)
	$(CXX) -o $@ $(CXXFLAGS) $(LDFLAGS) $(CXX_SOURCES) --shared -fPIC

clean:
	rm -rf $(LIB_SO)

to:

CXXFLAGS = -I include -std=c++11 -O3 -I/home/zhangmingzhou1/anaconda3/envs/pytorch/include/python3.7m/
LDFLAGS = -I/home/zhangmingzhou1/anaconda3/envs/pytorch/include/python3.7m/

DEPS = lanms.h $(shell find include -xtype f)
CXX_SOURCES = adaptor.cpp include/clipper/clipper.cpp

LIB_SO = adaptor.so

$(LIB_SO): $(CXX_SOURCES) $(DEPS)
	$(CXX) -o $@ $(CXXFLAGS) $(LDFLAGS) $(CXX_SOURCES) --shared -fPIC

clean:
	rm -rf $(LIB_SO)
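Hard-coding the include path ties the Makefile to one machine. An alternative that was not tried here would be to keep the python3-config output and strip only the flags the compiler rejects. The sketch below shows that filtering step on a plain string; filter_flags is a hypothetical helper, and the list of unsupported flags contains only -fno-plt because that is the only one reported in the error above.

```python
import shlex


def filter_flags(flags, unsupported=("-fno-plt",)):
    """Drop compiler flags listed in `unsupported` from a flag string."""
    kept = [flag for flag in shlex.split(flags) if flag not in unsupported]
    return " ".join(kept)


# A shortened sample of the flags seen in the failing g++ command.
sample = "-I include -std=c++11 -O3 -fPIC -fstack-protector-strong -fno-plt -O3 -pipe"
print(filter_flags(sample))
```

The filtered string could then be pasted into CXXFLAGS. Other flags emitted by the conda toolchain may also be rejected by an older g++, in which case they would need to be added to the unsupported list as well.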
