TensorFlow 1.4 (GPU version)
To install tensorflow-gpu 1.4, run the following command:
pip install tensorflow-gpu==1.4.0
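If your connection is slow, download the wheel directly instead (for example with Xunlei or another download manager):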
https://pypi.tuna.tsinghua.edu.cn/packages/0a/0d/1a52e775e490f2fcb0eba08b3df773e6e6d64934c77346b351f6df2ed8df/tensorflow_gpu-1.4.0-cp35-cp35m-win_amd64.whl
Install the .whl file
cd into the download folder and run:
pip install tensorflow_gpu-1.4.0-cp35-cp35m-win_amd64.whl
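A wheel only installs if its filename tags match the interpreter, so it can help to read those tags before running pip. The sketch below splits the filename according to the wheel naming convention and prints a rough description of the running interpreter for comparison. The helper names `parse_wheel_filename` and `interpreter_summary` are made up for this illustration, and the `cp` prefix assumes CPython.

```python
import struct
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


def interpreter_summary():
    """Describe the running interpreter (assumes CPython for the tag)."""
    return {
        "python_tag": "cp{}{}".format(sys.version_info[0], sys.version_info[1]),
        "bits": struct.calcsize("P") * 8,  # pointer size: 32 or 64
        "platform": sys.platform,
    }


print(parse_wheel_filename("tensorflow_gpu-1.4.0-cp35-cp35m-win_amd64.whl"))
print(interpreter_summary())
```

For this wheel the tags are cp35 and win_amd64, so the interpreter in the active environment should be 64-bit Python 3.5 on Windows.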
Install the Python dependencies (cython, opencv-python, easydict)
Note: these still have to be installed inside the TensorFlow virtual environment.
activate tensorflow
pip install cython
pip install opencv-python==3.4.4.19
pip install easydict
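To confirm the three packages landed in the active environment, a small check like the following can be run from that environment. It is only a sketch: the `REQUIRED` mapping and the `missing_packages` helper are names invented here, and the mapping reflects the usual import names (`Cython`, `cv2`, `easydict`) for these pip packages.

```python
import importlib.util

# pip package name -> module name used in `import` statements
REQUIRED = {
    "cython": "Cython",
    "opencv-python": "cv2",
    "easydict": "easydict",
}


def missing_packages(required):
    """Return the sorted pip names whose import module cannot be found."""
    return sorted(
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    )


print("missing:", missing_packages(REQUIRED))
```

An empty list means all three modules are importable; any name that is printed still needs `pip install` inside the environment.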
error: Unable to find vcvarsall.bat
The cause is that vcvarsall.bat could not be found. The lookup for vcvarsall.bat is defined in _msvccompiler.py (note the leading underscore in the file name!). On my machine the file is at
"C:\Users\Lee\AppData\Local\conda\conda\envs\tfgpu\Lib\distutils\_msvccompiler.py"
Open that file and modify the function _find_vcvarsall. I have VS2017 installed locally, so vcvarsall.bat is at
"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat"
Change _find_vcvarsall to:
def _find_vcvarsall(plat_spec):
    best_dir = r'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build'
    best_version = 17
    vcruntime = None
    vcruntime_spec = _VCVARS_PLAT_TO_VCRUNTIME_REDIST.get(plat_spec)
    if vcruntime_spec:
        vcruntime = os.path.join(best_dir,
            vcruntime_spec.format(best_version))
        if not os.path.isfile(vcruntime):
            log.debug("%s cannot be found", vcruntime)
            vcruntime = None
    print(vcruntime)
    return r'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat', vcruntime
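The patch hardcodes the path from one machine. Before editing distutils, it may be worth confirming where vcvarsall.bat actually lives. The sketch below checks a short list of candidate locations and returns the first one that exists. Only the Community path comes from this post; the other edition folder names are assumptions about a default VS2017 layout, and `first_existing` and `CANDIDATES` are hypothetical names used for illustration.

```python
import os

VS2017_ROOT = r"C:\Program Files (x86)\Microsoft Visual Studio\2017"

# Only "Community" is taken from the post; the other editions are assumed.
CANDIDATES = [
    VS2017_ROOT + "\\" + edition + r"\VC\Auxiliary\Build\vcvarsall.bat"
    for edition in ("Community", "Professional", "Enterprise", "BuildTools")
]


def first_existing(paths):
    """Return the first path that is an existing file, or None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None


print(first_existing(CANDIDATES))
```

If this prints None, the path in the patched function needs to be adjusted to wherever Visual Studio is installed.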
cd ..\Faster-RCNN-TensorFlow-Python3.5-master\data\coco\PythonAPI
python setup.py build_ext --inplace
python setup.py build_ext install
Then go back to
C:\Users\Lee\Documents\Faster-RCNN-tfGPU-Python3.5\lib\utils
python setup.py build_ext --inplace
Train the model
From the Faster R-CNN directory, run:
python train.py
Download and install CUDA 8.0
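Search for the CUDA 8.0 download, or go straight to https://developer.nvidia.com/cuda-80-ga2-download-archive
Install cuDNN v6
This is also downloaded from the NVIDIA website; choose the build that matches CUDA 8.0 and Windows.
Configure the environment variables
Test the installation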
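One quick way to test the environment variables is to check that the CUDA and cuDNN runtime DLLs can be reached through PATH. The following is a minimal sketch: the DLL names are an assumption derived from the versions named in this post (CUDA 8.0 and cuDNN v6), and `find_on_path` is a helper written for this example.

```python
import os

# Assumed DLL names for CUDA 8.0 and cuDNN v6; adjust if versions differ.
REQUIRED_DLLS = ["cudart64_80.dll", "cudnn64_6.dll"]


def find_on_path(filename, path_value=None):
    """Return the first PATH directory containing filename, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


for dll in REQUIRED_DLLS:
    print(dll, "->", find_on_path(dll) or "NOT FOUND on PATH")
```

If either DLL is reported as not found, the folder that contains it should be added to PATH before starting training.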
Run
python train.py
Loaded dataset `voc_2007_trainval` for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from C:\Users\Lee\Documents\Faster-RCNN-tfGPU-Python3.5\data\cache\voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
2020-04-08 20:28:59.405220: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-04-08 20:28:59.564827: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce RTX 2060 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.65
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.59GiB
2020-04-08 20:28:59.565021: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce RTX 2060 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
C:\Users\Lee\AppData\Local\conda\conda\envs\tfgpu\lib\site-packages\tensorflow\python\ops\gradients_impl.py:96: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Fix VGG16 layers..
Fixed.
iter: 10 / 5000, total loss: 2.466144
>>> rpn_loss_cls: 0.345931
>>> rpn_loss_box: 0.082256
>>> loss_cls: 1.445962
>>> loss_box: 0.591996
speed: 21.156s / iter
iter: 20 / 5000, total loss: 1.434892
>>> rpn_loss_cls: 0.245481
>>> rpn_loss_box: 0.111026
>>> loss_cls: 0.638178
>>> loss_box: 0.440207
speed: 10.333s / iter
iter: 30 / 5000, total loss: 1.058773
>>> rpn_loss_cls: 0.205208
>>> rpn_loss_box: 0.022668
>>> loss_cls: 0.570207
>>> loss_box: 0.260690
speed: 6.935s / iter
iter: 40 / 5000, total loss: 0.473153
>>> rpn_loss_cls: 0.238226
>>> rpn_loss_box: 0.004898
>>> loss_cls: 0.230029
>>> loss_box: 0.000000
speed: 5.296s / iter
iter: 50 / 5000, total loss: 0.845320
>>> rpn_loss_cls: 0.081282
>>> rpn_loss_box: 0.023834
>>> loss_cls: 0.380387
>>> loss_box: 0.359818
speed: 4.336s / iter
iter: 60 / 5000, total loss: 0.849760
>>> rpn_loss_cls: 0.136580
>>> rpn_loss_box: 0.412981
>>> loss_cls: 0.176427
>>> loss_box: 0.123772
speed: 3.691s / iter
iter: 70 / 5000, total loss: 2.587550
>>> rpn_loss_cls: 0.693134
>>> rpn_loss_box: 0.221948
>>> loss_cls: 0.956087
>>> loss_box: 0.716381
speed: 3.235s / iter
iter: 80 / 5000, total loss: 0.785885
>>> rpn_loss_cls: 0.590533
>>> rpn_loss_box: 0.195291
>>> loss_cls: 0.000061
>>> loss_box: 0.000000
speed: 2.883s / iter
iter: 90 / 5000, total loss: 0.891631
>>> rpn_loss_cls: 0.218121
>>> rpn_loss_box: 0.018311
>>> loss_cls: 0.393936
>>> loss_box: 0.261263
speed: 2.610s / iter
iter: 100 / 5000, total loss: 1.174571
>>> rpn_loss_cls: 0.251819
>>> rpn_loss_box: 0.037423
>>> loss_cls: 0.607602
>>> loss_box: 0.277726
speed: 2.393s / iter
iter: 110 / 5000, total loss: 0.621391
>>> rpn_loss_cls: 0.173873
>>> rpn_loss_box: 0.001472
>>> loss_cls: 0.308855
>>> loss_box: 0.137191
speed: 2.216s / iter
iter: 120 / 5000, total loss: 0.544772
>>> rpn_loss_cls: 0.026718
>>> rpn_loss_box: 0.321147
>>> loss_cls: 0.119774
>>> loss_box: 0.077133
speed: 2.077s / iter
iter: 130 / 5000, total loss: 3.637077
>>> rpn_loss_cls: 1.001637
>>> rpn_loss_box: 0.386988
>>> loss_cls: 1.063948
>>> loss_box: 1.184504
speed: 1.960s / iter
iter: 140 / 5000, total loss: 0.788855
>>> rpn_loss_cls: 0.140763
>>> rpn_loss_box: 0.041013
>>> loss_cls: 0.406340
>>> loss_box: 0.200739
speed: 1.850s / iter
iter: 150 / 5000, total loss: 2.527586
>>> rpn_loss_cls: 0.461581
>>> rpn_loss_box: 0.074859
>>> loss_cls: 0.710593
>>> loss_box: 1.280552
speed: 1.761s / iter
iter: 160 / 5000, total loss: 0.965645
>>> rpn_loss_cls: 0.135997
>>> rpn_loss_box: 0.008201
>>> loss_cls: 0.450770
>>> loss_box: 0.370678
speed: 1.688s / iter
iter: 170 / 5000, total loss: 1.615126
>>> rpn_loss_cls: 0.285918
>>> rpn_loss_box: 0.077910
>>> loss_cls: 0.520531
>>> loss_box: 0.730767
speed: 1.623s / iter
iter: 180 / 5000, total loss: 0.925963
>>> rpn_loss_cls: 0.079161
>>> rpn_loss_box: 0.003428
>>> loss_cls: 0.500542
>>> loss_box: 0.342832
speed: 1.568s / iter
iter: 190 / 5000, total loss: 0.963700
>>> rpn_loss_cls: 0.171862
>>> rpn_loss_box: 0.192049
>>> loss_cls: 0.315864
>>> loss_box: 0.283926
speed: 1.509s / iter
iter: 200 / 5000, total loss: 1.232456
>>> rpn_loss_cls: 0.047470
>>> rpn_loss_box: 0.022191
>>> loss_cls: 0.629571
>>> loss_box: 0.533224
speed: 1.456s / iter
iter: 210 / 5000, total loss: 0.649945
>>> rpn_loss_cls: 0.196420
>>> rpn_loss_box: 0.009324
>>> loss_cls: 0.249450
>>> loss_box: 0.194751
speed: 1.413s / iter
.
.
.
.
.
speed: 0.529s / iter
iter: 4970 / 5000, total loss: 0.588941
>>> rpn_loss_cls: 0.128594
>>> rpn_loss_box: 0.053865
>>> loss_cls: 0.249478
>>> loss_box: 0.157004
speed: 0.529s / iter
iter: 4980 / 5000, total loss: 0.286666
>>> rpn_loss_cls: 0.036725
>>> rpn_loss_box: 0.055990
>>> loss_cls: 0.131714
>>> loss_box: 0.062237
speed: 0.529s / iter
iter: 4990 / 5000, total loss: 0.391392
>>> rpn_loss_cls: 0.091277
>>> rpn_loss_box: 0.162543
>>> loss_cls: 0.111782
>>> loss_box: 0.025790
speed: 0.529s / iter
iter: 5000 / 5000, total loss: 0.758708
>>> rpn_loss_cls: 0.182317
>>> rpn_loss_box: 0.012224
>>> loss_cls: 0.237723
>>> loss_box: 0.326444
speed: 0.528s / iter
Wrote snapshot to: C:\Users\Lee\Documents\Faster-RCNN-tfGPU-Python3.5\default\voc_2007_trainval\default\vgg16_faster_rcnn_iter_5000.ckpt
Process finished with exit code 0
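As the log shows, the GPU starts out slow and then gets steadily faster. It finishes at 0.528 s/iter, more than twice as fast as the CPU (about 5x, going by the CPU figure below):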
CPU speed: 2.721s / iter
CPU i7 9700
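The comparison can be reproduced from the log itself. The sketch below pulls the iteration number, total loss and reported speed out of log lines in the format shown above, then divides the CPU figure by the final GPU figure. The regular expressions and the `parse_training_log` helper are written for this post's log format and are not part of the training code.

```python
import re

ITER_RE = re.compile(r"iter:\s*(\d+)\s*/\s*(\d+),\s*total loss:\s*([\d.]+)")
SPEED_RE = re.compile(r"speed:\s*([\d.]+)s\s*/\s*iter")


def parse_training_log(lines):
    """Collect (iteration, total_loss, seconds_per_iter) tuples."""
    records = []
    current = None
    for line in lines:
        match = ITER_RE.search(line)
        if match:
            current = (int(match.group(1)), float(match.group(3)))
            continue
        match = SPEED_RE.search(line)
        if match and current is not None:
            records.append(current + (float(match.group(1)),))
            current = None
    return records


# A few lines copied from the GPU log above.
sample = [
    "iter: 10 / 5000, total loss: 2.466144",
    ">>> rpn_loss_cls: 0.345931",
    "speed: 21.156s / iter",
    "iter: 5000 / 5000, total loss: 0.758708",
    "speed: 0.528s / iter",
]
records = parse_training_log(sample)
gpu_speed = records[-1][2]
cpu_speed = 2.721  # CPU figure quoted in this post
print(records)
print("speedup: %.2fx" % (cpu_speed / gpu_speed))
```

With these numbers the ratio is 2.721 / 0.528, which is about 5.15.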
Demo for data/demo/000456.jpg
Detection took 165.314s for 300 object proposals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Demo for data/demo/000457.jpg
Detection took 0.136s for 300 object proposals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Demo for data/demo/000542.jpg
Detection took 0.134s for 300 object proposals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Demo for data/demo/001150.jpg
Detection took 0.134s for 300 object proposals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Demo for data/demo/001763.jpg
Detection took 0.135s for 300 object proposals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Demo for data/demo/004545.jpg
Detection took 0.122s for 300 object proposals
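The first image takes far longer than the rest, presumably because of one-time start-up cost on the first call, so an average over all six images would be misleading. The sketch below extracts the timings from the demo output and averages only the calls after the first. `detection_times` is a helper written for this example.

```python
import re

DETECT_RE = re.compile(r"Detection took ([\d.]+)s for (\d+) object proposals")


def detection_times(lines):
    """Return the detection time in seconds from each matching line."""
    return [float(m.group(1)) for m in map(DETECT_RE.search, lines) if m]


# Timing lines copied from the demo output above.
demo_log = [
    "Detection took 165.314s for 300 object proposals",
    "Detection took 0.136s for 300 object proposals",
    "Detection took 0.134s for 300 object proposals",
    "Detection took 0.134s for 300 object proposals",
    "Detection took 0.135s for 300 object proposals",
    "Detection took 0.122s for 300 object proposals",
]
times = detection_times(demo_log)
steady = times[1:]  # skip the first call, which includes start-up cost
print("first call: %.3fs" % times[0])
print("mean of the rest: %.4fs" % (sum(steady) / len(steady)))
```

Excluding the first call, the mean detection time is about 0.132 s per image.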