Code being reproduced: https://github.com/yunjey/show-attend-and-tell
I. Dependencies:
numpy, matplotlib, scipy, scikit-image, hickle, Pillow
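Before running the scripts, it can help to confirm that each dependency is importable. The following is a minimal sketch, not part of the repository; the helper name missing_modules is hypothetical, and note that scikit-image is imported as skimage and Pillow as PIL.

```python
import importlib

# Import names differ from the pip package names for two entries:
# scikit-image is imported as "skimage" and Pillow as "PIL".
REQUIRED_MODULES = ["numpy", "matplotlib", "scipy", "skimage", "hickle", "PIL"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means every dependency was found.
    print(missing_modules(REQUIRED_MODULES))
```

Any name that is printed still needs to be installed with pip before continuing.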
II. Download the MSCOCO image dataset and the VGGNet19 model, place them in the expected locations,
then run python resize.py
III. Run python prepro.py
1. pip2 install hickle
2. pip2 install pandas
3. Issue:
hudou@Amax-Super-Server:~/仿真/show,attend,and tell/show-attend-and-tell-master/show-attend-and-tell-master$ python prepro.py
Traceback (most recent call last):
File "prepro.py", line 212, in <module>
main()
File "prepro.py", line 138, in main
max_length=max_length)
File "prepro.py", line 15, in _process_caption_data
with open(caption_file) as f:
IOError: [Errno 2] No such file or directory: 'data/annotations/captions_train2014.json'
Solution:
Place the annotations folder under the data folder, so that the file exists at data/annotations/captions_train2014.json.
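The missing-file error can be caught before the long preprocessing run starts. This is a minimal sketch, assuming only the one path shown in the traceback; the helper missing_files is hypothetical and other files the script needs would have to be added to the list by hand.

```python
import os

# Only this path is confirmed by the traceback; extend the list as needed.
EXPECTED_FILES = ["data/annotations/captions_train2014.json"]


def missing_files(paths, root="."):
    """Return the entries of `paths` that are not regular files under `root`."""
    return [p for p in paths if not os.path.isfile(os.path.join(root, p))]


if __name__ == "__main__":
    for path in missing_files(EXPECTED_FILES):
        print("Missing: %s" % path)
```

Run it from the repository root, the same directory from which python prepro.py is started.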
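4. Issue: tensorflow.python.framework.errors_impl.ResourceExhaustedError:
Solution:
First, search online (Baidu) for the error and the fixes others have reported;
then read the source code and add print statements to locate the exact line that fails.
The error is probably raised at:
with tf.Session() as sess:
or at the line that was changed from tf.initialize_all_variables().run() to:
tf.global_variables_initializer().run()
The reported error is:
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: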
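As an alternative to adding print statements by hand, the failing line can be read from the traceback object. The sketch below uses only the standard library; locate_failure is a hypothetical helper and is not part of the repository or of TensorFlow.

```python
import sys
import traceback


def locate_failure(func, *args, **kwargs):
    """Call `func`; if it raises, describe the innermost frame of the traceback.

    Returns (filename, line_number, function_name, error_type_name),
    or None when the call succeeds.
    """
    try:
        func(*args, **kwargs)
    except Exception as exc:
        frames = traceback.extract_tb(sys.exc_info()[2])
        innermost = frames[-1]  # last entry is where the exception was raised
        return innermost[0], innermost[1], innermost[2], type(exc).__name__
    return None


def _demo():
    # Stand-in for a real step such as the session block in prepro.py.
    raise MemoryError("simulated failure")


if __name__ == "__main__":
    print(locate_failure(_demo))
```

Wrapping a single suspicious step in such a call narrows the search without editing many lines of the script.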
5. Final successful run (partial output):
(omitted)
Processed 82744 train features..
Processed 82752 train features..
Processed 82760 train features..
Processed 82768 train features..
Processed 82776 train features..
Processed 82784 train features..
Saved ./data/train/train.features.hkl..
doudou2
doudou3
Loaded ./data/val/val.annotations.pkl..
Processed 8 val features..
Processed 16 val features..
Processed 24 val features..
Processed 32 val features..
(omitted)
Processed 4016 val features..
Processed 4024 val features..
Processed 4032 val features..
Processed 4040 val features..
Processed 4048 val features..
Processed 4056 val features..
Saved ./data/val/val.features.hkl..
doudou2
doudou3
Loaded ./data/test/test.annotations.pkl..
(omitted)
Processed 3928 test features..
Processed 3936 test features..
Processed 3944 test features..
Processed 3952 test features..
Processed 3960 test features..
Processed 3968 test features..
Processed 3976 test features..
Processed 3984 test features..
Processed 3992 test features..
Processed 4000 test features..
Processed 4008 test features..
Processed 4016 test features..
Processed 4024 test features..
Processed 4032 test features..
Processed 4040 test features..
Processed 4048 test features..
Saved ./data/test/test.features.hkl..
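IV. Run python train.py
1. Error: MemoryError
Solution 1: check whether Python is 32-bit or 64-bit. Here it is 64-bit, so this is not the cause.
The reason for checking, quoting another post: "I only learned later that 32-bit Python raises this error once it uses more than 2 GB of memory, with no other message. I promptly switched to 64-bit Python."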
hudou@Amax-Super-Server:~/仿真/show,attend,and tell/show-attend-and-tell-master/show-attend-and-tell-master$ python
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.architecture()
('64bit', 'ELF')
Solution 2: run the program on a server with more memory.
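A rough size estimate shows why more memory is needed. The sketch below assumes that the training features are stored as one float32 array with shape (N, 196, 512), which is an assumption about prepro.py rather than something shown in these notes; N = 82783 is the data size that train.py reports. Under that assumption the array alone takes roughly 31 GiB.

```python
def array_gib(n_items, per_item_shape, bytes_per_value=4):
    """Size in GiB of a dense array with `n_items` rows of shape `per_item_shape`."""
    n_values = n_items
    for dim in per_item_shape:
        n_values *= dim
    return n_values * bytes_per_value / float(2 ** 30)


if __name__ == "__main__":
    # (196, 512) and 4 bytes per value are assumptions, see the text above.
    print("%.2f GiB" % array_gib(82783, (196, 512)))
```

Loading the whole feature file therefore fails on a machine whose free RAM is below that figure, regardless of the Python build.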
2. Error: tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
Solution:
3. Final run output:
$ conda uninstall tensorflow-gpu
Solving environment: done
## Package Plan ##
environment location: /home/syh-lld/anaconda2
removed specs:
- tensorflow-gpu
The following packages will be REMOVED:
tensorflow-gpu: 1.3.0-0
Proceed ([y]/n)? t^Hy^H^H^H
Invalid choice: y
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
$ python
Python 2.7.15 |Anaconda, Inc.| (default, Oct 23 2018, 18:31:10)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> exit()
$ ls
anaconda2 numpy-1.15.3+mkl-cp35-cp35m-win_amd64.whl scipy-1.1.0-cp35-cp35m-win_amd64.whl ttf.py
Anaconda2-5.2.0-Linux-x86_64.sh perl5 show-attend-and-tell-master
$ cd show-attend-and-tell-master/
[syh-lld@localhost show-attend-and-tell-master]$ python train.py
doudou1
image_idxs
file_names
word_to_idx
features
captions
Elapse time: 321.70
doudou2
doudou3
image_idxs
file_names
features
captions
Elapse time: 5.90
doudou4
doudou5
doudou6
The number of epoch: 20
Data size: 82783
Batch size: 128
Iterations per epoch: 647
doudou solver1
doudou solver2
doudou solver3
doudou solver4
2018-11-06 09:12:23.515298: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-11-06 09:12:23.515446: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-11-06 09:12:23.515465: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-11-06 09:12:23.515479: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-11-06 09:12:23.515487: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX512F instructions, but these are available on your machine and could speed up CPU computations.
2018-11-06 09:12:23.515495: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-11-06 09:12:26.675600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:18:00.0
Total memory: 15.90GiB
Free memory: 15.61GiB
2018-11-06 09:12:26.846029: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x7efdc1e71d90 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2018-11-06 09:12:26.846791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 1 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:3b:00.0
Total memory: 15.90GiB
Free memory: 15.61GiB
2018-11-06 09:12:27.024220: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x7efdc1e76290 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2018-11-06 09:12:27.025025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 2 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:86:00.0
Total memory: 15.90GiB
Free memory: 15.61GiB
2018-11-06 09:12:27.028106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 1 2
2018-11-06 09:12:27.028131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y Y Y
2018-11-06 09:12:27.028156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 1: Y Y Y
2018-11-06 09:12:27.028165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 2: Y Y Y
2018-11-06 09:12:27.028183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:18:00.0)
2018-11-06 09:12:27.028194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:3b:00.0)
2018-11-06 09:12:27.028226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:86:00.0)
doudou solver5
Previous epoch loss: -1
Current epoch loss: 29305.9548035
Elapsed time: 214.652005911
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 42184, 'guess': [42168, 38116, 34064, 30012], 'testlen': 42168, 'correct': [25418, 9016, 3129, 1130]}
ratio: 0.999620709274
Bleu_1: 0.602550686771
Bleu_2: 0.377457180992
Bleu_3: 0.235627968509
Bleu_4: 0.148961764851
METEOR: 0.173930289839
ROUGE_L: 0.475022252194
CIDEr: 0.414265379648
model-1 saved.
Previous epoch loss: 29305.9548035
Current epoch loss: 22278.5405769
Elapsed time: 459.680169106
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 43639, 'guess': [44290, 40238, 36186, 32134], 'testlen': 44290, 'correct': [26873, 10131, 3754, 1466]}
ratio: 1.01491784871
Bleu_1: 0.606750959585
Bleu_2: 0.390852775646
Bleu_3: 0.251184804708
Bleu_4: 0.163978654222
METEOR: 0.186843523668
ROUGE_L: 0.483651040966
CIDEr: 0.48868324339
model-2 saved.
Previous epoch loss: 22278.5405769
Current epoch loss: 20547.1464901
Elapsed time: 704.346327066
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 42034, 'guess': [41897, 37845, 33793, 29741], 'testlen': 41897, 'correct': [26813, 10456, 3989, 1619]}
ratio: 0.996740733692
Bleu_1: 0.637884973222
Bleu_2: 0.419121233224
Bleu_3: 0.274430368162
Bleu_4: 0.182996135806
METEOR: 0.19342309027
ROUGE_L: 0.497202786627
CIDEr: 0.542882914301
model-3 saved.
Previous epoch loss: 20547.1464901
Current epoch loss: 19433.2154102
Elapsed time: 954.214054108
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 43128, 'guess': [43609, 39557, 35505, 31453], 'testlen': 43609, 'correct': [27605, 10822, 4282, 1818]}
ratio: 1.01115284734
Bleu_1: 0.633011534316
Bleu_2: 0.41614808733
Bleu_3: 0.275391786848
Bleu_4: 0.186400120568
METEOR: 0.197035137185
ROUGE_L: 0.497974545922
CIDEr: 0.558730597535
model-4 saved.
Previous epoch loss: 19433.2154102
Current epoch loss: 18526.9098015
Elapsed time: 1207.98046112
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 43134, 'guess': [43565, 39513, 35461, 31409], 'testlen': 43565, 'correct': [27483, 10821, 4214, 1794]}
ratio: 1.00999211759
Bleu_1: 0.630850453346
Bleu_2: 0.415649158919
Bleu_3: 0.273820392256
Bleu_4: 0.185050994529
METEOR: 0.198713280552
ROUGE_L: 0.497641481413
CIDEr: 0.562680300735
model-5 saved.
Previous epoch loss: 18526.9098015
Current epoch loss: 17747.3153801
Elapsed time: 1461.94950008
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 42817, 'guess': [43126, 39074, 35022, 30970], 'testlen': 43126, 'correct': [27449, 10672, 4183, 1782]}
ratio: 1.0072167597
Bleu_1: 0.63648379168
Bleu_2: 0.416939121048
Bleu_3: 0.274851052675
Bleu_4: 0.185915105362
METEOR: 0.199935677598
ROUGE_L: 0.498579121206
CIDEr: 0.575167425315
model-6 saved.
Previous epoch loss: 17747.3153801
Current epoch loss: 17054.8211136
Elapsed time: 1718.52202606
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 42963, 'guess': [43326, 39274, 35222, 31170], 'testlen': 43326, 'correct': [27590, 10806, 4273, 1836]}
ratio: 1.00844913065
Bleu_1: 0.636800073859
Bleu_2: 0.418582884333
Bleu_3: 0.277009037023
Bleu_4: 0.188106768102
METEOR: 0.201374404101
ROUGE_L: 0.498718791431
CIDEr: 0.582224547845
model-7 saved.
Previous epoch loss: 17054.8211136
Current epoch loss: 16399.4807281
Elapsed time: 1976.14774799
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 43216, 'guess': [43630, 39578, 35526, 31474], 'testlen': 43630, 'correct': [27890, 10899, 4274, 1782]}
ratio: 1.00957978526
Bleu_1: 0.639239055696
Bleu_2: 0.419563843158
Bleu_3: 0.276669597761
Bleu_4: 0.186084422426
METEOR: 0.202627867191
ROUGE_L: 0.501075426782
CIDEr: 0.585386686232
model-8 saved.
Previous epoch loss: 16399.4807281
Current epoch loss: 15803.3446693
Elapsed time: 2277.28461504
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 43504, 'guess': [44049, 39997, 35945, 31893], 'testlen': 44049, 'correct': [27872, 10991, 4428, 1919]}
ratio: 1.01252758367
Bleu_1: 0.632749892166
Bleu_2: 0.416985482225
Bleu_3: 0.277717767002
Bleu_4: 0.189473138484
METEOR: 0.20328816434
ROUGE_L: 0.499081444887
CIDEr: 0.589336354149
model-9 saved.
Previous epoch loss: 15803.3446693
Current epoch loss: 15266.4076843
Elapsed time: 2623.26851106
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 43282, 'guess': [43838, 39786, 35734, 31682], 'testlen': 43838, 'correct': [27919, 10871, 4269, 1766]}
ratio: 1.01284598678
Bleu_1: 0.636867557827
Bleu_2: 0.417151848051
Bleu_3: 0.274965130038
Bleu_4: 0.18450270757
METEOR: 0.202498495479
ROUGE_L: 0.496300028682
CIDEr: 0.586579838358
model-10 saved.
Previous epoch loss: 15266.4076843
Current epoch loss: 14758.2014503
Elapsed time: 2970.49850702
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 43673, 'guess': [44299, 40247, 36195, 32143], 'testlen': 44299, 'correct': [27861, 10850, 4294, 1798]}
ratio: 1.01433379891
Bleu_1: 0.628930675636
Bleu_2: 0.41176506846
Bleu_3: 0.271959449714
Bleu_4: 0.183148739946
METEOR: 0.201822437783
ROUGE_L: 0.494693773453
CIDEr: 0.579149289517
model-11 saved.
Previous epoch loss: 14758.2014503
Current epoch loss: 14267.3956318
Elapsed time: 3315.46887302
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 43853, 'guess': [44701, 40649, 36597, 32545], 'testlen': 44701, 'correct': [28122, 11024, 4319, 1822]}
ratio: 1.01933733154
Bleu_1: 0.629113442652
Bleu_2: 0.413056206166
Bleu_3: 0.272052052766
Bleu_4: 0.183233564269
METEOR: 0.203701823272
ROUGE_L: 0.495619352763
CIDEr: 0.581331361356
model-12 saved.
Previous epoch loss: 14267.3956318
Current epoch loss: 13846.7553272
Elapsed time: 3570.82882404
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 43610, 'guess': [44177, 40125, 36073, 32021], 'testlen': 44177, 'correct': [27736, 10689, 4193, 1793]}
ratio: 1.01300160514
Bleu_1: 0.627838015257
Bleu_2: 0.408963755299
Bleu_3: 0.268887582737
Bleu_4: 0.181641219887
METEOR: 0.199948972508
ROUGE_L: 0.493568452575
CIDEr: 0.564441595541
model-13 saved.
Previous epoch loss: 13846.7553272
Current epoch loss: 13445.7363987
Elapsed time: 3853.63303208
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 44024, 'guess': [44858, 40806, 36754, 32702], 'testlen': 44858, 'correct': [28126, 10876, 4224, 1796]}
ratio: 1.01894421225
Bleu_1: 0.627000757947
Bleu_2: 0.408795983157
Bleu_3: 0.267800168549
Bleu_4: 0.180215084181
METEOR: 0.202896824539
ROUGE_L: 0.493999194508
CIDEr: 0.577149314451
model-14 saved.
Previous epoch loss: 13445.7363987
Current epoch loss: 13069.1123486
Elapsed time: 4180.21096301
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 43687, 'guess': [44469, 40417, 36365, 32313], 'testlen': 44469, 'correct': [27803, 10630, 4141, 1726]}
ratio: 1.0179000618
Bleu_1: 0.625222064809
Bleu_2: 0.405510163176
Bleu_3: 0.265547188943
Bleu_4: 0.177837014538
METEOR: 0.200393595569
ROUGE_L: 0.491417011836
CIDEr: 0.566795262572
model-15 saved.
Previous epoch loss: 13069.1123486
Current epoch loss: 12725.6176128
Elapsed time: 4529.67197204
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 43785, 'guess': [44550, 40498, 36446, 32394], 'testlen': 44550, 'correct': [27901, 10637, 4125, 1734]}
ratio: 1.0174717369
Bleu_1: 0.626285072952
Bleu_2: 0.405582139608
Bleu_3: 0.265039482762
Bleu_4: 0.177676119416
METEOR: 0.200075432061
ROUGE_L: 0.491235799711
CIDEr: 0.563757353526
model-16 saved.
Previous epoch loss: 12725.6176128
Current epoch loss: 12390.736412
Elapsed time: 4869.4416821
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 43819, 'guess': [44554, 40502, 36450, 32398], 'testlen': 44554, 'correct': [27847, 10624, 4130, 1728]}
ratio: 1.01677354572
Bleu_1: 0.625016833505
Bleu_2: 0.404903614359
Bleu_3: 0.264841034328
Bleu_4: 0.177417044699
METEOR: 0.199895036488
ROUGE_L: 0.490030602975
CIDEr: 0.56541891834
model-17 saved.
Previous epoch loss: 12390.736412
Current epoch loss: 12098.9043007
Elapsed time: 5219.3836751
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 44014, 'guess': [44892, 40840, 36788, 32736], 'testlen': 44892, 'correct': [27890, 10771, 4140, 1772]}
ratio: 1.0199481983
Bleu_1: 0.621268822953
Bleu_2: 0.404785480606
Bleu_3: 0.264188964071
Bleu_4: 0.177744237748
METEOR: 0.200448868078
ROUGE_L: 0.491381116267
CIDEr: 0.567569147112
model-18 saved.
Previous epoch loss: 12098.9043007
Current epoch loss: 11815.7490501
Elapsed time: 5556.38881493
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 44335, 'guess': [45260, 41208, 37156, 33104], 'testlen': 45260, 'correct': [27920, 10681, 4243, 1808]}
ratio: 1.0208638773
Bleu_1: 0.616880247459
Bleu_2: 0.399867052355
Bleu_3: 0.263324806281
Bleu_4: 0.177704456082
METEOR: 0.200687106953
ROUGE_L: 0.488475624684
CIDEr: 0.558739318793
model-19 saved.
Previous epoch loss: 11815.7490501
Current epoch loss: 11564.9048023
Elapsed time: 5898.09821701
Saved ./data/val/val.candidate.captions.pkl..
{'reflen': 44357, 'guess': [45297, 41245, 37193, 33141], 'testlen': 45297, 'correct': [27995, 10665, 4154, 1794]}
ratio: 1.02119169466
Bleu_1: 0.618032099256
Bleu_2: 0.399760879508
Bleu_3: 0.261337634228
Bleu_4: 0.1763054242
METEOR: 0.200651578955
ROUGE_L: 0.490162182924
CIDEr: 0.559925918067
model-20 saved.
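In the training output above, the training loss falls at every epoch, while the validation scores stop improving early: CIDEr and Bleu_4 are both highest at model-9. Picking the checkpoint by hand from such a long log is error-prone, so the sketch below parses the saved output and reports the best epoch for a chosen metric. It assumes the log text has been saved to a string or file; parse_epochs and best_epoch are hypothetical helpers, not part of the repository.

```python
import re

# Everything printed before "model-N saved." belongs to epoch N.
EPOCH_RE = re.compile(r"(.*?)model-(\d+) saved\.", re.S)
METRIC_RE = re.compile(r"(Bleu_[1-4]|METEOR|ROUGE_L|CIDEr): ([0-9.]+)")


def parse_epochs(log_text):
    """Map epoch number -> {metric name: value} for each 'model-N saved.' record."""
    epochs = {}
    for chunk, epoch in EPOCH_RE.findall(log_text):
        epochs[int(epoch)] = {name: float(value)
                              for name, value in METRIC_RE.findall(chunk)}
    return epochs


def best_epoch(epochs, metric="CIDEr"):
    """Return the epoch number with the highest value of `metric`."""
    return max(epochs, key=lambda e: epochs[e][metric])


if __name__ == "__main__":
    # Two records copied from the log; works on flattened or line-broken text.
    sample = ("Bleu_4: 0.186084422426 CIDEr: 0.585386686232 model-8 saved. "
              "Bleu_4: 0.189473138484 CIDEr: 0.589336354149 model-9 saved.")
    print(best_epoch(parse_epochs(sample)))
```

The selected number corresponds to the model-N checkpoint that the evaluation step would then load.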
doudou7
V. $ tensorboard --logdir='./log' --port=6005
Issue: can't import named main
Not resolved yet.
VI. Evaluation: evaluate_model.ipynb
1. Run the ipynb file, or run $ jupyter notebook
2. Results:
Reference blogs (not limited to the following):
https://blog.csdn.net/spring_willow/article/details/80143207
https://www.cnblogs.com/wolflzc/p/9117291.html
https://blog.csdn.net/u012177034/article/details/61614497
Install IPython Notebook: $ pip install ipython
Run IPython Notebook: $ ipython notebook