If you would rather tune the hyperparameters yourself without shortcuts, please skip what follows.
===================== SPOILER ALERT!!! ========================
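Let me share my experience from training CIFAR-10 on my own. My hyperparameter choices basically follow the ResNet and DenseNet papers. With the largest DenseNet I previously reached 96.5%-96.7% validation accuracy, but that took 5 days of training on 8 Titan X cards. A smaller network can also get above 95%. Basically, the bigger the network, the higher the score.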
First, data augmentation
The standard practice is to pad the 32x32 image to 40x40, then take a 32x32 random crop and apply a random mirror.
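The pad, crop, and mirror steps can be sketched in plain Python as below. This is only an illustration on a single-channel image stored as nested lists; the function name pad_crop_mirror, the zero padding value, and the 0.5 mirror probability are assumptions of this sketch, not something stated in the post.

```python
import random


def pad_crop_mirror(image, pad=4, rng=random):
    """Pad an H x W image (nested lists), random-crop back to H x W, maybe mirror."""
    h, w = len(image), len(image[0])
    # Zero-pad `pad` pixels on every side: 32x32 becomes 40x40 when pad=4.
    padded = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for r in range(h):
        for c in range(w):
            padded[r + pad][c + pad] = image[r][c]
    # Pick the top-left corner of a random H x W window inside the padded image.
    top = rng.randint(0, 2 * pad)
    left = rng.randint(0, 2 * pad)
    crop = [row[left:left + w] for row in padded[top:top + h]]
    # Mirror horizontally with probability 0.5 (assumed probability).
    if rng.random() < 0.5:
        crop = [row[::-1] for row in crop]
    return crop


if __name__ == "__main__":
    img = [[1] * 32 for _ in range(32)]
    out = pad_crop_mirror(img)
    print(len(out), len(out[0]))
```

A real pipeline would apply the same idea per channel with an array library; the nested-list version is only meant to make the three steps explicit.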
Next, batch size, weight decay, and learning rate
For ResNet, lr is usually set to 0.1, batch size to 128, and weight decay to 5e-4. For DenseNet, lr is 0.1, batch size 64, and weight decay 1e-4. The lr decay factor is 0.1, applied at epochs 150 and 225, for 300 training epochs in total.
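As a rough sketch, the settings above can be written down as two configuration dictionaries plus a step schedule. The names RESNET_CFG, DENSENET_CFG, and step_lr are hypothetical, and the sketch assumes the same decay schedule (factor 0.1 at epochs 150 and 225) is used with either network.

```python
# Hypothetical config names; the values are the ones quoted in the post.
RESNET_CFG = {"lr": 0.1, "batch_size": 128, "weight_decay": 5e-4}
DENSENET_CFG = {"lr": 0.1, "batch_size": 64, "weight_decay": 1e-4}


def step_lr(epoch, base_lr=0.1, decay=0.1, milestones=(150, 225)):
    """Learning rate for a given epoch: multiply by `decay` at each milestone passed."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * decay ** passed


if __name__ == "__main__":
    for epoch in (0, 149, 150, 224, 225, 299):
        print(epoch, step_lr(epoch, base_lr=RESNET_CFG["lr"]))
```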
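Network architecture
The standard CIFAR ResNet usually has a channel width of 16; to chase a high score you can raise it to 64.
Optimization algorithm
Momentum SGD is the usual choice. Adaptive optimizers such as Adam and Adagrad make tuning easier, but their final test results are generally not as good as a hand-tuned standard momentum SGD.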
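For reference, one common form of the momentum SGD update with weight decay looks like the sketch below, run here on a toy one-dimensional quadratic. The momentum value 0.9 and the function names are assumptions of this sketch; the post only gives lr and weight decay.

```python
def sgd_momentum_step(w, v, grad, lr=0.1, momentum=0.9, weight_decay=5e-4):
    """One momentum SGD step; weight decay is added to the gradient as an L2 term."""
    grad = grad + weight_decay * w
    v = momentum * v - lr * grad  # velocity accumulates past gradients
    return w + v, v


def minimize(steps=300):
    """Minimize the toy loss f(w) = (w - 3)^2 starting from w = 0."""
    w, v = 0.0, 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w, v = sgd_momentum_step(w, v, grad)
    return w


if __name__ == "__main__":
    print(minimize())
```

The result lands slightly below 3 because the weight decay term pulls the parameter toward zero.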
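As for why to start with a high learning rate, besides Mu Li's explanation there is another one: starting with a small learning rate tends to converge to a deep but narrow local minimum, whereas starting with a high learning rate generally converges to a deep but wide one. The final training loss of the two may be about the same, but the former tends to have the higher test loss. Specifically, a deep and wide solution is less sensitive to the parameters, and since the loss surface of the test data usually differs from that of the training data, their local minima will not coincide. In that case the parameter-insensitive solution has the advantage: if both training solutions are delta_w away from the test optimum, the deep and narrow solution is highly sensitive to w, so its test loss will be much higher than its training loss, while for the insensitive solution the test loss does not grow much relative to the training loss. Of course, this assumes that the training and test loss functions have similar sensitivity to the parameters (first derivative).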
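A toy calculation makes the delta_w argument concrete. The sketch below assumes a one-dimensional quadratic training loss curvature * w^2 and a test loss of the same shape shifted by delta_w; both the quadratic form and the numbers 50, 0.5, and 0.1 are made up for illustration.

```python
def shifted_test_loss(curvature, delta_w):
    """Test loss evaluated at the training minimum.

    Train loss: curvature * w**2, minimized at w = 0.
    Test loss:  curvature * (w - delta_w)**2, i.e. the same bowl shifted by delta_w.
    """
    w_train = 0.0
    return curvature * (w_train - delta_w) ** 2


if __name__ == "__main__":
    delta_w = 0.1
    sharp = shifted_test_loss(50.0, delta_w)  # narrow minimum, high curvature
    wide = shifted_test_loss(0.5, delta_w)    # wide minimum, low curvature
    print("sharp:", sharp, "wide:", wide)
```

Both solutions have zero training loss, yet for the same shift the narrow one pays a test loss 100 times larger in this example.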
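A common analogy is to compare the whole optimization process to going down a mountain. You basically start on Mount Everest. If you use a small learning rate from the start, you may drop straight into some valley on the Qinghai-Tibet Plateau. If the initial learning rate is extremely high, you may have so much kinetic energy that you escape Earth's gravity and fly off into space (this analogy may not be quite apt 0.0). If the initial learning rate is fairly high, you have enough kinetic energy to hop all the way down from Everest, from one peak to the next. After hopping for a long time you may end up at the Qaidam Basin~ Because your kinetic energy is still large, you keep bouncing around on the slopes surrounding the basin; at that point you decay the learning rate, the kinetic energy drops, and you fall into the basin. Although both places are at high altitude, the basin generally generalizes better (though a recent paper disputes this claim: https://arxiv.org/pdf/1703.04933.pdf Sharp Minima Can Generalize For Deep Nets)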
===============================================================
Good luck with your tuning, everyone!
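At first I just habitually raised Epoch from 1 to 100, then trained and submitted. I saw that the results at epoch 80 and epoch 79 differed quite a lot, but I did not understand why, so I mentioned it in a reply.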
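Then I got a hint from the original poster @astonzhang and a follow-up from fellow user @fiercex, both in the thread "[Top score 0.9772! @PistonY, scores continuously updated] Hands-on Kaggle competition: classifying raw image files with Gluon (CIFAR-10), discussion thread":
Read the code again and think about it~ Why would it "click" again at epoch 80?
Because the learning rate was lowered. After training to a certain point the learning rate becomes a bit too high; lowering it makes the model click once more, although the descent gets somewhat slower. See how gradient descent works for the details.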
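So I ran some experiments, changing when, how often, and how many times the learning rate is lowered.
My thinking was this: first, after training to a certain point, lowering the learning rate makes the model click. There are a couple of ways to use that. One is to train thoroughly and then drop the learning rate to a very small value, but a personal computer simply cannot bear that monstrous amount of computation. The other is to train for fewer epochs and lower the learning rate by a smaller factor but at much shorter intervals, to see whether that works wonders; clicking a few more times might let me bring Epoch down and save some time.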
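The results below come from a modified version of the official tutorial: http://zh.gluon.ai/chapter_computer-vision/kaggle-gluon-cifar10.html
I changed only the way the learning rate decays and the number of epochs. This can be done by modifying the relevant part of the code in the train() function: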
[Screenshot: code from the KaggleCIFAR10 notebook]
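After the change, the initial learning rate is still 0.1; at epoch 20 and at epoch 40 the learning rate is multiplied by 0.5, and after epoch 40 it is multiplied by 0.5 every 10 epochs.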
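Since the original code survives only as a screenshot, here is a minimal sketch of a schedule matching the description above. The function name halving_lr is hypothetical, and this is a reconstruction from the description and the lr column of the log, not the author's actual code.

```python
def halving_lr(epoch, base_lr=0.1, factor=0.5):
    """Halve at epoch 20 and epoch 40, then again every 10 epochs after 40."""
    halvings = 0
    if epoch >= 20:
        halvings += 1
    if epoch >= 40:
        # One halving at epoch 40 itself, plus one for each further 10 epochs.
        halvings += 1 + (epoch - 40) // 10
    return base_lr * factor ** halvings


if __name__ == "__main__":
    for epoch in (0, 19, 20, 40, 50, 60, 70, 80, 90, 99):
        print(epoch, halving_lr(epoch))
```

In the tutorial's training loop, a value like this would presumably be handed to the Gluon trainer once per epoch (for example through its set_learning_rate method), but that wiring is an assumption here.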
Training then produced the following results:
Epoch 0. Loss: 1.780838, Train acc 0.346073, Valid acc 0.435156, Time 00:01:03, lr 0.1
Epoch 1. Loss: 1.231638, Train acc 0.554236, Valid acc 0.570703, Time 00:01:07, lr 0.1
Epoch 2. Loss: 0.898192, Train acc 0.682050, Valid acc 0.611914, Time 00:01:07, lr 0.1
Epoch 3. Loss: 0.717537, Train acc 0.751033, Valid acc 0.738867, Time 00:01:07, lr 0.1
Epoch 4. Loss: 0.636119, Train acc 0.778902, Valid acc 0.755273, Time 00:01:07, lr 0.1
Epoch 5. Loss: 0.570642, Train acc 0.804044, Valid acc 0.759570, Time 00:01:07, lr 0.1
Epoch 6. Loss: 0.532443, Train acc 0.815546, Valid acc 0.782813, Time 00:01:07, lr 0.1
Epoch 7. Loss: 0.501448, Train acc 0.826912, Valid acc 0.754687, Time 00:01:07, lr 0.1
Epoch 8. Loss: 0.474636, Train acc 0.838339, Valid acc 0.789453, Time 00:01:09, lr 0.1
Epoch 9. Loss: 0.454704, Train acc 0.843057, Valid acc 0.787695, Time 00:01:07, lr 0.1
Epoch 10. Loss: 0.434828, Train acc 0.850221, Valid acc 0.692969, Time 00:01:07, lr 0.1
Epoch 11. Loss: 0.432877, Train acc 0.848741, Valid acc 0.800195, Time 00:01:07, lr 0.1
Epoch 12. Loss: 0.415732, Train acc 0.856685, Valid acc 0.681641, Time 00:01:07, lr 0.1
Epoch 13. Loss: 0.402307, Train acc 0.861363, Valid acc 0.753516, Time 00:01:07, lr 0.1
Epoch 14. Loss: 0.390425, Train acc 0.865846, Valid acc 0.756641, Time 00:01:07, lr 0.1
Epoch 15. Loss: 0.381881, Train acc 0.869779, Valid acc 0.833203, Time 00:01:07, lr 0.1
Epoch 16. Loss: 0.378939, Train acc 0.869602, Valid acc 0.792969, Time 00:01:07, lr 0.1
Epoch 17. Loss: 0.365105, Train acc 0.872953, Valid acc 0.808008, Time 00:01:09, lr 0.1
Epoch 18. Loss: 0.360914, Train acc 0.877182, Valid acc 0.799609, Time 00:01:09, lr 0.1
Epoch 19. Loss: 0.357132, Train acc 0.878088, Valid acc 0.764844, Time 00:01:09, lr 0.1
Epoch 20. Loss: 0.230725, Train acc 0.921194, Valid acc 0.863086, Time 00:01:09, lr 0.05
Epoch 21. Loss: 0.199139, Train acc 0.931532, Valid acc 0.864062, Time 00:01:09, lr 0.05
Epoch 22. Loss: 0.201339, Train acc 0.930294, Valid acc 0.861523, Time 00:01:09, lr 0.05
Epoch 23. Loss: 0.195723, Train acc 0.932953, Valid acc 0.832422, Time 00:01:09, lr 0.05
Epoch 24. Loss: 0.206018, Train acc 0.928755, Valid acc 0.850781, Time 00:01:09, lr 0.05
Epoch 25. Loss: 0.202978, Train acc 0.930124, Valid acc 0.841602, Time 00:01:09, lr 0.05
Epoch 26. Loss: 0.207308, Train acc 0.927835, Valid acc 0.829102, Time 00:01:08, lr 0.05
Epoch 27. Loss: 0.197950, Train acc 0.932011, Valid acc 0.861523, Time 00:01:09, lr 0.05
Epoch 28. Loss: 0.203741, Train acc 0.929956, Valid acc 0.833789, Time 00:01:09, lr 0.05
Epoch 29. Loss: 0.192430, Train acc 0.934906, Valid acc 0.843750, Time 00:01:08, lr 0.05
Epoch 30. Loss: 0.190913, Train acc 0.935229, Valid acc 0.848047, Time 00:01:09, lr 0.05
Epoch 31. Loss: 0.187605, Train acc 0.935061, Valid acc 0.843164, Time 00:01:09, lr 0.05
Epoch 32. Loss: 0.190382, Train acc 0.934084, Valid acc 0.831445, Time 00:01:09, lr 0.05
Epoch 33. Loss: 0.197010, Train acc 0.931927, Valid acc 0.824023, Time 00:01:08, lr 0.05
Epoch 34. Loss: 0.187568, Train acc 0.934760, Valid acc 0.849219, Time 00:01:08, lr 0.05
Epoch 35. Loss: 0.179443, Train acc 0.938052, Valid acc 0.853320, Time 00:01:07, lr 0.05
Epoch 36. Loss: 0.181557, Train acc 0.938008, Valid acc 0.849609, Time 00:01:07, lr 0.05
Epoch 37. Loss: 0.179814, Train acc 0.937517, Valid acc 0.855859, Time 00:01:07, lr 0.05
Epoch 38. Loss: 0.177502, Train acc 0.938417, Valid acc 0.839648, Time 00:01:07, lr 0.05
Epoch 39. Loss: 0.181211, Train acc 0.936807, Valid acc 0.833203, Time 00:01:07, lr 0.05
Epoch 40. Loss: 0.088296, Train acc 0.971130, Valid acc 0.886523, Time 00:01:07, lr 0.025
Epoch 41. Loss: 0.050163, Train acc 0.985645, Valid acc 0.899805, Time 00:01:07, lr 0.025
Epoch 42. Loss: 0.039559, Train acc 0.988846, Valid acc 0.897266, Time 00:01:07, lr 0.025
Epoch 43. Loss: 0.034297, Train acc 0.990528, Valid acc 0.894336, Time 00:01:07, lr 0.025
Epoch 44. Loss: 0.036933, Train acc 0.989751, Valid acc 0.869141, Time 00:01:07, lr 0.025
Epoch 45. Loss: 0.049209, Train acc 0.984336, Valid acc 0.874219, Time 00:01:07, lr 0.025
Epoch 46. Loss: 0.064416, Train acc 0.978703, Valid acc 0.875977, Time 00:01:07, lr 0.025
Epoch 47. Loss: 0.069568, Train acc 0.977031, Valid acc 0.865039, Time 00:01:07, lr 0.025
Epoch 48. Loss: 0.077773, Train acc 0.973221, Valid acc 0.875195, Time 00:01:07, lr 0.025
Epoch 49. Loss: 0.082461, Train acc 0.971389, Valid acc 0.862305, Time 00:01:07, lr 0.025
Epoch 50. Loss: 0.038141, Train acc 0.989036, Valid acc 0.905273, Time 00:01:07, lr 0.0125
Epoch 51. Loss: 0.017236, Train acc 0.996338, Valid acc 0.904102, Time 00:01:07, lr 0.0125
Epoch 52. Loss: 0.011381, Train acc 0.998025, Valid acc 0.909961, Time 00:01:07, lr 0.0125
Epoch 53. Loss: 0.007791, Train acc 0.999023, Valid acc 0.907813, Time 00:01:07, lr 0.0125
Epoch 54. Loss: 0.005550, Train acc 0.999534, Valid acc 0.912500, Time 00:01:07, lr 0.0125
Epoch 55. Loss: 0.004742, Train acc 0.999822, Valid acc 0.917188, Time 00:01:07, lr 0.0125
Epoch 56. Loss: 0.003858, Train acc 0.999867, Valid acc 0.917383, Time 00:01:07, lr 0.0125
Epoch 57. Loss: 0.003473, Train acc 0.999933, Valid acc 0.911719, Time 00:01:07, lr 0.0125
Epoch 58. Loss: 0.003152, Train acc 0.999978, Valid acc 0.916602, Time 00:01:07, lr 0.0125
Epoch 59. Loss: 0.002942, Train acc 1.000000, Valid acc 0.913086, Time 00:01:07, lr 0.0125
Epoch 60. Loss: 0.002924, Train acc 1.000000, Valid acc 0.916797, Time 00:01:07, lr 0.00625
Epoch 61. Loss: 0.002724, Train acc 0.999956, Valid acc 0.917578, Time 00:01:07, lr 0.00625
Epoch 62. Loss: 0.002742, Train acc 0.999978, Valid acc 0.916211, Time 00:01:07, lr 0.00625
Epoch 63. Loss: 0.002683, Train acc 1.000000, Valid acc 0.909961, Time 00:01:07, lr 0.00625
Epoch 64. Loss: 0.002623, Train acc 1.000000, Valid acc 0.913672, Time 00:01:07, lr 0.00625
Epoch 65. Loss: 0.002634, Train acc 0.999978, Valid acc 0.914062, Time 00:01:07, lr 0.00625
Epoch 66. Loss: 0.002531, Train acc 1.000000, Valid acc 0.913477, Time 00:01:08, lr 0.00625
Epoch 67. Loss: 0.002656, Train acc 1.000000, Valid acc 0.916211, Time 00:01:08, lr 0.00625
Epoch 68. Loss: 0.002634, Train acc 1.000000, Valid acc 0.914453, Time 00:01:07, lr 0.00625
Epoch 69. Loss: 0.002693, Train acc 0.999978, Valid acc 0.909180, Time 00:01:08, lr 0.00625
Epoch 70. Loss: 0.002682, Train acc 1.000000, Valid acc 0.914453, Time 00:01:08, lr 0.003125
Epoch 71. Loss: 0.002631, Train acc 1.000000, Valid acc 0.918164, Time 00:01:08, lr 0.003125
Epoch 72. Loss: 0.002466, Train acc 1.000000, Valid acc 0.909961, Time 00:01:08, lr 0.003125
Epoch 73. Loss: 0.002510, Train acc 1.000000, Valid acc 0.915234, Time 00:01:08, lr 0.003125
Epoch 74. Loss: 0.002570, Train acc 1.000000, Valid acc 0.909570, Time 00:01:09, lr 0.003125
Epoch 75. Loss: 0.002440, Train acc 1.000000, Valid acc 0.913086, Time 00:01:10, lr 0.003125
Epoch 76. Loss: 0.002482, Train acc 1.000000, Valid acc 0.915820, Time 00:01:10, lr 0.003125
Epoch 77. Loss: 0.002649, Train acc 1.000000, Valid acc 0.912305, Time 00:01:10, lr 0.003125
Epoch 78. Loss: 0.002521, Train acc 1.000000, Valid acc 0.913477, Time 00:01:10, lr 0.003125
Epoch 79. Loss: 0.002476, Train acc 1.000000, Valid acc 0.910156, Time 00:01:11, lr 0.003125
Epoch 80. Loss: 0.002511, Train acc 1.000000, Valid acc 0.914453, Time 00:01:10, lr 0.0015625
Epoch 81. Loss: 0.002558, Train acc 1.000000, Valid acc 0.910352, Time 00:01:11, lr 0.0015625
Epoch 82. Loss: 0.002513, Train acc 1.000000, Valid acc 0.915625, Time 00:01:09, lr 0.0015625
Epoch 83. Loss: 0.002556, Train acc 1.000000, Valid acc 0.912305, Time 00:01:10, lr 0.0015625
Epoch 84. Loss: 0.002566, Train acc 1.000000, Valid acc 0.912109, Time 00:01:11, lr 0.0015625
Epoch 85. Loss: 0.002496, Train acc 1.000000, Valid acc 0.909961, Time 00:01:09, lr 0.0015625
Epoch 86. Loss: 0.002433, Train acc 1.000000, Valid acc 0.914648, Time 00:01:10, lr 0.0015625
Epoch 87. Loss: 0.002547, Train acc 1.000000, Valid acc 0.916406, Time 00:01:10, lr 0.0015625
Epoch 88. Loss: 0.002484, Train acc 1.000000, Valid acc 0.911523, Time 00:01:10, lr 0.0015625
Epoch 89. Loss: 0.002530, Train acc 1.000000, Valid acc 0.917188, Time 00:01:10, lr 0.0015625
Epoch 90. Loss: 0.002449, Train acc 1.000000, Valid acc 0.913281, Time 00:01:10, lr 0.00078125
Epoch 91. Loss: 0.002512, Train acc 1.000000, Valid acc 0.916992, Time 00:01:10, lr 0.00078125
Epoch 92. Loss: 0.002467, Train acc 1.000000, Valid acc 0.914648, Time 00:01:10, lr 0.00078125
Epoch 93. Loss: 0.002399, Train acc 1.000000, Valid acc 0.915625, Time 00:01:10, lr 0.00078125
Epoch 94. Loss: 0.002487, Train acc 1.000000, Valid acc 0.913086, Time 00:01:10, lr 0.00078125
Epoch 95. Loss: 0.002427, Train acc 1.000000, Valid acc 0.912891, Time 00:01:10, lr 0.00078125
Epoch 96. Loss: 0.002533, Train acc 1.000000, Valid acc 0.915430, Time 00:01:09, lr 0.00078125
Epoch 97. Loss: 0.002485, Train acc 1.000000, Valid acc 0.914844, Time 00:01:10, lr 0.00078125
Epoch 98. Loss: 0.002489, Train acc 1.000000, Valid acc 0.907813, Time 00:01:09, lr 0.00078125
Epoch 99. Loss: 0.002485, Train acc 1.000000, Valid acc 0.916406, Time 00:01:08, lr 0.00078125
Plotting with matplotlib gives the following curves:
[Figure: plot from the KaggleCIFAR10 notebook]
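Besides this I have one more idea that I have not tried yet: multiply the learning rate by 0.934 every epoch; 0.934 to the 100th power is roughly 0.00108.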
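The arithmetic behind this per-epoch decay idea can be checked with a few lines. The function name exp_lr is hypothetical, and base_lr=0.1 is taken from the experiment above on the assumption that the same starting value would be used.

```python
def exp_lr(epoch, base_lr=0.1, gamma=0.934):
    """Learning rate when it is multiplied by `gamma` once per epoch."""
    return base_lr * gamma ** epoch


if __name__ == "__main__":
    print("decay factor after 100 epochs:", 0.934 ** 100)
    print("learning rate after 100 epochs:", exp_lr(100))
```

With a starting value of 0.1, the learning rate after 100 epochs would be about 0.000108, the same order of magnitude as the 0.00078125 reached by the stepwise schedule in the log.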
Created: October 2017
Last reply: October 2017
fiercex (moderator), October 2017
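This is essentially automatic learning-rate decay. I believe a few optimization algorithms come with built-in learning-rate decay, but those have a problem too: when training runs for a long time, the learning rate can decay toward zero and the loss almost stops decreasing. There also seems to be a "leaky" variant, in which the learning rate decays down to a minimum value and then stays there. For details, look into the relevant optimizers such as SGD and Adam. You can also watch videos such as Andrew Ng's new course and CS231n.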
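The variant with a minimum value can be sketched as a decay schedule with a floor. The name floored_lr and the floor value 1e-3 are assumptions chosen for illustration; the reply does not name a specific algorithm or value.

```python
def floored_lr(epoch, base_lr=0.1, gamma=0.934, min_lr=1e-3):
    """Per-epoch exponential decay that never drops below `min_lr`."""
    return max(base_lr * gamma ** epoch, min_lr)


if __name__ == "__main__":
    for epoch in (0, 50, 100, 1000):
        print(epoch, floored_lr(epoch))
```

Without the floor, the rate at epoch 1000 would be vanishingly small and training would effectively stall; with it, the rate simply stays at min_lr.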