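Build 4 networks with 3, 5, 7 and 9 layers and, using the same stopping criterion for all of them, compare their classification accuracy: does adding layers necessarily improve network performance?
The 3-layer network has the structure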
(mnist 0 ,mnist 2)81-30-2-(1,0) || (0,1)
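The task is to classify the MNIST digits 0 and 2. Each 28*28 image is compressed to 9*9, and the three layers have 81, 30 and 2 nodes. Images of 0 are trained toward the target (1,0) and images of 2 toward the target (0,1). Training stops when
|output - target| < δ
δ takes 34 values from 0.5 down to 1e-6. For each δ the network is trained to convergence 199 times, and the mean iteration count, mean classification accuracy, maximum classification accuracy and mean training time are recorded.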
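The 34 thresholds can be read off the tables that follow: 0.5 to 0.1 in steps of 0.1, then 0.01 and 0.001, then 9e-k down to 1e-k for k = 4, 5, 6. The sketch below rebuilds that grid and shows one possible reading of the stopping rule. The function names `delta_grid` and `converged` are hypothetical, and checking every output component against δ is an assumption, since the post does not say how the difference is measured.

```python
def delta_grid():
    """Return the 34 thresholds used in the tables, from 0.5 down to 1e-6."""
    coarse = [0.5, 0.4, 0.3, 0.2, 0.1, 0.01, 0.001]
    # 9e-4 ... 1e-4, then 9e-5 ... 1e-5, then 9e-6 ... 1e-6
    fine = [m * 10.0 ** -k for k in (4, 5, 6) for m in range(9, 0, -1)]
    return coarse + fine


def converged(output, target, delta):
    """Assumed stopping rule: every output component is within delta of its target."""
    return all(abs(o - t) < delta for o, t in zip(output, target))


grid = delta_grid()
print(len(grid), grid[0], grid[-1])
print(converged((0.9, 0.2), (1, 0), 0.5))
```

A training loop would call `converged` after each iteration and stop at the first `True`; the iteration index at that point is what the tables report as n.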
The corresponding 5-, 7- and 9-layer networks have the structures
(mnist 0 ,mnist 2)81-30-49-30-2-(1,0) || (0,1)
(mnist 0 ,mnist 2)81-30-49-30-49-30-2-(1,0) || (0,1)
(mnist 0 ,mnist 2)81-30-49-30-49-30-49-30-2-(1,0) || (0,1)
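The four structures follow one pattern: 81 inputs, a 30-node layer, zero or more (49, 30) pairs, and 2 outputs. A minimal sketch of that pattern is given below. `layer_sizes` and `weight_count` are hypothetical helpers, and the weight count assumes fully connected layers without biases, which the post does not state.

```python
def layer_sizes(depth):
    """Node counts for a network with `depth` layers (3, 5, 7 or 9 in the post)."""
    if depth < 3 or depth % 2 == 0:
        raise ValueError("depth must be an odd number >= 3")
    extra_pairs = (depth - 3) // 2          # each pair adds a 49-node and a 30-node layer
    return [81, 30] + [49, 30] * extra_pairs + [2]


def weight_count(sizes):
    """Connection weights between consecutive layers, assuming full connectivity, no biases."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))


for depth in (3, 5, 7, 9):
    sizes = layer_sizes(depth)
    print(depth, "-".join(map(str, sizes)), weight_count(sizes))
```

Under these assumptions each added (49, 30) pair contributes 2940 weights, so the parameter count grows linearly with depth.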
First, compare the mean accuracy
δ | p-ave (9 layers) | p-ave (7 layers) | p-ave (5 layers) | p-ave (3 layers)
--- | --- | --- | --- | ---
0.5 | 0.504318 | 0.499258 | 0.5048 | 0.527528
0.4 | 0.628378 | 0.65161 | 0.650711 | 0.559235
0.3 | 0.717866 | 0.751404 | 0.781075 | 0.685667
0.2 | 0.768787 | 0.798975 | 0.841833 | 0.797262
0.1 | 0.822365 | 0.855835 | 0.90255 | 0.911606
0.01 | 0.885606 | 0.914883 | 0.940618 | 0.960129
0.001 | 0.964507 | 0.968176 | 0.962292 | 0.975169
9.00E-04 | 0.958186 | 0.964025 | 0.962399 | 0.975409
8.00E-04 | 0.957866 | 0.966742 | 0.962297 | 0.976148
7.00E-04 | 0.959747 | 0.965081 | 0.964744 | 0.976238
6.00E-04 | 0.961562 | 0.967584 | 0.96621 | 0.976875
5.00E-04 | 0.96402 | 0.970022 | 0.966013 | 0.977357
4.00E-04 | 0.964397 | 0.970069 | 0.967576 | 0.977389
3.00E-04 | 0.974038 | 0.972639 | 0.96951 | 0.979382
2.00E-04 | 0.976892 | 0.976308 | 0.973973 | 0.97974
1.00E-04 | 0.981266 | 0.979532 | 0.979585 | 0.981728
9.00E-05 | 0.98216 | 0.98187 | 0.979785 | 0.981758
8.00E-05 | 0.983049 | 0.980691 | 0.980499 | 0.982055
7.00E-05 | 0.98484 | 0.980884 | 0.982065 | 0.982707
6.00E-05 | 0.985354 | 0.980816 | 0.981708 | 0.982557
5.00E-05 | 0.985219 | 0.981823 | 0.982092 | 0.982902
4.00E-05 | 0.984175 | 0.9817 | 0.982372 | 0.983454
3.00E-05 | 0.986453 | 0.982784 | 0.98149 | 0.983354
2.00E-05 | 0.987093 | 0.986064 | 0.983061 | 0.983821
1.00E-05 | 0.991136 | 0.990072 | 0.983126 | 0.983631
9.00E-06 | 0.990132 | 0.990562 | 0.983663 | 0.983736
8.00E-06 | 0.990996 | 0.990649 | 0.98411 | 0.984148
7.00E-06 | 0.991066 | 0.991853 | 0.98446 | 0.983961
6.00E-06 | 0.991611 | 0.991803 | 0.985444 | 0.9844
5.00E-06 | 0.992417 | 0.99206 | 0.985239 | 0.984138
4.00E-06 | 0.989972 | 0.9923 | 0.985756 | 0.984478
3.00E-06 | 0.992362 | 0.992355 | 0.985749 | 0.9849
2.00E-06 | 0.992375 | 0.992392 | 0.988336 | 0.98498
1.00E-06 | 0.992412 | 0.99231 | 0.98973 | 0.985886
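These 4 data sets show fairly clearly that, under the same stopping criterion, more layers give a higher mean accuracy once δ is small: for δ ≤ 2e-5 the 9- and 7-layer networks are both ahead of the 5- and 3-layer networks, and at δ=1e-6 the order is strictly by depth. The trend does not hold for larger δ: from δ=0.1 down to δ=1e-4 the 3-layer network has the highest mean accuracy. At the smallest δ the order is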
9>7>5>3
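To check how the ordering depends on δ, the sketch below ranks the four depths for a few rows copied from the mean-accuracy table. The helper `ranking` and the choice of sample rows are illustrative only; the values themselves are taken from the table.

```python
# Sample rows from the mean-accuracy table: delta -> p-ave for 9, 7, 5, 3 layers.
DEPTHS = (9, 7, 5, 3)
P_AVE = {
    1e-2: (0.885606, 0.914883, 0.940618, 0.960129),
    1e-4: (0.981266, 0.979532, 0.979585, 0.981728),
    1e-5: (0.991136, 0.990072, 0.983126, 0.983631),
    1e-6: (0.992412, 0.99231, 0.98973, 0.985886),
}


def ranking(row):
    """Depths sorted from highest to lowest mean accuracy."""
    return [d for _, d in sorted(zip(row, DEPTHS), reverse=True)]


for delta, row in P_AVE.items():
    print(f"delta={delta:g}: " + " > ".join(str(d) for d in ranking(row)))
```

For these rows the shallow network leads at δ=0.01, and the strict depth ordering only appears at δ=1e-6.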
Next, compare the maximum accuracy
δ | p-max (9 layers) | p-max (7 layers) | p-max (5 layers) | p-max (3 layers)
--- | --- | --- | --- | ---
0.5 | 0.76839 | 0.717197 | 0.880716 | 0.804672
0.4 | 0.946322 | 0.935388 | 0.928926 | 0.818588
0.3 | 0.949304 | 0.95825 | 0.951789 | 0.917992
0.2 | 0.960239 | 0.959245 | 0.959742 | 0.963718
0.1 | 0.964712 | 0.966203 | 0.965209 | 0.968688
0.01 | 0.979622 | 0.976143 | 0.977634 | 0.977634
0.001 | 0.989066 | 0.987575 | 0.985586 | 0.982604
9.00E-04 | 0.988569 | 0.988072 | 0.987078 | 0.983101
8.00E-04 | 0.989066 | 0.989563 | 0.99006 | 0.983101
7.00E-04 | 0.99006 | 0.988569 | 0.988569 | 0.985586
6.00E-04 | 0.990557 | 0.989066 | 0.989563 | 0.986581
5.00E-04 | 0.99006 | 0.988569 | 0.988072 | 0.986581
4.00E-04 | 0.991054 | 0.990557 | 0.988072 | 0.985089
3.00E-04 | 0.992545 | 0.991054 | 0.989066 | 0.986083
2.00E-04 | 0.99503 | 0.992048 | 0.99006 | 0.985089
1.00E-04 | 0.99503 | 0.994036 | 0.992545 | 0.987078
9.00E-05 | 0.994533 | 0.994533 | 0.991054 | 0.987078
8.00E-05 | 0.994533 | 0.993539 | 0.992048 | 0.987575
7.00E-05 | 0.995527 | 0.994036 | 0.992545 | 0.988072
6.00E-05 | 0.994533 | 0.994036 | 0.992048 | 0.987575
5.00E-05 | 0.995527 | 0.994036 | 0.993042 | 0.988072
4.00E-05 | 0.995527 | 0.994036 | 0.992545 | 0.988072
3.00E-05 | 0.995527 | 0.994533 | 0.993042 | 0.988569
2.00E-05 | 0.99503 | 0.995527 | 0.994036 | 0.987575
1.00E-05 | 0.994533 | 0.99503 | 0.994533 | 0.988569
9.00E-06 | 0.99503 | 0.99503 | 0.994533 | 0.989066
8.00E-06 | 0.99503 | 0.994533 | 0.994036 | 0.988569
7.00E-06 | 0.995527 | 0.995527 | 0.994036 | 0.988072
6.00E-06 | 0.994533 | 0.99503 | 0.994533 | 0.989066
5.00E-06 | 0.99503 | 0.99503 | 0.994036 | 0.988072
4.00E-06 | 0.99503 | 0.994533 | 0.994533 | 0.989563
3.00E-06 | 0.994533 | 0.995527 | 0.994036 | 0.989066
2.00E-06 | 0.995527 | 0.995527 | 0.994533 | 0.988569
1.00E-06 | 0.995527 | 0.994533 | 0.995527 | 0.99006
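This data set again shows that, for small δ, performance rises with the number of layers, and the gain from 3 to 5 layers is especially large. For small δ, however, the 7-layer and 9-layer curves overlap almost completely, so the performance difference between 7 and 9 layers is hard to see in that range.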
9>7>5>3
Compare the iteration counts
δ | iterations n (9 layers) | iterations n (7 layers) | iterations n (5 layers) | iterations n (3 layers)
--- | --- | --- | --- | ---
0.5 | 8.291457 | 7.628141 | 5.728643 | 4.824121
0.4 | 334.3869 | 136.5377 | 47.92965 | 10.60302
0.3 | 445.608 | 223.8141 | 105.6432 | 32.92462
0.2 | 500.1307 | 275.8442 | 155.2563 | 68.78894
0.1 | 612.9146 | 369.5226 | 248.9497 | 155.2965
0.01 | 1085.503 | 754.2462 | 596.3015 | 492.8593
0.001 | 9882.784 | 8027.377 | 2793.01 | 1295.281
9.00E-04 | 10881.03 | 8691.206 | 2924.266 | 1368.503
8.00E-04 | 11977.38 | 9745.246 | 3438.271 | 1426.709
7.00E-04 | 14072.29 | 10714.58 | 4023.141 | 1494.201
6.00E-04 | 16406.42 | 12371.1 | 4890.673 | 1667.829
5.00E-04 | 20385.63 | 15137.29 | 5992.065 | 1749.307
4.00E-04 | 24298.51 | 17785.58 | 7844.638 | 1875.171
3.00E-04 | 32696.08 | 22253.22 | 11235.22 | 2184.286
2.00E-04 | 44724.82 | 31035.44 | 15313.87 | 2582.925
1.00E-04 | 89078.6 | 56299.69 | 25407.06 | 3498.412
9.00E-05 | 95407.9 | 63010.57 | 27220.85 | 3645.025
8.00E-05 | 104874.4 | 68325.11 | 29562.66 | 3840.156
7.00E-05 | 116214.4 | 78167.16 | 32122.32 | 4077.126
6.00E-05 | 131232.3 | 89236.13 | 34942.84 | 4212.678
5.00E-05 | 149461.3 | 102580.9 | 39240.9 | 4589.568
4.00E-05 | 166924.9 | 117010.6 | 42965.2 | 5167.663
3.00E-05 | 204507.9 | 149188.4 | 52871.19 | 5821.111
2.00E-05 | 274629 | 212703 | 64717.94 | 6976.513
1.00E-05 | 439076.3 | 360851.1 | 90076 | 9615.879
9.00E-06 | 473836.4 | 386946.7 | 91610.54 | 9692.05
8.00E-06 | 514704.1 | 429464.9 | 99462.98 | 10012.85
7.00E-06 | 550991.1 | 455372.7 | 105727.8 | 10419.32
6.00E-06 | 616058.6 | 535610.7 | 110838.9 | 11089.11
5.00E-06 | 742561 | 608267.8 | 118164.4 | 12141.85
4.00E-06 | 892665.6 | 729212.1 | 138541.7 | 12888.37
3.00E-06 | 1155870 | 930735.7 | 155032.7 | 13944.59
2.00E-06 | 1.69E+06 | 1390712 | 189751.4 | 16152.7
1.00E-06 | 3.35E+06 | 2727215 | 318306.6 | 20551.51
The iteration count grows by a large factor as layers are added. Compare the data at δ=1e-6:
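δ | iterations n (9 layers) | iterations n (7 layers) | iterations n (5 layers) | iterations n (3 layers) | n9/n7 | n7/n5 | n5/n3
--- | --- | --- | --- | --- | --- | --- | ---
1.00E-06 | 3.35E+06 | 2727215 | 318306.6 | 20551.51 | 1.228442 | 8.567887 | 15.48824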
The ratios are n9/n7≈1.23, n7/n5≈8.57 and n5/n3≈15.49.
The iteration count clearly increases with the number of layers, but the growth factor shrinks as more layers are added.
9>7>5>3
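The ratios quoted above can be recomputed from the δ=1e-6 row. This is a minimal sketch with a hypothetical helper `step_ratios`. Because the 9-layer count is printed in the table only as 3.35E+06, the recomputed n9/n7 matches the tabulated 1.228442 only to about three decimal places.

```python
# Mean iteration counts at delta = 1e-6, copied from the table
# (the 9-layer value is rounded to 3.35E+06 in the source).
N_ITER = {9: 3.35e6, 7: 2727215, 5: 318306.6, 3: 20551.51}


def step_ratios(values):
    """Ratio of each depth's value to that of the next shallower depth."""
    depths = sorted(values, reverse=True)
    return {f"{a}/{b}": values[a] / values[b] for a, b in zip(depths, depths[1:])}


ratios = step_ratios(N_ITER)
for name, r in ratios.items():
    print(f"n{name} = {r:.2f}")
```

The same helper works for any other row of the table, which makes it easy to check whether the shrinking growth factor also holds at larger δ.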
Finally, compare the convergence time
δ | time min/199 (9 layers) | time min/199 (7 layers) | time min/199 (5 layers) | time min/199 (3 layers)
--- | --- | --- | --- | ---
0.5 | 0.409583 | 0.297767 | 0.191367 | 0.0586
0.4 | 0.547117 | 0.333867 | 0.192717 | 0.05605
0.3 | 0.592417 | 0.3599 | 0.203 | 0.057967
0.2 | 0.6182 | 0.375717 | 0.219817 | 0.0597
0.1 | 0.664517 | 0.404467 | 0.23055 | 0.063883
0.01 | 0.865933 | 0.52845 | 0.299283 | 0.084767
0.001 | 4.660217 | 2.794767 | 0.72315 | 0.134933
9.00E-04 | 5.107717 | 3.0043 | 0.748283 | 0.136217
8.00E-04 | 5.547917 | 3.32935 | 0.849567 | 0.140483
7.00E-04 | 6.4496 | 3.633417 | 0.961167 | 0.144367
6.00E-04 | 7.462333 | 4.14635 | 1.127583 | 0.154583
5.00E-04 | 9.166217 | 5.0117 | 1.344267 | 0.158483
4.00E-04 | 10.83363 | 5.847867 | 1.699267 | 0.166983
3.00E-04 | 14.45065 | 7.23295 | 2.354367 | 0.186433
2.00E-04 | 18.27553 | 9.96255 | 3.1458 | 0.209817
1.00E-04 | 38.97403 | 17.8287 | 5.095617 | 0.279333
9.00E-05 | 40.24977 | 19.8958 | 5.439967 | 0.27905
8.00E-05 | 44.66172 | 20.44067 | 5.888667 | 0.2912
7.00E-05 | 50.07025 | 25.11477 | 6.383417 | 0.307417
6.00E-05 | 56.32692 | 28.81148 | 6.927333 | 0.311583
5.00E-05 | 64.26778 | 33.43728 | 7.758767 | 0.3372
4.00E-05 | 71.88765 | 35.4373 | 8.501183 | 0.37535
3.00E-05 | 87.45185 | 46.21853 | 9.577633 | 0.415583
2.00E-05 | 117.4416 | 66.7971 | 13.08162 | 0.481717
1.00E-05 | 188.3502 | 111.3974 | 18.13227 | 0.642817
9.00E-06 | 205.6579 | 119.7637 | 18.41502 | 0.646617
8.00E-06 | 221.6506 | 133.2785 | 19.11077 | 0.6669
7.00E-06 | 236.5716 | 139.6939 | 21.95412 | 0.688267
6.00E-06 | 267.6226 | 168.1909 | 20.1341 | 0.7378
5.00E-06 | 318.1124 | 188.6062 | 24.30287 | 0.796267
4.00E-06 | 381.6106 | 227.9598 | 27.94432 | 0.84375
3.00E-06 | 492.3802 | 290.9228 | 29.18535 | 0.9145
2.00E-06 | 724.5874 | 431.3575 | 37.93767 | 1.047
1.00E-06 | 1420.324 | 839.3905 | 62.56628 | 1.3132
This pattern is equally clear: convergence time increases with the number of layers.
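δ | time min/199 (9 layers) | time min/199 (7 layers) | time min/199 (5 layers) | time min/199 (3 layers) | t9/t7 | t7/t5 | t5/t3
--- | --- | --- | --- | --- | --- | --- | ---
1.00E-06 | 1420.324 | 839.3905 | 62.56628 | 1.3132 | 1.692089 | 13.41602 | 47.64414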
At δ=1e-6, t9/t7≈1.69, t7/t5≈13.42 and t5/t3≈47.64.
9>7>5>3
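The time ratios are larger than the corresponding iteration ratios, which suggests that each iteration also becomes more expensive as layers are added. The sketch below recomputes the time ratios at δ=1e-6 and estimates the cost of one iteration. It assumes that "min/199" means total minutes for all 199 runs and that the measured time is dominated by training iterations; any fixed per-run overhead is included in the estimate. The function names are hypothetical.

```python
RUNS = 199
DEPTHS = (9, 7, 5, 3)
# Row delta = 1e-6 from the time and iteration tables.
T_MIN = {9: 1420.324, 7: 839.3905, 5: 62.56628, 3: 1.3132}    # minutes for 199 runs (assumed meaning)
N_ITER = {9: 3.35e6, 7: 2727215, 5: 318306.6, 3: 20551.51}    # mean iterations per run


def time_ratio(a, b):
    """Total-time ratio between depth a and depth b."""
    return T_MIN[a] / T_MIN[b]


def ms_per_iteration(depth):
    """Estimated wall-clock milliseconds per iteration, overhead included."""
    seconds_per_run = T_MIN[depth] * 60 / RUNS
    return 1000 * seconds_per_run / N_ITER[depth]


for a, b in zip(DEPTHS, DEPTHS[1:]):
    print(f"t{a}/t{b} = {time_ratio(a, b):.2f}")
for d in DEPTHS:
    print(f"{d} layers: {ms_per_iteration(d):.4f} ms per iteration")
```

Under these assumptions the per-iteration cost rises from roughly 0.02 ms for 3 layers to roughly 0.13 ms for 9 layers, so both the number of iterations and the cost of each one contribute to the longer convergence time.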
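To summarize the 4 tables:
For small δ, the mean accuracy, maximum accuracy, iteration count and convergence time all follow the order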
9>7>5>3
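The more layers a network has, the stronger its classification ability, the more iterations it needs, and the longer it takes.
However, since the 7-layer and 9-layer networks differ little in performance for small δ, an economical setup for this problem is the 7-layer network with stopping criterion δ=5e-6, or with a fixed iteration count of n=608267. The estimated convergence time is about 57 seconds per run (188.6062 min / 199), with a 68% probability that the accuracy falls between 99.011% and 99.401%, and a probability of at least 0.5025% (1 in 199) of reaching 99.503%.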
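The numbers in this recommendation can be re-derived from the 7-layer, δ=5e-6 row of the tables. The sketch below does the arithmetic. Reading the 68% interval as mean plus or minus one standard deviation of a roughly normal distribution is an assumption, as is reading "min/199" as total minutes for 199 runs.

```python
RUNS = 199
total_minutes = 188.6062          # 7 layers, delta = 5e-6, time table
p_ave, p_max = 0.99206, 0.99503   # same row, mean and maximum accuracy tables
low, high = 0.99011, 0.99401      # 68% interval quoted in the text

seconds_per_run = total_minutes * 60 / RUNS
interval_center = (low + high) / 2
implied_sigma = (high - low) / 2   # one standard deviation, if the interval is mean +/- sigma
min_prob_of_max = 1 / RUNS         # the maximum was reached in at least 1 of 199 runs

print(f"time per run: {seconds_per_run:.1f} s")
print(f"interval center: {interval_center:.5f} (mean accuracy {p_ave})")
print(f"implied sigma: {implied_sigma:.5f}")
print(f"lower bound on P(accuracy = {p_max}): {min_prob_of_max:.4%}")
```

The interval center coincides with the tabulated mean accuracy, which is consistent with the one-sigma reading.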