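Build a binary classification network that separates the MNIST digits 0 and 3.
Network structure: (0,3)-n*m*2-(1,0)(0,1), that is, images of 0 and 3 are fed into an n*m*2 network whose two outputs are trained toward the targets (1,0) and (0,1) for the two classes.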
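To make the structure line concrete, here is a minimal sketch of a forward pass through an n*m*2 network with a selectable activation. The layer sizes (784 and 30), the weight range, the absence of bias terms, and the use of the same activation on both layers are all assumptions made for illustration; the post does not state them.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# The two activation functions compared in this post.
ACTIVATIONS = {"tanh": math.tanh, "sigmoid": sigmoid}

# Two-element targets from the structure line (assumed mapping: 0 -> (1,0), 3 -> (0,1)).
TARGETS = {0: (1.0, 0.0), 3: (0.0, 1.0)}

def layer(inputs, weights, act):
    """One fully connected layer; weights[j] holds the input weights of unit j."""
    return [act(sum(w * v for w, v in zip(row, inputs))) for row in weights]

def forward(x, w1, w2, activation):
    """Forward pass n -> m -> 2; returns the two outputs (f2[0], f2[1])."""
    act = ACTIVATIONS[activation]
    return layer(layer(x, w1, act), w2, act)

def random_weights(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, m = 784, 30                        # hypothetical sizes, not from the post
x = [rng.random() for _ in range(n)]  # stand-in for a flattened image
w1 = random_weights(m, n, rng)
w2 = random_weights(2, m, rng)

f2_tanh = forward(x, w1, w2, "tanh")
f2_sig = forward(x, w1, w2, "sigmoid")
```

With tanh the outputs lie in (-1, 1) and with sigmoid in (0, 1), which is one reason the f2[0] and f2[1] columns in the tables below behave differently for the two functions.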
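The activation function is set to tanh and to sigmoid in turn. To find out how the two really differ, they are cross-compared at fixed convergence criteria δ: for each δ the network is trained to convergence many times and the results are averaged.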
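The measurement procedure can be sketched roughly as follows. `train_once` is a hypothetical callable that trains one network until criterion δ is met and returns (iterations, test accuracy); `fake_train_once` is only a placeholder that returns made-up numbers so the harness runs, and it does not reproduce the author's training code or results.

```python
import random
import statistics

def measure(train_once, delta, runs=199, seed=0):
    """Train `runs` times at a fixed criterion `delta` and summarize the runs."""
    rng = random.Random(seed)
    results = [train_once(delta, rng) for _ in range(runs)]
    iterations = [r[0] for r in results]
    accuracies = [r[1] for r in results]
    return {
        "delta": delta,
        "n": statistics.mean(iterations),      # average iterations to converge
        "p_ave": statistics.mean(accuracies),  # average accuracy
        "p_max": max(accuracies),              # best accuracy over all runs
    }

def fake_train_once(delta, rng):
    """Placeholder for a real training run (assumption: NOT the author's code)."""
    iterations = int(10 / delta * rng.uniform(0.5, 1.5))
    accuracy = 0.9 + rng.uniform(0.0, 0.09)
    return iterations, accuracy

# One summary row per convergence criterion, as in the tables of this post.
summary = [measure(fake_train_once, d) for d in (0.5, 0.1, 0.01)]
```

Swapping `fake_train_once` for a real training routine, once per activation function, would yield one row per δ with the n, p-ave and p-max columns used below.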
Data 1: tanh. Each convergence criterion was run to convergence 199 times, 25*199 runs in total.
tanh, *03

| f2[0] | f2[1] | Iterations n | Mean accuracy p-ave | δ | Time (ms/run) | Time (ms/199 runs) | Time (min/199 runs) | Max accuracy p-max |
|---|---|---|---|---|---|---|---|---|
| 0.500353 | 0.415556 | 29.09045 | 0.715959 | 0.5 | 14.76884 | 2947 | 0.049117 | 0.962312 |
| 0.651631 | 0.357993 | 44.50754 | 0.754413 | 0.4 | 13.86432 | 2761 | 0.046017 | 0.956281 |
| 0.70054 | 0.253659 | 63.36683 | 0.890238 | 0.3 | 14.0804 | 2810 | 0.046833 | 0.966834 |
| 0.622577 | 0.2358 | 82.18593 | 0.94784 | 0.2 | 14.61307 | 2919 | 0.04865 | 0.973869 |
| 0.632443 | 0.245715 | 117.2714 | 0.959779 | 0.1 | 14.76884 | 2946 | 0.0491 | 0.975879 |
| 0.468252 | 0.522757 | 1335.035 | 0.985069 | 0.01 | 28.49749 | 5679 | 0.09465 | 0.98995 |
| 0.537243 | 0.461893 | 9637.683 | 0.987717 | 0.001 | 124.8844 | 24853 | 0.414217 | 0.991457 |
| 0.512175 | 0.487092 | 10134.37 | 0.987957 | 9.00E-04 | 129.3568 | 25751 | 0.429183 | 0.990955 |
| 0.517224 | 0.482056 | 11665.16 | 0.987493 | 8.00E-04 | 147.7688 | 29406 | 0.4901 | 0.991457 |
| 0.637769 | 0.361572 | 14006.98 | 0.987526 | 7.00E-04 | 174.0452 | 34643 | 0.577383 | 0.991457 |
| 0.703132 | 0.29634 | 16579.93 | 0.986488 | 6.00E-04 | 203.3015 | 40464 | 0.6744 | 0.991457 |
| 0.738377 | 0.261153 | 22576.23 | 0.987076 | 5.00E-04 | 271.6533 | 54066 | 0.9011 | 0.99196 |
| 0.78867 | 0.210962 | 33005.37 | 0.986882 | 4.00E-04 | 391.1156 | 77833 | 1.297217 | 0.99196 |
| 0.783714 | 0.216036 | 47255.99 | 0.987662 | 3.00E-04 | 554.0352 | 110268 | 1.8378 | 0.993467 |
| 0.688321 | 0.311501 | 66911.05 | 0.989281 | 2.00E-04 | 778.6281 | 154971 | 2.58285 | 0.993467 |
| 0.592911 | 0.407003 | 116691.6 | 0.991109 | 1.00E-04 | 994.2563 | 197857 | 3.297617 | 0.995477 |
| 0.557746 | 0.442177 | 115008.1 | 0.99124 | 9.00E-05 | 1326.709 | 264022 | 4.400367 | 0.994472 |
| 0.623077 | 0.376864 | 119430.7 | 0.991139 | 8.00E-05 | 1373.698 | 273374 | 4.556233 | 0.994975 |
| 0.55273 | 0.44721 | 133684.6 | 0.991619 | 7.00E-05 | 1522.899 | 303059 | 5.050983 | 0.994975 |
| 0.648208 | 0.351737 | 142422.2 | 0.991806 | 6.00E-05 | 1594.905 | 317394 | 5.2899 | 0.994975 |
| 0.63314 | 0.366819 | 161331.8 | 0.992121 | 5.00E-05 | 1797.372 | 357682 | 5.961367 | 0.994975 |
| 0.60802 | 0.391948 | 184704.5 | 0.992344 | 4.00E-05 | 2078.271 | 413591 | 6.893183 | 0.995477 |
| 0.638177 | 0.3618 | 205468.7 | 0.992558 | 3.00E-05 | 2290.075 | 455731 | 7.595517 | 0.994975 |
| 0.57788 | 0.422105 | 233099.6 | 0.993018 | 2.00E-05 | 2604.116 | 518227 | 8.637117 | 0.995477 |
| 0.567836 | 0.432158 | 290422.3 | 0.993541 | 1.00E-05 | 3223.593 | 641500 | 10.69167 | 0.996482 |
Data 2: sigmoid. A total of 43*199 runs were measured.
sig, *03

| f2[0] | f2[1] | Iterations n | Mean accuracy p-ave | δ | Time (ms/run) | Time (ms/199 runs) | Time (min/199 runs) | Max accuracy p-max |
|---|---|---|---|---|---|---|---|---|
| 0.502547669 | 0.498213 | 19.13065 | 0.518881 | 0.5 | 8.175879 | 1627 | 0.027117 | 0.866834 |
| 0.552177615 | 0.447849 | 300.397 | 0.954814 | 0.4 | 9.432161 | 1877 | 0.031283 | 0.973869 |
| 0.668711431 | 0.331904 | 374.1256 | 0.968738 | 0.3 | 9.899497 | 1986 | 0.0331 | 0.980402 |
| 0.478841824 | 0.520571 | 449.7286 | 0.977006 | 0.2 | 10.68844 | 2143 | 0.035717 | 0.984925 |
| 0.124091467 | 0.875997 | 552.6935 | 0.982579 | 0.1 | 11.45226 | 2279 | 0.037983 | 0.984925 |
| 0.32489657 | 0.675131 | 1213.266 | 0.985134 | 0.01 | 16.34673 | 3269 | 0.054483 | 0.986935 |
| 0.241692045 | 0.758309 | 3918.683 | 0.986884 | 0.001 | 37.64824 | 7508 | 0.125133 | 0.990452 |
| 0.226593223 | 0.773408 | 4302.819 | 0.987265 | 9.00E-04 | 41.21106 | 8201 | 0.136683 | 0.990452 |
| 0.19643095 | 0.803571 | 4589.744 | 0.98746 | 8.00E-04 | 42.98995 | 8555 | 0.142583 | 0.990452 |
| 0.151206651 | 0.848793 | 5202.563 | 0.988023 | 7.00E-04 | 48.0804 | 9586 | 0.159767 | 0.990452 |
| 0.136085504 | 0.863914 | 5801.864 | 0.988137 | 6.00E-04 | 53.1005 | 10582 | 0.176367 | 0.990452 |
| 0.15107437 | 0.848925 | 6836.291 | 0.988031 | 5.00E-04 | 61.19598 | 12193 | 0.203217 | 0.990452 |
| 0.186158149 | 0.813842 | 7983.03 | 0.987397 | 4.00E-04 | 70.04523 | 13939 | 0.232317 | 0.990452 |
| 0.306637127 | 0.693364 | 10110.83 | 0.986793 | 3.00E-04 | 86.18593 | 17167 | 0.286117 | 0.989447 |
| 0.306606847 | 0.693393 | 15261.92 | 0.986586 | 2.00E-04 | 130.3869 | 25956 | 0.4326 | 0.989447 |
| 0.613044571 | 0.386955 | 36494.64 | 0.987493 | 1.00E-04 | 292.9497 | 58297 | 0.971617 | 0.991457 |
| 0.698459049 | 0.301541 | 38622.99 | 0.987106 | 9.00E-05 | 308.3266 | 61357 | 1.022617 | 0.991457 |
| 0.693438295 | 0.306562 | 41566.63 | 0.987473 | 8.00E-05 | 332.3216 | 66132 | 1.1022 | 0.991457 |
| 0.723588776 | 0.276411 | 44855.03 | 0.987897 | 7.00E-05 | 357.6131 | 71197 | 1.186617 | 0.99196 |
| 0.683396475 | 0.316604 | 48059.4 | 0.988366 | 6.00E-05 | 383.4271 | 76302 | 1.2717 | 0.992462 |
| 0.688424754 | 0.311575 | 52975.13 | 0.988556 | 5.00E-05 | 420.8693 | 83753 | 1.395883 | 0.992462 |
| 0.688428142 | 0.311572 | 56542.35 | 0.98944 | 4.00E-05 | 448.0352 | 89174 | 1.486233 | 0.992965 |
| 0.718580776 | 0.281419 | 60788.42 | 0.989667 | 3.00E-05 | 481.995 | 95917 | 1.598617 | 0.993467 |
| 0.723609701 | 0.27639 | 67787.28 | 0.990753 | 2.00E-05 | 536.4774 | 106759 | 1.779317 | 0.99397 |
| 0.618088243 | 0.381912 | 79939.21 | 0.991601 | 1.00E-05 | 631.8844 | 125760 | 2.096 | 0.994975 |
| 0.623113546 | 0.376886 | 82122.32 | 0.992013 | 9.00E-06 | 649.0854 | 129184 | 2.153067 | 0.994472 |
| 0.673364278 | 0.326636 | 83016.99 | 0.991781 | 8.00E-06 | 656.4523 | 130634 | 2.177233 | 0.994975 |
| 0.60301373 | 0.396986 | 86061.2 | 0.992119 | 7.00E-06 | 681.9698 | 135728 | 2.262133 | 0.994975 |
| 0.55778833 | 0.442212 | 87802.62 | 0.992025 | 6.00E-06 | 692.598 | 137842 | 2.297367 | 0.994975 |
| 0.597989092 | 0.402011 | 91195.31 | 0.992056 | 5.00E-06 | 731.9347 | 145660 | 2.427667 | 0.994975 |
| 0.597989245 | 0.402011 | 94757.74 | 0.992525 | 4.00E-06 | 742.6985 | 147803 | 2.463383 | 0.994975 |
| 0.618089828 | 0.38191 | 99007.35 | 0.992397 | 3.00E-06 | 774.4724 | 154125 | 2.56875 | 0.994975 |
| 0.603014704 | 0.396985 | 106014.5 | 0.992432 | 2.00E-06 | 831.4573 | 165469 | 2.757817 | 0.995477 |
| 0.572864206 | 0.427136 | 117696.3 | 0.993195 | 1.00E-06 | 511.6231 | 101821 | 1.697017 | 0.996482 |
| 0.643215857 | 0.356784 | 119205.3 | 0.993467 | 9.00E-07 | 933.6683 | 185810 | 3.096833 | 0.99598 |
| 0.562813988 | 0.437186 | 122532.5 | 0.993563 | 8.00E-07 | 961.1608 | 191280 | 3.188 | 0.99598 |
| 0.542713524 | 0.457286 | 125011.9 | 0.993773 | 7.00E-07 | 980.6734 | 195156 | 3.2526 | 0.99598 |
| 0.542713525 | 0.457286 | 127423.5 | 0.993609 | 6.00E-07 | 999.4171 | 198894 | 3.3149 | 0.996482 |
| 0.572864262 | 0.427136 | 130846.4 | 0.993735 | 5.00E-07 | 1026.085 | 204197 | 3.403283 | 0.996985 |
| 0.562814031 | 0.437186 | 133950.5 | 0.993881 | 4.00E-07 | 1049.749 | 208907 | 3.481783 | 0.99598 |
| 0.557788917 | 0.442211 | 140273.8 | 0.993927 | 3.00E-07 | 1099.955 | 218895 | 3.64825 | 0.996985 |
| 0.507537689 | 0.492462 | 148654.8 | 0.994197 | 2.00E-07 | 1164.583 | 231759 | 3.86265 | 0.996482 |
| 0.48743719 | 0.512563 | 163688.8 | 0.99452 | 1.00E-07 | 1281.653 | 255059 | 4.250983 | 0.996482 |
Comparing iteration counts at the same convergence criterion
| δ | tanh | sig | tanh/sig |
|---|---|---|---|
| 0.5 | 29.09045 | 19.13065 | 1.52062 |
| 0.4 | 44.50754 | 300.397 | 0.148162 |
| 0.3 | 63.36683 | 374.1256 | 0.169373 |
| 0.2 | 82.18593 | 449.7286 | 0.182746 |
| 0.1 | 117.2714 | 552.6935 | 0.212182 |
| 0.01 | 1335.035 | 1213.266 | 1.100364 |
| 0.001 | 9637.683 | 3918.683 | 2.459419 |
| 9.00E-04 | 10134.37 | 4302.819 | 2.355287 |
| 8.00E-04 | 11665.16 | 4589.744 | 2.541571 |
| 7.00E-04 | 14006.98 | 5202.563 | 2.692323 |
| 6.00E-04 | 16579.93 | 5801.864 | 2.85769 |
| 5.00E-04 | 22576.23 | 6836.291 | 3.302408 |
| 4.00E-04 | 33005.37 | 7983.03 | 4.134442 |
| 3.00E-04 | 47255.99 | 10110.83 | 4.6738 |
| 2.00E-04 | 66911.05 | 15261.92 | 4.384182 |
| 1.00E-04 | 116691.6 | 36494.64 | 3.1975 |
| 9.00E-05 | 115008.1 | 38622.99 | 2.97771 |
| 8.00E-05 | 119430.7 | 41566.63 | 2.873235 |
| 7.00E-05 | 133684.6 | 44855.03 | 2.980371 |
| 6.00E-05 | 142422.2 | 48059.4 | 2.963461 |
| 5.00E-05 | 161331.8 | 52975.13 | 3.045426 |
| 4.00E-05 | 184704.5 | 56542.35 | 3.266658 |
| 3.00E-05 | 205468.7 | 60788.42 | 3.380063 |
| 2.00E-05 | 233099.6 | 67787.28 | 3.438693 |
| 1.00E-05 | 290422.3 | 79939.21 | 3.63304 |
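To reach the same convergence criterion, tanh needs on average about 2.57 times as many iterations as sigmoid, although for 0.1 ≤ δ ≤ 0.4 tanh actually needs far fewer. If a larger iteration count is taken to mean that the two classes are more similar, these data suggest that 0 and 3 are more symmetric with respect to tanh than with respect to sigmoid. In other words, sigmoid speeds up the breaking of the symmetry between 0 and 3.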
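As a quick check of the average ratio, the snippet below recomputes tanh/sig from the iteration counts in the table and takes the plain arithmetic mean over the 25 criteria. That the quoted figure is this mean is an assumption; the post does not say how it was obtained.

```python
# Convergence criteria and mean iteration counts copied from the tables above.
deltas = [0.5, 0.4, 0.3, 0.2, 0.1, 0.01, 0.001, 9e-4, 8e-4, 7e-4, 6e-4, 5e-4,
          4e-4, 3e-4, 2e-4, 1e-4, 9e-5, 8e-5, 7e-5, 6e-5, 5e-5, 4e-5, 3e-5,
          2e-5, 1e-5]
tanh_n = [29.09045, 44.50754, 63.36683, 82.18593, 117.2714, 1335.035,
          9637.683, 10134.37, 11665.16, 14006.98, 16579.93, 22576.23,
          33005.37, 47255.99, 66911.05, 116691.6, 115008.1, 119430.7,
          133684.6, 142422.2, 161331.8, 184704.5, 205468.7, 233099.6,
          290422.3]
sig_n = [19.13065, 300.397, 374.1256, 449.7286, 552.6935, 1213.266,
         3918.683, 4302.819, 4589.744, 5202.563, 5801.864, 6836.291,
         7983.03, 10110.83, 15261.92, 36494.64, 38622.99, 41566.63,
         44855.03, 48059.4, 52975.13, 56542.35, 60788.42, 67787.28,
         79939.21]

# Per-criterion ratio of tanh iterations to sigmoid iterations.
ratios = [t / s for t, s in zip(tanh_n, sig_n)]
mean_ratio = sum(ratios) / len(ratios)

# Criteria at which tanh converged in fewer iterations than sigmoid.
tanh_faster = [d for d, r in zip(deltas, ratios) if r < 1]

print(f"mean tanh/sig iteration ratio: {mean_ratio:.2f}")
print("tanh needs fewer iterations at delta =", tanh_faster)
```

The mean comes out at roughly 2.58, in line with the figure quoted above, and the ratio falls below 1 only for δ between 0.4 and 0.1.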
Comparing the mean classification accuracy p-ave
| δ | tanh p-ave | sig p-ave | tanh/sig |
|---|---|---|---|
| 0.01 | 0.985069 | 0.985134 | 0.999933 |
| 0.001 | 0.987717 | 0.986884 | 1.000844 |
| 9.00E-04 | 0.987957 | 0.987265 | 1.000701 |
| 8.00E-04 | 0.987493 | 0.98746 | 1.000033 |
| 7.00E-04 | 0.987526 | 0.988023 | 0.999497 |
| 6.00E-04 | 0.986488 | 0.988137 | 0.998331 |
| 5.00E-04 | 0.987076 | 0.988031 | 0.999034 |
| 4.00E-04 | 0.986882 | 0.987397 | 0.999478 |
| 3.00E-04 | 0.987662 | 0.986793 | 1.00088 |
| 2.00E-04 | 0.989281 | 0.986586 | 1.002731 |
| 1.00E-04 | 0.991109 | 0.987493 | 1.003662 |
| 9.00E-05 | 0.99124 | 0.987106 | 1.004188 |
| 8.00E-05 | 0.991139 | 0.987473 | 1.003713 |
| 7.00E-05 | 0.991619 | 0.987897 | 1.003768 |
| 6.00E-05 | 0.991806 | 0.988366 | 1.00348 |
| 5.00E-05 | 0.992121 | 0.988556 | 1.003607 |
| 4.00E-05 | 0.992344 | 0.98944 | 1.002935 |
| 3.00E-05 | 0.992558 | 0.989667 | 1.002922 |
| 2.00E-05 | 0.993018 | 0.990753 | 1.002286 |
| 1.00E-05 | 0.993541 | 0.991601 | 1.001956 |
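Once δ ≤ 1e-4, the p-ave of tanh is clearly higher than that of sigmoid, by about 0.2 to 0.4 percentage points. For larger δ the two stay within about 0.3 points of each other and neither is consistently ahead.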
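The size of the p-ave gap can be read off the table with a few lines of arithmetic. The sketch below only re-uses the table values; the threshold of 1e-4 is the one named in the text.

```python
# delta, tanh p-ave, sigmoid p-ave, copied from the table above.
rows = [
    (0.01, 0.985069, 0.985134), (0.001, 0.987717, 0.986884),
    (9e-4, 0.987957, 0.987265), (8e-4, 0.987493, 0.98746),
    (7e-4, 0.987526, 0.988023), (6e-4, 0.986488, 0.988137),
    (5e-4, 0.987076, 0.988031), (4e-4, 0.986882, 0.987397),
    (3e-4, 0.987662, 0.986793), (2e-4, 0.989281, 0.986586),
    (1e-4, 0.991109, 0.987493), (9e-5, 0.99124, 0.987106),
    (8e-5, 0.991139, 0.987473), (7e-5, 0.991619, 0.987897),
    (6e-5, 0.991806, 0.988366), (5e-5, 0.992121, 0.988556),
    (4e-5, 0.992344, 0.98944), (3e-5, 0.992558, 0.989667),
    (2e-5, 0.993018, 0.990753), (1e-5, 0.993541, 0.991601),
]

# Accuracy gap (tanh minus sigmoid) for the strict criteria, delta <= 1e-4.
strict_gaps = [t - s for d, t, s in rows if d <= 1e-4]

# Criteria at which sigmoid has the higher mean accuracy.
sig_ahead = [d for d, t, s in rows if s > t]

print(f"gap for delta <= 1e-4: {min(strict_gaps):.4f} to {max(strict_gaps):.4f}")
print("sigmoid ahead at delta =", sig_ahead)
```

For δ ≤ 1e-4 the gap is always positive, while sigmoid is ahead only at δ = 0.01 and for δ from 7e-4 down to 4e-4.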
Comparing the maximum classification accuracy p-max at equal convergence criteria
| δ | tanh p-max | sig p-max |
|---|---|---|
| 0.1 | 0.975879 | 0.984925 |
| 0.01 | 0.98995 | 0.986935 |
| 0.001 | 0.991457 | 0.990452 |
| 9.00E-04 | 0.990955 | 0.990452 |
| 8.00E-04 | 0.991457 | 0.990452 |
| 7.00E-04 | 0.991457 | 0.990452 |
| 6.00E-04 | 0.991457 | 0.990452 |
| 5.00E-04 | 0.99196 | 0.990452 |
| 4.00E-04 | 0.99196 | 0.990452 |
| 3.00E-04 | 0.993467 | 0.989447 |
| 2.00E-04 | 0.993467 | 0.989447 |
| 1.00E-04 | 0.995477 | 0.991457 |
| 9.00E-05 | 0.994472 | 0.991457 |
| 8.00E-05 | 0.994975 | 0.991457 |
| 7.00E-05 | 0.994975 | 0.99196 |
| 6.00E-05 | 0.994975 | 0.992462 |
| 5.00E-05 | 0.994975 | 0.992462 |
| 4.00E-05 | 0.995477 | 0.992965 |
| 3.00E-05 | 0.994975 | 0.993467 |
| 2.00E-05 | 0.995477 | 0.99397 |
| 1.00E-05 | 0.996482 | 0.994975 |
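This result is clear-cut: for every δ ≤ 0.01, the p-max of tanh is higher than the p-max of sigmoid.
Taken together, the three sets of data show that at the same δ both the mean and the maximum accuracy of tanh are better than those of sigmoid for small δ, but tanh pays for this with a much larger number of iterations.
So which function is better in terms of convergence efficiency?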
| Function | Group | Iterations n | Mean accuracy p-ave | δ | Time (ms/run) | Time (ms/199 runs) | Time (min/199 runs) | Max accuracy p-max |
|---|---|---|---|---|---|---|---|---|
| tanh | 1 | 161331.8392 | 0.992121 | 5.00E-05 | 1797.372 | 357682 | 5.961367 | 0.994975 |
| tanh | 2 | 205468.6884 | 0.992558 | 3.00E-05 | 2290.075 | 455731 | 7.595517 | 0.994975 |
| tanh | 3 | 233099.6432 | 0.993018 | 2.00E-05 | 2604.116 | 518227 | 8.637117 | 0.995477 |
| sigmoid | 1 | 86061.19598 | 0.992119 | 7.00E-06 | 681.9698 | 135728 | 2.262133 | 0.994975 |
| sigmoid | 2 | 94757.74372 | 0.992525 | 4.00E-06 | 742.6985 | 147803 | 2.463383 | 0.994975 |
| sigmoid | 3 | 117696.2714 | 0.993195 | 1.00E-06 | 511.6231 | 101821 | 1.697017 | 0.996482 |
| tanh/sig | 1 | 1.874617676 | 1.000003 | | 2.635559 | 2.635285 | 2.635285 | 1 |
| tanh/sig | 2 | 2.168357755 | 1.000033 | | 3.083452 | 3.083368 | 3.083368 | 1 |
| tanh/sig | 3 | 1.980518504 | 0.999822 | | 5.08991 | 5.089589 | 5.089589 | 0.998991 |
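Three groups of values were picked from the tables. Within each group the p-ave of tanh and sigmoid is nearly the same, so the groups can be used to compare how efficiently the two functions reach the same performance.
For example, in the first group p-ave = 0.9921. tanh and sigmoid took about 161331 and 86061 iterations respectively, so tanh needed 1.87 times as many iterations and about 2.64 times as much time as sigmoid.
All three groups show that sigmoid needs fewer iterations and less time than tanh to reach the same performance, which means sigmoid converges far more efficiently.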
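The ratios in the tanh/sig rows can be reproduced directly from the selected rows. The sketch below recomputes the iteration and per-run time ratios for the three groups; the dictionary layout is just an illustrative choice.

```python
# (iterations n, time in ms per run) for the three matched-p-ave groups above.
groups = [
    {"tanh": (161331.8392, 1797.372), "sigmoid": (86061.19598, 681.9698)},
    {"tanh": (205468.6884, 2290.075), "sigmoid": (94757.74372, 742.6985)},
    {"tanh": (233099.6432, 2604.116), "sigmoid": (117696.2714, 511.6231)},
]

# Ratio of tanh to sigmoid for iterations and for time, per group.
iter_ratios = [g["tanh"][0] / g["sigmoid"][0] for g in groups]
time_ratios = [g["tanh"][1] / g["sigmoid"][1] for g in groups]

for i, (ri, rt) in enumerate(zip(iter_ratios, time_ratios), start=1):
    print(f"group {i}: iterations x{ri:.2f}, time x{rt:.2f}")
```

One caveat when reading the third group: its sigmoid timing (511.6231 ms/run at δ = 1e-6) is lower than the timings at the neighbouring criteria in the sigmoid table (831.4573 and 933.6683 ms/run), so the time ratio of about 5.09 may overstate the difference.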
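To summarize the comparison of the two functions:
At the same convergence criterion, the mean accuracy of tanh is better than that of sigmoid.
At the same number of iterations, the mean accuracy of sigmoid is better than that of tanh.
For the same target accuracy, sigmoid converges markedly more efficiently than tanh.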