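I. Paper overview:
A pose estimation model based on dense points.
Authors:
The paper is a joint work of Facebook AI and INRIA (the French National Institute for Research in Computer Science and Automation).
Result:
Keypoint correspondence: 2D image → 3D surface coordinates (UV coordinates).
Points of the 2D image are mapped to UV coordinates, and those UV coordinates are then rendered back onto the 2D image.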
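Challenges:
Background clutter, occlusion, and variation in pose and scale. Earlier work needed depth sensors to deal with these.
Contributions:
1: A new dataset was annotated on top of the COCO dataset by adding UV annotations; the dataset is open source.
2: A network that takes a single RGB image as input and outputs UV coordinates, built by modifying the Mask R-CNN architecture.
3: A "teacher" network (used to generate the dataset).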
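Dataset:
Images come from the COCO 2014 dataset.
train: 26437 images
valminusminival: 5984 images
All (train + valminusminival): 32421 images, 48k humans
minival: 1508 images, 2.3k humans
In total, about 50K people were annotated, with more than 5 million manually annotated points.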
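Teacher network:
Manual annotation stage: the body is split into 24 parts. For each part, a set of roughly equidistant points is produced by clustering; the number of points depends on the size of the part, with at most 14 per part. Annotators then place these points on the 3D surface (UV coordinates). Each person gets roughly 100-150 points.
Teacher: a fully convolutional teacher network is trained on the sparse, manually annotated points (UV coordinates) and outputs denser points. For training, each image is cropped to the person region using the bounding box, and the crop is what the network sees, which reduces the influence of the background.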
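Network architecture:
Backbone network: ResNet-50, ResNet-101, ResNeXt
RPN: FPN, RoIAlign pooling
Head: required: Fast R-CNN, body_uv
optional: mask, keypoint
The main architectural contributions are the body_uv branch and the cascaded structure.
In turn:
(1) The body_uv branch uses the same architecture as the keypoint branch in Mask R-CNN (eight convolution layers at constant resolution), followed by two losses:
First, a classifier assigns each pixel to a region (24 body parts + 1 background) with a cross-entropy loss. 25-way.
Then, a regressor predicts the exact position within the part with an L1 loss. 24-way.
(2) A small trick in the network design: the outputs of the tasks are cascaded, so that the tasks benefit from each other and from their complementary sources of supervision.
Concretely, the outputs of the keypoint/mask branches are fused with the output of the body_uv branch, and each branch then produces its own output and loss.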
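The two losses of the body_uv branch described in (1) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `body_uv_losses`, the argument layout, and the choice to average over points are all assumptions made for the example.

```python
import numpy as np

def body_uv_losses(part_logits, u_pred, v_pred, part_labels, u_gt, v_gt):
    """Hypothetical helper illustrating the two body_uv losses.

    part_logits: (N, 25) scores for background (class 0) + 24 parts.
    u_pred, v_pred: (N, 24) regressed coordinates, one channel per part.
    part_labels: (N,) integer labels in 0..24.
    u_gt, v_gt: (N,) target coordinates in [0, 1].
    """
    # 25-way softmax cross-entropy (numerically stable log-softmax).
    shifted = part_logits - part_logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    cls_loss = -log_prob[np.arange(len(part_labels)), part_labels].mean()

    # 24-way L1 regression: only foreground points contribute, and each
    # point is compared against the channel of its ground-truth part.
    fg = part_labels > 0
    if not fg.any():
        return cls_loss, 0.0
    rows = np.nonzero(fg)[0]
    chan = part_labels[fg] - 1
    reg_loss = (np.abs(u_pred[rows, chan] - u_gt[fg]).mean()
                + np.abs(v_pred[rows, chan] - v_gt[fg]).mean())
    return cls_loss, reg_loss
```

With all-zero logits the classification loss equals ln(25), which is a quick sanity check that the 25-way softmax is wired correctly.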
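Performance:
On one GTX 1080 GPU:
20-26 frames/s for 240×320 images
4-5 frames/s for 800×1100 images
Details:
The method is very similar to DenseReg, but DenseReg is fully convolutional and only predicts for faces, where the variation is smaller. (The first author is the same person.)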
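II. Experiments:
A fully convolutional network (DeepLab) was compared with a region-based network (Mask R-CNN); the region-based network performed better.
PoseTrack challenge:
For the same task, a second set of annotations was produced on the PoseTrack dataset.
Train: 1680 images
Val: 782 images
Test: 2698 images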
COCO_DensePose
Train:
Top-level keys: images, annotations, categories
Number of images in the small training set: 26437
Keys of an image dict: [u'license', u'file_name', u'coco_url', u'height', u'width', u'date_captured', u'flickr_url', u'id']
Example image dict: {u'license': 2, u'file_name': u'COCO_train2014_000000262145.jpg', u'coco_url': u'http://images.cocodataset.org/train2014/COCO_train2014_000000262145.jpg', u'height': 427, u'width': 640, u'date_captured': u'2013-11-20 02:07:55', u'flickr_url': u'http://farm8.staticflickr.com/7187/6967031859_5f08387bde_z.jpg', u'id': 262145}
Number of annotations in the small training set: 100403
Keys of an annotation dict: [u'segmentation', u'num_keypoints', u'dp_masks', u'area', u'iscrowd', u'dp_I', u'keypoints', u'id', u'dp_U', u'image_id', u'dp_V', u'bbox', u'category_id', u'dp_y', u'dp_x']
Key: segmentation, length: 1
Key: num_keypoints, value: 15
Key: dp_masks, length: 14
Key: area, value: 21258
Key: iscrowd, value: 0
Key: dp_I, length: 115
Key: keypoints, length: 51
Key: id, value: 1218400
Key: dp_U, length: 115
Key: image_id, value: 262145
Key: dp_V, length: 115
Key: bbox, length: 4
Key: category_id, value: 1
Key: dp_y, length: 115
Key: dp_x, length: 115
标注文件字典举例:{u'segmentation': [[453, 292.1, 457, 253.1, 439, 245.1, 438, 215.1, 439, 198.1, 420, 223.1, 414, 233.1, 401, 227.1, 400, 226.1, 398, 229.1, 391, 231.1, 387, 213.1, 399, 203.1, 404, 200.1, 413, 194.1, 418, 186.1, 408, 181.1, 415, 154.1, 418, 142.1, 419, 127.1, 422, 125.1, 419, 120.1, 412, 122.1, 407, 112.1, 402, 105.1, 389, 113.1, 390, 105.1, 395, 100.1, 395, 97.1, 398, 83.1, 407, 72.1, 417, 71.1, 424, 72.1, 428, 73.1, 436, 80.1, 441, 90.1, 446, 96.1, 456, 101.1, 472, 110.1, 480, 113.1, 493, 123.1, 499, 136.1, 504, 147.1, 509, 167.1, 515, 182.1, 531, 205.1, 532, 218.1, 525, 229.1, 514, 246.1, 499, 283.1, 499, 307.1, 499, 323.1, 499, 343.1, 505, 367.1, 505, 380.1, 505, 381.1, 486, 387.1, 482, 392.1, 479, 393.1, 469, 363.1, 453, 343.1, 451, 339.1, 454, 321.1, 453, 312.1, 460, 313.1, 458, 298.1, 452, 293.1]], u'num_keypoints': 15, u'dp_masks': [{u'counts': u'\\Qa03k7200M210N110N1N2100000O1000O0010O2O0O10001O0O1N200O2O001O0O1O?B0O:G0O;F017H5K011N001O010O2N001O010O002N001O1O011N001O010OO1000001O000O2O00000000001O00001O001N1000001O0O1000001N101O00001N101O002N1O002N002N002M102N1N103M001O002M2O003M001N102N2N002N001O002M3N002N002M101O002N2M102N001O002N2N002M102N001OUU9', u'size': [256, 256]}, {u'counts': u'lk5110l70TH0j7400O2O0NNZHNf72ZHNe77O01003M0O010O10O100000O10000O01000000O10000O10O1001O001N101M3N101M20[\\_1', u'size': [256, 256]}, {u'counts': u'i33m70L5O0O2O0O100O1OL`HIa77_HI_79_HIa77_HI`77`HK_75aHK_7:1N20OO200000000000O1000O10O0101N2N101O001N102M`Tf1', u'size': [256, 256]}, [], {u'counts': u'e_Y12m73N0O2O002N0O10000000000000000000000000000000M2L50_P`0', u'size': [256, 256]}, {u'counts': u'im[14l70N110L400M3N200M300M300M3N200M300L400N2M4O0M@c00ON30NN40ON0120ON30NO30NN3O20M120O010NO2010OO201N100M3O100N20P\\6', u'size': [256, 256]}, {u'counts': 
u'[SP16_14b4;_J9]5GcJ9]5X1M0O5L004K3N0000000000000000000O100000000O1000000O2O0000000O10001O000000000000000O1000O0N300N200L40ON3N200M30ON300M3N200M300L40OO2M300M30O0100N200O0N300O100N200N2O10ON300O100N10101L301N101M3M201M201N101E:0m\\5', u'size': [256, 256]}, {u'counts': u']oX17i70D4l70E;00E;F:00J600O1000O11O0O100O100O1O10000000000O1O1001O0000001O00000O101F901A>00BQWW1', u'size': [256, 256]}, [], {u'counts': u'Z[71o7001O1O0O2O001O0O3M100O100O0100O0100O010O100O01000M300L31000000O10000O100O10O11O0000000O100O1O100O1000000O1O100O100O1000000O1O100O10VmW1', u'size': [256, 256]}, {u'counts': u'_[>2n70O2O0O2O1O0O2O0O100O1O0100000O010O00010O1000O10O00100O010O0100000O12M103L104L00g\\W1', u'size': [256, 256]}, {u'counts': u'`P52n70L6aHJP77jHNV72jHNU7?00O100O100O102N0O2O002N0O2O000O10001O0O101O001O00000O2O00000000O100O100O1O100N200N200O10001N100O101O00001O001N101O1N102M101N102N002M2M201N102N000O2N10[gV1', u'size': [256, 256]}], u'area': 21258, u'iscrowd': 0, u'dp_I': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 9.0, 1.0, 1.0, 2.0, 8.0, 10.0, 10.0, 8.0, 10.0, 10.0, 8.0, 10.0, 8.0, 10.0, 8.0, 17.0, 17.0, 17.0, 17.0, 17.0, 17.0, 17.0, 17.0, 17.0, 17.0, 17.0, 21.0, 13.0, 13.0, 13.0, 13.0, 13.0, 13.0, 13.0, 13.0, 11.0, 13.0, 10.0, 12.0, 14.0, 12.0, 14.0, 12.0, 14.0, 12.0, 14.0, 12.0, 14.0, 14.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 6.0, 6.0, 6.0, 6.0, 6.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0], u'keypoints': [407, 115, 1, 407, 105, 2, 0, 0, 0, 425, 95, 2, 0, 0, 0, 435, 124, 2, 457, 105, 2, 428, 187, 2, 447, 182, 2, 404, 210, 2, 419, 213, 2, 488, 222, 2, 515, 213, 2, 471, 293, 2, 487, 297, 2, 462, 372, 1, 486, 374, 2], u'id': 1218400, u'dp_U': [0.1581381857395172, 0.34230467677116394, 0.18912158906459808, 0.48821353912353516, 0.3884721100330353, 0.5135913491249084, 
0.6443536877632141, 0.38792282342910767, 0.7409800291061401, 0.6505376100540161, 0.520325779914856, 0.8216454982757568, 0.8167153596878052, 0.6329784393310547, 0.8746959567070007, 0.7455276250839233, 0.9248602390289307, 0.5307193994522095, 0.5957505702972412, 0.5772902369499207, 0.42637890577316284, 0.9215373992919922, 0.4085750877857208, 0.557478129863739, 0.23966775834560394, 0.36956870555877686, 0.2535756826400757, 0.13388468325138092, 0.33851897716522217, 0.4326576590538025, 0.49585190415382385, 0.5520332455635071, 0.6304510235786438, 0.7085527181625366, 0.7812169194221497, 0.09183210134506226, 0.9279903769493103, 0.9287686944007874, 0.7573585510253906, 0.11356967687606812, 0.92024827003479, 0.8299093842506409, 0.28700605034828186, 0.75279301404953, 0.6569482088088989, 0.4226320683956146, 0.5276551842689514, 0.5945624709129333, 0.39777621626853943, 0.8695116639137268, 0.06964663416147232, 0.18622958660125732, 0.2207595258951187, 0.35709473490715027, 0.36292386054992676, 0.5079752802848816, 0.5260767936706543, 0.6477560997009277, 0.6901158690452576, 0.8074332475662231, 0.8204708099365234, 0.0556868351995945, 0.05974288657307625, 0.19443081319332123, 0.3120681643486023, 0.43140751123428345, 0.5478249788284302, 0.6082472801208496, 0.6847281455993652, 0.7598679661750793, 0.1376686841249466, 0.9295018315315247, 0.13133512437343597, 0.9445589184761047, 0.16949176788330078, 0.7630615234375, 0.3395075500011444, 0.6026873588562012, 0.5121622085571289, 0.4318861663341522, 0.6470393538475037, 0.2936002016067505, 0.7786632776260376, 0.9409124255180359, 0.21560901403427124, 0.16927339136600494, 0.39818519353866577, 0.3544045686721802, 0.5688581466674805, 0.4773864150047302, 0.641610324382782, 0.8311001062393188, 0.8021774291992188, 0.625206470489502, 0.6972959041595459, 0.5318204164505005, 0.4284036159515381, 0.27827775478363037, 0.7861599922180176, 0.7340675592422485, 0.6325574517250061, 0.712203323841095, 0.6250669956207275, 0.6883633136749268, 0.7496581077575684, 
0.6079471707344055, 0.25784364342689514, 0.6282087564468384, 0.7382251620292664, 0.6516999006271362, 0.6727080941200256, 0.5526376366615295, 0.7019383311271667, 0.5480226874351501, 0.6156435608863831], u'image_id': 262145, u'dp_V': [0.435714453458786, 0.4540521800518036, 0.18669910728931427, 0.4730636179447174, 0.1817651242017746, 0.2397734522819519, 0.5154634714126587, 0.0778612270951271, 0.47144177556037903, 0.24830983579158783, 0.10124488919973373, 0.42400211095809937, 0.17712591588497162, 0.16078531742095947, 0.7430467009544373, 0.8068962097167969, 0.5839595198631287, 0.8275789022445679, 0.6089735627174377, 0.4219698905944824, 0.7108063101768494, 0.27599748969078064, 0.4922265112400055, 0.17351959645748138, 0.4757252335548401, 0.31542858481407166, 0.34038570523262024, 0.2190510332584381, 0.284136563539505, 0.18819834291934967, 0.24188412725925446, 0.3010953962802887, 0.2897706627845764, 0.3234020173549652, 0.3240533769130707, 0.7283157706260681, 0.32366207242012024, 0.11646141856908798, 0.1395215094089508, 0.698147177696228, 0.09631854295730591, 0.3345729112625122, 0.6794231534004211, 0.10249005258083344, 0.33502835035324097, 0.6944578886032104, 0.19180409610271454, 0.7063501477241516, 0.17014046013355255, 0.660202145576477, 0.4905553162097931, 0.6760637760162354, 0.4298979938030243, 0.6051788330078125, 0.33344659209251404, 0.6024321913719177, 0.304057776927948, 0.4545254111289978, 0.24095730483531952, 0.42929837107658386, 0.22992774844169617, 0.505514919757843, 0.10040315985679626, 0.09234366565942764, 0.10799026489257812, 0.17639049887657166, 0.19133812189102173, 0.35492274165153503, 0.1694529801607132, 0.369148850440979, 0.24146153032779694, 0.3719113767147064, 0.18182919919490814, 0.8576633930206299, 0.7511200308799744, 0.8856542110443115, 0.7903587222099304, 0.935828447341919, 0.7384026646614075, 0.8597866296768188, 0.7211704850196838, 0.7972701191902161, 0.7776200771331787, 0.7320044636726379, 0.641746997833252, 0.4209004044532776, 0.6608603000640869, 
0.4275743067264557, 0.4784872233867645, 0.3132534921169281, 0.31148022413253784, 0.30090242624282837, 0.3639029562473297, 0.36897680163383484, 0.6103492975234985, 0.5535940527915955, 0.27160295844078064, 0.40101170539855957, 0.3286058008670807, 0.46157601475715637, 0.5251432061195374, 0.24991905689239502, 0.3960190713405609, 0.15772105753421783, 0.4226440191268921, 0.3048115074634552, 0.6505873799324036, 0.5413646101951599, 0.8349341750144958, 0.19350852072238922, 0.3841789960861206, 0.28090330958366394, 0.6460604071617126, 0.5135382413864136, 0.8778793215751648], u'bbox': [387, 71.1, 145, 322], u'category_id': 1, u'dp_y': [28.909210205078125, 36.55131530761719, 36.783695220947266, 48.302513122558594, 49.214847564697266, 60.31871795654297, 62.91701126098633, 67.61942291259766, 76.40342712402344, 76.62415313720703, 84.08012390136719, 88.83182525634766, 92.05416870117188, 97.28132629394531, 6.887820243835449, 7.758489608764648, 12.378670692443848, 13.159947395324707, 15.277416229248047, 19.964256286621094, 20.220537185668945, 20.284868240356445, 24.30759620666504, 26.47214126586914, 28.3717041015625, 29.94404411315918, 33.47541809082031, 37.36276626586914, 131.643798828125, 139.17393493652344, 146.8321533203125, 154.6294708251953, 162.16929626464844, 169.56320190429688, 176.72129821777344, 183.9130401611328, 101.44630432128906, 105.18871307373047, 110.8502197265625, 116.28874969482422, 119.52222442626953, 125.48912811279297, 131.7722930908203, 134.32630920410156, 141.18894958496094, 148.46072387695312, 154.85458374023438, 164.68812561035156, 169.036376953125, 180.37696838378906, 45.588768005371094, 49.0302734375, 53.33025360107422, 56.44441604614258, 61.177650451660156, 64.0594711303711, 69.0973129272461, 71.62738037109375, 77.1959457397461, 79.06790161132812, 85.4889144897461, 86.12149047851562, 191.22286987304688, 197.965087890625, 204.69712829589844, 211.28915405273438, 217.55638122558594, 222.9026641845703, 226.37686157226562, 231.09103393554688, 
236.93728637695312, 239.00830078125, 190.00265502929688, 191.6421356201172, 197.13681030273438, 198.41720581054688, 204.24436950683594, 204.8842010498047, 211.41250610351562, 211.76991271972656, 218.84889221191406, 219.04078674316406, 225.9837188720703, 233.1320343017578, 91.54401397705078, 92.83362579345703, 94.70879364013672, 97.85394287109375, 100.72178649902344, 101.68196868896484, 105.78349304199219, 107.72846984863281, 100.16338348388672, 104.2774887084961, 105.09274291992188, 108.75057220458984, 109.46098327636719, 113.06881713867188, 243.55999755859375, 243.78909301757812, 245.17449951171875, 248.11851501464844, 248.7158203125, 115.09693145751953, 117.0737075805664, 120.05931854248047, 120.77220916748047, 123.07715606689453, 125.72364807128906, 109.85590362548828, 112.39854431152344, 114.20746612548828, 116.95733642578125, 117.73690032958984, 122.15008544921875], u'dp_x': [111.5760498046875, 140.59548950195312, 88.38611602783203, 158.2552947998047, 115.38705444335938, 140.92547607421875, 172.6801300048828, 115.71037292480469, 183.8076171875, 149.18008422851562, 120.6231689453125, 196.7709503173828, 164.02395629882812, 134.32180786132812, 64.5458755493164, 46.724369049072266, 77.15164184570312, 32.21230697631836, 52.95413589477539, 67.25080871582031, 33.050376892089844, 87.5832290649414, 49.61521911621094, 77.81704711914062, 31.799861907958984, 62.45124435424805, 42.439605712890625, 54.14159393310547, 219.4923553466797, 210.85491943359375, 203.37953186035156, 198.5994110107422, 194.32286071777344, 189.5088653564453, 187.13658142089844, 185.8370361328125, 214.15895080566406, 181.10533142089844, 146.47915649414062, 211.83815002441406, 177.25453186035156, 144.3924102783203, 199.31747436523438, 170.66110229492188, 144.83860778808594, 180.55224609375, 145.70240783691406, 172.3448486328125, 141.46939086914062, 157.65899658203125, 87.35091400146484, 72.78260040283203, 88.93101501464844, 71.13123321533203, 87.25609588623047, 69.8150405883789, 86.60945892333984, 
69.17989349365234, 85.21100616455078, 67.95042419433594, 83.97472381591797, 65.75786590576172, 183.85504150390625, 182.17401123046875, 180.99432373046875, 178.43093872070312, 178.2622528076172, 170.95863342285156, 180.86807250976562, 171.43788146972656, 180.3823699951172, 168.86575317382812, 144.04090881347656, 163.4275665283203, 143.07244873046875, 162.6227569580078, 139.92333984375, 158.15179443359375, 136.32968139648438, 153.267333984375, 135.05577087402344, 150.7586212158203, 145.03981018066406, 148.38365173339844, 71.36695098876953, 86.16627502441406, 61.22429275512695, 75.84166717529297, 51.103759765625, 64.59684753417969, 51.51070022583008, 38.840843200683594, 87.83291625976562, 78.37515258789062, 90.03551483154297, 79.78938293457031, 69.83355712890625, 63.51275634765625, 185.44496154785156, 176.7206268310547, 169.04324340820312, 183.05271911621094, 173.74090576171875, 53.849117279052734, 41.75295639038086, 56.012142181396484, 32.203887939453125, 45.75273132324219, 34.0209846496582, 27.40822410583496, 16.41045379638672, 30.757139205932617, 8.695992469787598, 20.475650787353516, 8.169807434082031]}
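A per-key summary like the one listed before the example (length for list fields, value for scalar fields) can be produced with a few lines of Python. The sketch below works on a plain annotation dict; the helper names `summarize_annotation` and `check_dense_points` are made up for this example, and the toy annotation carries only a handful of invented values.

```python
DP_POINT_KEYS = ("dp_I", "dp_U", "dp_V", "dp_x", "dp_y")

def summarize_annotation(ann):
    """Map each key to ('length', n) for lists or ('value', v) for scalars."""
    summary = {}
    for key, value in ann.items():
        if isinstance(value, list):
            summary[key] = ("length", len(value))
        else:
            summary[key] = ("value", value)
    return summary

def check_dense_points(ann):
    """Return the number of dense points, checking all dp_* lists agree."""
    lengths = {len(ann[key]) for key in DP_POINT_KEYS}
    if len(lengths) != 1:
        raise ValueError("dp_* lists have inconsistent lengths: %s" % sorted(lengths))
    return lengths.pop()

# Toy annotation with the same kind of fields as the example above.
toy = {
    "bbox": [387, 71.1, 145, 322],
    "category_id": 1,
    "dp_I": [1.0, 1.0, 24.0],
    "dp_U": [0.15, 0.34, 0.87],
    "dp_V": [0.43, 0.45, 0.74],
    "dp_x": [111.5, 140.6, 64.5],
    "dp_y": [28.9, 36.5, 6.9],
}
for key, (kind, val) in sorted(summarize_annotation(toy).items()):
    print("Key: %s, %s: %s" % (key, kind, val))
```

On the real annotation shown above, `check_dense_points` would be expected to return 115, matching the listed lengths of dp_I, dp_U, dp_V, dp_x and dp_y.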
Model output:
Body_uv:
Total number of entries (sub-lists): 12158
Keys of a body_uv result entry: [u'image_id', u'category_id', u'uv', u'score', u'bbox']
Shape of the uv data: (3, 224, 44)
Example body_uv result entry: {u'image_id': 885, u'category_id': 1, u'uv': array([[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]], dtype=uint8), u'score': 0.9986090064048767, u'bbox': [594.8278198242188, 26.12091064453125, 44, 224]}
The image:
Image size: (427, 640, 3)
Visualize the I, U and V images:
Visualize the isocontours of the UV fields:
Output of u'uv':
(3, 224, 44)
Total number of u coordinates: 9856
Number of valid u coordinates: 4281 (all other elements of the array are 0)
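The two counts above are simply the channel size (224 × 44 = 9856) and the number of non-zero entries. A small sketch on a toy array of the same shape; the filled region is arbitrary and only stands in for a real u channel:

```python
import numpy as np

def coordinate_counts(channel):
    """Return (total, valid) for one 2D channel; valid means non-zero."""
    total = channel.size
    valid = int(np.count_nonzero(channel))
    return total, valid

# Toy stand-in with the same shape as the u channel above (224 x 44).
u = np.zeros((224, 44), dtype=np.uint8)
u[50:150, 10:30] = 128  # arbitrary "person" region of 100 x 20 pixels
print(coordinate_counts(u))  # (9856, 2000)
```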
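Conclusions:
The uv data (a 3D array) has the same height and width as the bounding box.
The output array has shape (3, 224, 44): the three channels are u, v and I, and its height and width equal those of the bbox. It is obtained from the 56×56 output feature map with cv2.resize(), i.e. by linear interpolation.
For each part index, the data is taken from the corresponding region of that part's feature channel.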
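The last point, that each part's values come from that part's own channel, can be sketched as a gather over the channel axis. This is an illustration under assumptions: the function name `assemble_iuv` is made up, and the U and V maps are assumed to carry 25 channels (channel 0 for background) so that the part index can be used directly as a channel index.

```python
import numpy as np

def assemble_iuv(index_scores, u_maps, v_maps):
    """Pick, per pixel, the U and V value from the channel of the winning part.

    index_scores, u_maps, v_maps: (25, H, W) arrays; channel 0 is assumed
    to be background. Returns (part, u, v), each of shape (H, W).
    """
    part = index_scores.argmax(axis=0)            # (H, W) part index per pixel
    rows, cols = np.indices(part.shape)
    u = u_maps[part, rows, cols]                  # value from the part's own channel
    v = v_maps[part, rows, cols]
    fg = part > 0
    # Background pixels carry no surface coordinate, so zero them out.
    return part, np.where(fg, u, 0.0), np.where(fg, v, 0.0)
```

This also explains why most of the output array is zero: every pixel classified as background ends up with I = U = V = 0.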
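Analysis of the UV coordinate data
(1) (Training data) provides dp_x, dp_y, dp_I, dp_U, dp_V
dp_x, dp_y: spatial coordinates in the image of the points collected by the annotators (stored relative to the bounding box, scaled so that the box is 256×256)
dp_I: which of the 24 parts each point belongs to
dp_U, dp_V: coordinates in UV space; each part of the model surface has its own 2D parameterization. There are around 100 points per person.
Example:
The first figure shows the part each point belongs to, according to dp_I.
The second and third figures place each point using x and y, then color it by its u value and its v value respectively.
(2) (Model output)
The model output is a mask holding i, u and v values, the same size as the person's bbox. The u and v values here are dense, on the order of ten thousand per person; this is the dense correspondence of the paper.
Rendered output: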
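To place the annotated points on the image for figures like these, dp_x and dp_y first have to be mapped into pixel coordinates. The sample annotation above has dp_x values up to about 219 for a bbox only 145 pixels wide, which is consistent with the points being stored in a normalized 256×256 bbox frame. The sketch below assumes that convention (division by 255, then scaling by bbox width and height); the function name `dense_points_to_image` is made up for the example.

```python
import numpy as np

def dense_points_to_image(ann):
    """Map dp_x, dp_y to image pixel coordinates.

    Assumes dp_x, dp_y lie in [0, 255] inside a frame stretched over the
    bbox, and that bbox is [x, y, width, height] as in COCO.
    """
    x, y, w, h = ann["bbox"]
    px = np.asarray(ann["dp_x"], dtype=float) / 255.0 * w + x
    py = np.asarray(ann["dp_y"], dtype=float) / 255.0 * h + y
    return px, py
```

The returned arrays can be passed to a scatter plot over the image, colored by dp_I, dp_U or dp_V to reproduce the three kinds of figure described above.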
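Because the output mask only covers the bbox, rendering it over the image means pasting it into a full-size canvas first. A minimal sketch, assuming the bbox is given as [x, y, width, height] (as in the result entry shown earlier) and truncating the corner to integer pixels; `paste_iuv` is a name made up for the example.

```python
import numpy as np

def paste_iuv(iuv, bbox, image_hw):
    """Place a (3, h, w) bbox-sized array into a (3, H, W) zero canvas."""
    x0 = max(int(bbox[0]), 0)
    y0 = max(int(bbox[1]), 0)
    _, h, w = iuv.shape
    H, W = image_hw
    canvas = np.zeros((3, H, W), dtype=iuv.dtype)
    # Clip the patch so it never runs past the image border.
    y1, x1 = min(y0 + h, H), min(x0 + w, W)
    canvas[:, y0:y1, x0:x1] = iuv[:, :y1 - y0, :x1 - x0]
    return canvas

# Shapes taken from the example above: uv (3, 224, 44), image (427, 640).
patch = np.ones((3, 224, 44), dtype=np.uint8)
canvas = paste_iuv(patch, [594.8278198242188, 26.12091064453125, 44, 224], (427, 640))
print(canvas.shape)  # (3, 427, 640)
```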
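(3) Loss computation
A softmax classification loss, plus an L1 loss on the keypoints.
(4) Evaluation metric
body_uv is evaluated with OGPS as the similarity measure and AP as the reported metric.
OGPS: (1) pair each predicted point with the closest ground-truth annotated point.
(2) compute the distance between the paired points.
Training data processing:
The 56×56 output feature map is sampled by bilinear interpolation at the x, y, index locations into an array of shape 196×25, and the loss is computed against 196×25 label arrays for u and v. The body_u and body_v labels themselves have shape 196×1 and are tiled 25 times to 196×25.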
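For reference, the per-person score in the DensePose paper is the geodesic point similarity (GPS), which turns the distances from step (2) into a value in (0, 1]. The formula below is written from memory of the paper and should be checked against it before being relied on.

```latex
\mathrm{GPS}_j \;=\; \frac{1}{|P_j|} \sum_{p \in P_j}
  \exp\!\left( \frac{-\, g(i_p, \hat{i}_p)^2}{2\kappa^2} \right)
```

Here P_j is the set of annotated points on person j, i_p is the ground-truth surface position of point p, \hat{i}_p is the predicted surface position at the same image location, g is the geodesic distance on the body surface, and κ is a normalizing constant (0.255 in the paper, as far as I recall). AP is then computed by thresholding GPS, in the same way COCO keypoint AP thresholds OKS.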
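The sampling and tiling step can be sketched in NumPy as follows. The notes do not say how the 25 tiled label columns are reduced to the one that matters, so the per-channel weight mask used here (only the channel of the point's own part contributes) is an assumption, as are the function names and the averaging over valid points.

```python
import numpy as np

def bilinear_sample(feat, xs, ys):
    """Sample feat (C, H, W) at float pixel coords xs, ys (N,). Returns (N, C)."""
    _, H, W = feat.shape
    xs = np.clip(xs, 0, W - 1)
    ys = np.clip(ys, 0, H - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = xs - x0, ys - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx   # (C, N)
    bot = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return (top * (1 - wy) + bot * wy).T

def uv_l1_loss(feat, xs, ys, part_idx, target):
    """L1 loss between sampled predictions and tiled labels.

    feat: (25, 56, 56) U or V feature map; part_idx: (N,) ints, 0 = unused slot;
    target: (N,) label values, tiled to (N, 25) as described in the notes.
    """
    pred = bilinear_sample(feat, xs, ys)                     # (N, 25)
    tiled = np.tile(target[:, None], (1, feat.shape[0]))     # (N, 25)
    # Assumed weighting: each point only counts in its own part's channel.
    weights = np.zeros_like(pred)
    valid = part_idx > 0
    weights[np.nonzero(valid)[0], part_idx[valid]] = 1.0
    return (np.abs(pred - tiled) * weights).sum() / max(int(valid.sum()), 1)
```

With N = 196 points and a 25-channel map, `bilinear_sample` returns exactly the 196×25 array mentioned above.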
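Added non-local blocks
Added the COCO_test2015 test set
body_uv_rcnn_heads.py
Added a series of heads
Decoupled: add_roi_body_uv_head_v1convX_Decoupling
ResNet-like structure: add_roi_body_uv_head_Modification_resnet
Added a head made of convolutions plus fully connected layers (the convolutions are shared; the fully connected layers are used only for classification)
Head: add_roi_Xconv1fc_gn_head_test
Matching output function: add_fast_rcnn_outputs_test
To swap the Fast R-CNN head, replace the call at line 287:
fast_rcnn_heads.add_fast_rcnn_outputs_test(model, blob_frcn, dim_frcn)
To use the decoupled head, modify _add_roi_body_uv_head at line 355
Part of the function body
The added non-local part
The ResNet base network with non-local blocks added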
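(1) Modified body_uv_head by deepening it to a 24-layer residual structure with GN (group normalization): 8% improvement.
(2) Added non-local blocks after the 1st and 3rd residual blocks of the third stage of ResNet-50 and after the 1st, 3rd and 5th residual blocks of the fourth stage: 1.3%-1.7% improvement.
(3) Replaced the original two-layer fully connected fast_head with four shared convolution layers, followed by two fully connected layers on the classification branch only: 0.5 improvement.
(4) Decoupled the original eight-layer convolutional body_uv_head: ann and index in one branch, u and v in another, each branch with eight convolution layers: 1.4% improvement.
(5) Reduced the training image scales by 64 pixels across the board, with single-scale testing: 0.8% improvement.
(6) Test-time augmentation with SCALES: (400, 500, 600, 700, 900, 1000, 1100, 1200): 2% improvement.
(7) Multi-task training, with the keypoint and body_uv branches trained jointly: about 2% improvement.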
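The non-local structure mentioned in the list is not spelled out in these notes. As a reminder of what such a block computes, here is a NumPy sketch of the embedded-Gaussian variant from the Non-local Neural Networks paper; whether the modified network uses exactly this variant is an assumption, and the 1×1 convolutions are written as plain matrices with made-up names.

```python
import numpy as np

def nonlocal_block(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block on one feature map.

    x: (C, H, W). w_theta, w_phi, w_g: (C_inner, C) stand in for 1x1 convs;
    w_out: (C, C_inner) projects back to C channels. Returns (C, H, W).
    """
    C, H, W = x.shape
    flat = x.reshape(C, H * W)               # (C, N), N = H * W positions
    theta = w_theta @ flat                   # (C_inner, N)
    phi = w_phi @ flat
    g = w_g @ flat
    affinity = theta.T @ phi                 # (N, N) pairwise similarities
    affinity -= affinity.max(axis=1, keepdims=True)
    attn = np.exp(affinity)
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over all positions j
    y = g @ attn.T                           # y[:, i] = sum_j attn[i, j] * g[:, j]
    return x + (w_out @ y).reshape(C, H, W)  # residual connection
```

Because of the residual connection, a block whose output projection is zero leaves the feature map unchanged, which is why such blocks can be inserted into a pretrained ResNet without disturbing it at the start of fine-tuning.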