[1] M. Andriluka, U. Iqbal, A. Milan, E. Insafutdinov, L. Pishchulin, J. Gall, and B. Schiele. Posetrack: A benchmark for human pose estimation and tracking. In CVPR, pages 5167–5176, 2018. 2, 9
[2] M. Andriluka, L. Pishchulin, P. V. Gehler, and B. Schiele. 2d human pose estimation: New benchmark and state of the art analysis. In CVPR, pages 3686–3693, 2014. 2, 6, 9
[3] V. Belagiannis and A. Zisserman. Recurrent human pose estimation. In FG, pages 468–475, 2017. 3
[4] A. Bulat and G. Tzimiropoulos. Human pose estimation via convolutional part heatmap regression. In ECCV, volume 9911 of Lecture Notes in Computer Science, pages 717–732. Springer, 2016. 2, 6
[5] Z. Cai, Q. Fan, R. S. Feris, and N. Vasconcelos. A unified multi-scale deep convolutional neural network for fast object detection. In ECCV, pages 354–370, 2016. 3
[6] Z. Cao, T. Simon, S. Wei, and Y. Sheikh. Realtime multi-person 2d pose estimation using part affinity fields. In CVPR, pages 1302–1310, 2017. 1, 5
[7] J. Carreira, P. Agrawal, K. Fragkiadaki, and J. Malik. Human pose estimation with iterative error feedback. In CVPR, pages 4733–4742, 2016. 2
[8] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell., 40(4):834–848, 2018. 3
[9] X. Chen and A. L. Yuille. Articulated pose estimation by a graphical model with image dependent pairwise relations. In NIPS, pages 1736–1744, 2014. 2
[10] Y. Chen, C. Shen, X. Wei, L. Liu, and J. Yang. Adversarial posenet: A structure-aware convolutional network for human pose estimation. In ICCV, pages 1221–1230, 2017. 6
[11] Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, and J. Sun. Cascaded pyramid network for multi-person pose estimation. CoRR, abs/1711.07319, 2017. 2, 3, 5, 6
[12] C. Chou, J. Chien, and H. Chen. Self adversarial training for human pose estimation. CoRR, abs/1707.02439, 2017. 6
[13] X. Chu, W. Ouyang, H. Li, and X. Wang. Structured feature learning for pose estimation. In CVPR, pages 4715–4723, 2016. 2
[14] X. Chu, W. Yang, W. Ouyang, C. Ma, A. L. Yuille, and X. Wang. Multi-context attention for human pose estimation. In CVPR, pages 5669–5678, 2017. 2, 6
[15] A. Doering, U. Iqbal, and J. Gall. Joint flow: Temporal flow fields for multi person tracking, 2018. 7
[16] X. Fan, K. Zheng, Y. Lin, and S. Wang. Combining local appearance and holistic view: Dual-source deep neural networks for human pose estimation. In CVPR, pages 1347–1355, 2015. 2
[17] H. Fang, S. Xie, Y. Tai, and C. Lu. RMPE: regional multi-person pose estimation. In ICCV, pages 2353–2362, 2017. 1, 5
[18] D. Fourure, R. Emonet, É. Fromont, D. Muselet, A. Trémeau, and C. Wolf. Residual conv-deconv grid network for semantic segmentation. In BMVC, 2017. 3
[19] R. Girdhar, G. Gkioxari, L. Torresani, M. Paluri, and D. Tran. Detect-and-track: Efficient pose estimation in videos. In CVPR, pages 350–359, 2018. 7, 9
[20] G. Gkioxari, A. Toshev, and N. Jaitly. Chained predictions using convolutional neural networks. In ECCV, pages 728–743, 2016. 2
[21] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick. Mask R-CNN. In ICCV, pages 2980–2988, 2017. 5
[22] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016. 9
[23] P. Hu and D. Ramanan. Bottom-up and top-down reasoning with hierarchical rectified gaussians. In CVPR, pages 5600–5609, 2016. 2
[24] G. Huang, D. Chen, T. Li, F. Wu, L. van der Maaten, and K. Q. Weinberger. Multi-scale dense convolutional networks for efficient prediction. CoRR, abs/1703.09844, 2017. 3
[25] S. Huang, M. Gong, and D. Tao. A coarse-fine network for keypoint localization. In ICCV, pages 3047–3056. IEEE Computer Society, 2017. 5
[26] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox. Flownet 2.0: Evolution of optical flow estimation with deep networks. In CVPR, pages 1647–1655, 2017. 7
[27] E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, and B. Schiele. Deepercut: A deeper, stronger, and faster multi-person pose estimation model. In ECCV, pages 34–50, 2016. 1, 2, 3, 6, 7
[28] U. Iqbal, A. Milan, and J. Gall. Posetrack: Joint multi-person pose estimation and tracking. In CVPR, pages 4654–4663, 2017. 6, 7
[29] S. Jin, X. Ma, Z. Han, Y. Wu, W. Yang, W. Liu, C. Qian, and W. Ouyang. Towards multi-person pose tracking: Bottom-up and top-down methods. In ICCV PoseTrack Workshop, 2017. 7
[30] A. Kanazawa, A. Sharma, and D. W. Jacobs. Locally scale-invariant convolutional neural networks. CoRR, abs/1412.5104, 2014. 3
[31] L. Ke, M. Chang, H. Qi, and S. Lyu. Multi-scale structure-aware network for human pose estimation. CoRR, abs/1803.09894, 2018. 2, 3, 6
[32] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014. 5
[33] M. Kocabas, S. Karagoz, and E. Akbas. Multiposenet: Fast multi-person pose estimation using pose residual network. In ECCV, volume 11215 of Lecture Notes in Computer Science, pages 437–453. Springer, 2018. 1, 5
[34] C. Lee, S. Xie, P. W. Gallagher, Z. Zhang, and Z. Tu. Deeply-supervised nets. In AISTATS, 2015. 3
[35] I. Lifshitz, E. Fetaya, and S. Ullman. Human pose estimation using deep consensus voting. In ECCV, pages 246–260, 2016. 2, 3
[36] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: common objects in context. In ECCV, pages 740–755, 2014. 2, 4
[37] D. C. Luvizon, H. Tabia, and D. Picard. Human pose regression by combining indirect part detection and contextual information. CoRR, abs/1710.02322, 2017. 6
[38] A. Milan, L. Leal-Taixé, I. D. Reid, S. Roth, and K. Schindler. MOT16: A benchmark for multi-object tracking. CoRR, abs/1603.00831, 2016. 7
[39] A. Newell, Z. Huang, and J. Deng. Associative embedding: End-to-end learning for joint detection and grouping. In NIPS, pages 2274–2284, 2017. 1, 5
[40] A. Newell, K. Yang, and J. Deng. Stacked hourglass networks for human pose estimation. In ECCV, pages 483–499, 2016. 1, 2, 3, 5, 6, 7, 9
[41] X. Nie, J. Feng, J. Xing, and S. Yan. Pose partition networks for multi-person pose estimation. In ECCV, September 2018. 1
[42] X. Nie, J. Feng, and S. Yan. Mutual learning to adapt for joint human parsing and pose estimation. In ECCV, September 2018. 2
[43] X. Nie, J. Feng, Y. Zuo, and S. Yan. Human pose estimation with parsing induced learner. In CVPR, June 2018. 2
[44] G. Ning, Z. Zhang, and Z. He. Knowledge-guided deep fractal neural networks for human pose estimation. IEEE Trans. Multimedia, 20(5):1246–1259, 2018. 6
[45] W. Ouyang, X. Chu, and X. Wang. Multi-source deep learning for human pose estimation. In CVPR, pages 2337–2344, 2014. 2
[46] G. Papandreou, T. Zhu, L.-C. Chen, S. Gidaris, J. Tompson, and K. Murphy. Personlab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model. In ECCV, September 2018. 1, 5
[47] G. Papandreou, T. Zhu, N. Kanazawa, A. Toshev, J. Tompson, C. Bregler, and K. Murphy. Towards accurate multi-person pose estimation in the wild. In CVPR, pages 3711–3719, 2017. 1, 5
[48] X. Peng, Z. Tang, F. Yang, R. S. Feris, and D. Metaxas. Jointly optimize data augmentation and network training: Adversarial data augmentation in human pose estimation. In CVPR, June 2018. 2
[49] T. Pfister, J. Charles, and A. Zisserman. Flowing convnets for human pose estimation in videos. In ICCV, pages 1913–1921, 2015. 1
[50] L. Pishchulin, M. Andriluka, P. V. Gehler, and B. Schiele. Poselet conditioned pictorial structures. In CVPR, pages 588–595, 2013. 2
[51] L. Pishchulin, E. Insafutdinov, S. Tang, B. Andres, M. Andriluka, P. V. Gehler, and B. Schiele. Deepcut: Joint subset partition and labeling for multi person pose estimation. In CVPR, pages 4929–4937, 2016. 3, 7
[52] T. Pohlen, A. Hermans, M. Mathias, and B. Leibe. Full-resolution residual networks for semantic segmentation in street scenes. In CVPR, 2017. 3
[53] PoseTrack. PoseTrack Leader Board. https://posetrack.net/leaderboard.php. 7
[54] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and F. Li. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015. 9
[55] M. Samy, K. Amer, K. Eissa, M. Shaker, and M. ElHelw. Nu-net: Deep residual wide field of view convolutional neural network for semantic segmentation. In CVPRW, June 2018. 3
[56] S. Saxena and J. Verbeek. Convolutional neural fabrics. In NIPS, pages 4053–4061, 2016. 3
[57] T. Sekii. Pose proposal networks. In ECCV, September 2018. 1
[58] K. Sun, C. Lan, J. Xing, W. Zeng, D. Liu, and J. Wang. Human pose estimation using global and local normalization. In ICCV, pages 5600–5608, 2017. 2, 6
[59] K. Sun, M. Li, D. Liu, and J. Wang. IGCV3: interleaved low-rank group convolutions for efficient deep neural networks. In BMVC, page 101. BMVA Press, 2018. 3
[60] X. Sun, B. Xiao, F. Wei, S. Liang, and Y. Wei. Integral human pose regression. In ECCV, pages 536–553, 2018. 5
[61] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, pages 1–9, 2015. 3
[62] W. Tang, P. Yu, and Y. Wu. Deeply learned compositional models for human pose estimation. In ECCV, September 2018. 2, 6, 7, 9
[63] Z. Tang, X. Peng, S. Geng, L. Wu, S. Zhang, and D. N. Metaxas. Quantized densely connected u-nets for efficient landmark localization. In ECCV, pages 348–364, 2018. 6
[64] J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler. Efficient object localization using convolutional networks. In CVPR, pages 648–656, 2015. 3
[65] J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler. Joint training of a convolutional network and a graphical model for human pose estimation. In NIPS, pages 1799–1807, 2014. 2
[66] A. Toshev and C. Szegedy. Deeppose: Human pose estimation via deep neural networks. In CVPR, pages 1653–1660, 2014. 2
[67] J. Wang, Z. Wei, T. Zhang, and W. Zeng. Deeply-fused nets. CoRR, abs/1605.07716, 2016. 3
[68] Z. Wang, W. Li, B. Yin, Q. Peng, T. Xiao, Y. Du, Z. Li, X. Zhang, G. Yu, and J. Sun. Mscoco keypoints challenge 2018. In Joint Recognition Challenge Workshop at ECCV 2018, 2018. 4
[69] S. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh. Convolutional pose machines. In CVPR, pages 4724–4732, 2016. 3, 6
[70] J. Wu, H. Zheng, B. Zhao, Y. Li, B. Yan, R. Liang, W. Wang, S. Zhou, G. Lin, Y. Fu, et al. Ai challenger: A large-scale dataset for going deeper in image understanding. arXiv preprint arXiv:1711.06475, 2017. 6
[71] F. Xia, P. Wang, X. Chen, and A. L. Yuille. Joint multi-person pose estimation and semantic part segmentation. In CVPR, pages 6080–6089, 2017. 1
[72] B. Xiao, H. Wu, and Y. Wei. Simple baselines for human pose estimation and tracking. In ECCV, pages 472–487, 2018. 1, 2, 3, 5, 6, 7, 8, 9
[73] G. Xie, J. Wang, T. Zhang, J. Lai, R. Hong, and G. Qi. Interleaved structured sparse convolutional neural networks. In CVPR, pages 8847–8856. IEEE Computer Society, 2018. 3
[74] S. Xie and Z. Tu. Holistically-nested edge detection. In ICCV, pages 1395–1403, 2015. 3
[75] Y. Xiu, J. Li, H. Wang, Y. Fang, and C. Lu. Pose flow: Efficient online pose tracking. In BMVC, page 53, 2018. 9
[76] Y. Xu, T. Xiao, J. Zhang, K. Yang, and Z. Zhang. Scale-invariant convolutional neural networks. CoRR, abs/1411.6369, 2014. 3
[77] W. Yang, S. Li, W. Ouyang, H. Li, and X. Wang. Learning feature pyramids for human pose estimation. In ICCV, pages 1290–1299, 2017. 1, 2, 3, 5, 6, 7, 9
[78] W. Yang, W. Ouyang, H. Li, and X. Wang. End-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation. In CVPR, pages 3073–3082, 2016. 2
[79] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, pages 1385–1392, 2011. 2
[80] T. Zhang, G. Qi, B. Xiao, and J. Wang. Interleaved group convolutions. In ICCV, pages 4383–4392, 2017. 3
[81] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing network. In CVPR, pages 6230–6239, 2017. 3
[82] L. Zhao, M. Li, D. Meng, X. Li, Z. Zhang, Y. Zhuang, Z. Tu, and J. Wang. Deep convolutional neural networks with merge-and-run mappings. In IJCAI, pages 3170–3176, 2018. 3
[83] Y. Zhou, X. Hu, and B. Zhang. Interlinked convolutional neural networks for face parsing. In ISNN, pages 222–231, 2015. 3
[84] X. Zhu, Y. Jiang, and Z. Luo. Multi-person pose estimation for posetrack with enhanced part affinity fields. In ICCV PoseTrack Workshop, 2017. 7