This paper, Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, appeared at CVPR 2017. It is work from CMU, and the results are truly amazing.
Perhaps the highlights of the paper are its cascaded network structure, which fuses PCMs (part confidence maps) and PAFs (part affinity fields) and whose design philosophy closely resembles that of RefineNet, together with the correspondingly constrained bipartite matching algorithm.
The whole detection pipeline is shown in the figure above: an input image passes through the network's 6 stages to produce the PCMs and PAFs. A set of bipartite matchings is then generated from the PAFs; because the PAFs are vector fields, the resulting matchings are very reliable, and the matched parts are finally assembled into a complete skeleton for each person.
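To make the matching step concrete, here is a minimal NumPy/SciPy sketch of the idea (a simplification, not OpenPose's actual code, which uses a greedy per-limb assignment): each candidate limb is scored by the line integral of the PAF along the segment between two part candidates, and a bipartite matching then pairs the candidates. The toy field and all names below are illustrative assumptions.

import numpy as np
from scipy.optimize import linear_sum_assignment

def paf_score(paf_x, paf_y, p1, p2, n_samples=10):
    # Approximate the paper's line integral: sample points along the
    # segment p1 -> p2 and project the PAF onto the segment's unit vector.
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = p2 - p1
    norm = np.linalg.norm(d)
    if norm < 1e-8:
        return 0.0
    u = d / norm
    score = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):
        x, y = (p1 + t * d).astype(int)  # nearest grid point (truncation)
        score += paf_x[y, x] * u[0] + paf_y[y, x] * u[1]
    return score / n_samples

# Toy example: two people standing upright, so the "limb" PAF points
# straight down two image columns.
H, W = 8, 8
paf_x = np.zeros((H, W))
paf_y = np.zeros((H, W))
paf_y[:, 2] = 1.0  # vertical limb field of person A (column x = 2)
paf_y[:, 6] = 1.0  # vertical limb field of person B (column x = 6)
shoulders = [(2, 1), (6, 1)]  # (x, y) candidates for the upper joint
elbows = [(6, 6), (2, 6)]     # (x, y) candidates for the lower joint
cost = np.array([[-paf_score(paf_x, paf_y, s, e) for e in elbows]
                 for s in shoulders])
rows, cols = linear_sum_assignment(cost)  # maximum-score bipartite matching
print([(int(r), int(c)) for r, c in zip(rows, cols)])
# -> [(0, 1), (1, 0)]: each shoulder is paired with the elbow in its own
# column, because the PAF score along a true limb is much higher than
# along a segment that crosses between the two people.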
Under models/pose/mpi/ there are two models: pose_deploy_linevec.prototxt is the more accurate one, while pose_deploy_linevec_faster_4_stages.prototxt loses roughly 2 points of accuracy but is about 30% faster.
The difference between the two is that the latter removes the stage5 and stage6 convolution modules (as shown in the figure below, each stage consists of a series of convolution layers; Branch 1 forms one column and Branch 2 forms two columns, and before entering the next stage the three columns of the previous stage are fused).
models/pose/coco/pose_deploy_linevec.prototxt has the same network structure as models/pose/mpi/pose_deploy_linevec.prototxt; the difference is that the COCO model's convolution layers have more filters. It is the most accurate of the three models.
The model is configured in examples/openpose/openpose.cpp; by default the COCO model is used:
DEFINE_string(model_pose, "COCO", "Model to be used (e.g. COCO, MPI, MPI_4_layers).");
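Because gflags exposes this definition as a command-line flag, a different model can be selected at run time without recompiling, for example:
./build/examples/openpose/openpose.bin --model_pose MPI --image_dir examples/media/
Here MPI_4_layers would select the 4-stage MPI model described above.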
Installation steps (environment: CentOS):
git clone https://github.com/CMU-Perceptual-Computing-Lab/openpose.git
cd 3rdparty/caffe
cp Makefile.config.Ubuntu14.example Makefile.config # edit the paths inside for your own machine
make all -j8
cd ../../models/
./getModels.sh
cd ..
cp Makefile.config.Ubuntu14.example Makefile.config # edit the paths inside for your own machine
make -j8
Possible error:
/tmp/cciLUahT.s:1660: Error: no such instruction: `vextracti128 $0x1,%ymm0,%xmm0'
make: *** [.build_release/src/openpose/gui/guiInfoAdder.o] Error 1
make: *** Waiting for unfinished jobs....
/tmp/ccpTDNgT.s:3892: Error: no such instruction: `vextracti128 $0x1,%ymm0,%xmm0'
make: *** [.build_release/src/openpose/gui/gui.o] Error 1
Fix: comment out the assembler optimization on line 204 of the Makefile: CXXFLAGS += -march=native
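After the edit, the line should simply be commented out (the line number may differ between versions; search for -march=native):
# CXXFLAGS += -march=native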
Possible error:
ERROR: something wrong with flag 'tab_completion_word' in file
Fix:
On line 144 of the Makefile, remove gflags from
LIBRARIES += glog gflags boost_system boost_filesystem m hdf5_hl hdf5 caffe
so that it reads
LIBRARIES += glog boost_system boost_filesystem m hdf5_hl hdf5 caffe
Testing:
Run on a video:
./build/examples/openpose/openpose.bin --video examples/media/video.avi
Run on the webcam:
./build/examples/openpose/openpose.bin
Run on images:
./build/examples/openpose/openpose.bin --image_dir examples/media/
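To save the rendered results instead of only displaying them, OpenPose's write flags can be appended (exact flag availability may vary between versions):
./build/examples/openpose/openpose.bin --image_dir examples/media/ --write_images output/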
Test results on the PETS dataset:
OpenPose training
Training code:
https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation
Caffe fork required for training:
https://github.com/CMU-Perceptual-Computing-Lab/caffe_train
The following modifications are required:
(1) Delete opencv_contrib in the Makefile:
LIBRARIES += opencv_core opencv_highgui opencv_imgproc (opencv_contrib deleted)
(2) Remove the include header in
./src/caffe/cpm_data_transformer.cpp
#include
MATLAB COCO toolbox (note: a Linux environment is required, since there is no corresponding MEX support on Windows):
https://github.com/cocodataset/cocoapi
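If MATLAB is unavailable, the same repository also ships a Python API (pycocotools), which can be used to inspect the raw keypoint annotations directly. A minimal sketch, assuming the 2014 person-keypoint annotation file has been downloaded under annotations/ (the path is an assumption; adjust to your layout):

from pycocotools.coco import COCO

# Load the COCO 2014 keypoint annotations.
coco = COCO('annotations/person_keypoints_train2014.json')
# All images that contain people, and the annotations of the first one.
img_ids = coco.getImgIds(catIds=coco.getCatIds(catNms=['person']))
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_ids[0])):
    # COCO stores keypoints as [x1, y1, v1, x2, y2, v2, ...] where
    # v = 0: not labeled, v = 1: labeled but not visible, v = 2: visible.
    print(ann['num_keypoints'], ann['bbox'], ann['keypoints'][:6])

Note that genJSON remaps COCO's v values into its own label convention, described in the generated-JSON section below.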
Training:
cd Realtime_Multi-Person_Pose_Estimation-master/training
bash getData.sh
/usr/local/MATLAB/R2017a/bin/matlab -nojvm -nodesktop -nodisplay -r getANNO
/usr/local/MATLAB/R2017a/bin/matlab -nojvm -nodesktop -nodisplay -r genCOCOMask
/usr/local/MATLAB/R2017a/bin/matlab -nojvm -nodesktop -nodisplay -r genJSON
python genLMDB.py
python setLayers.py --exp 1
cd dataset/COCO/COCO_kpt/pose56/exp22
bash train_pose.sh 0,1
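The trailing argument appears to be the comma-separated list of GPU IDs to train on; presumably single-GPU training would be started with (an assumption based on the script's usage above, not verified here):
bash train_pose.sh 0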
Diagram of the training network structure:
JSON format generated by the genJSON function:
{
"root": [{
"dataset": "COCO", # dataset name, string
"isValidation": 0.000, # whether the image is in the validation set, 0 = train, 1 = validation, float
"img_paths": "train2014/COCO_train2014_000000000308.jpg", # image path, string
"img_width": 640.000, # image width, float
"img_height": 426.000, # image height, float
"objpos": [201.540, 226.370], # center coordinates of the main person, float
"image_id": 308.000, # image ID, float
"bbox": [134.680, 28.650, 133.720, 395.440], # bounding box of the main person, float
"segment_area": 23904.367, # segmentation area of the main person, float
"num_keypoints": 15.000, # number of labeled (non-cropped) keypoints, float
"joint_self": [ # keypoint coordinates of the center person: x, y, label, float
# label = 0: visible, normal, not cropped
# label = 1: partly invisible, occluded
# label = 2: the body part is cropped out of the image
[209.000, 82.000, 1.000],
[217.000, 74.000, 1.000],
[198.000, 73.000, 1.000],
[220.000, 64.000, 1.000],
[180.000, 68.000, 1.000],
[227.000, 118.000, 1.000],
[152.000, 123.000, 1.000],
[0.000, 0.000, 2.000],
[159.000, 195.000, 0.000],
[0.000, 0.000, 2.000],
[208.000, 196.000, 1.000],
[235.000, 245.000, 1.000],
[190.000, 254.000, 1.000],
[252.000, 320.000, 0.000],
[195.000, 346.000, 1.000],
[250.000, 381.000, 1.000],
[200.000, 422.000, 1.000]
],
"scale_provided": 1.075, # bbox height / 368, float (see the sanity check after this JSON)
"joint_others": [ # keypoint coordinates of the other people, i.e. those not at the image center
[
[52.000, 135.000, 1.000],
[63.000, 123.000, 1.000],
[34.000, 119.000, 1.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[11.000, 209.000, 0.000],
[0.000, 0.000, 2.000],
[43.000, 332.000, 1.000],
[0.000, 0.000, 2.000],
[129.000, 251.000, 1.000],
[0.000, 0.000, 2.000],
[59.000, 404.000, 1.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000]
],
[
[115.000, 75.000, 1.000],
[117.000, 64.000, 1.000],
[106.000, 63.000, 1.000],
[0.000, 0.000, 2.000],
[71.000, 58.000, 1.000],
[0.000, 0.000, 2.000],
[52.000, 141.000, 0.000],
[0.000, 0.000, 2.000],
[75.000, 226.000, 0.000],
[0.000, 0.000, 2.000],
[135.000, 265.000, 0.000],
[0.000, 0.000, 2.000],
[90.000, 274.000, 0.000],
[0.000, 0.000, 2.000],
[88.000, 390.000, 0.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000]
],
[
[142.000, 98.000, 1.000],
[142.000, 96.000, 1.000],
[138.000, 96.000, 1.000],
[0.000, 0.000, 2.000],
[122.000, 99.000, 1.000],
[141.000, 122.000, 0.000],
[116.000, 127.000, 0.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000]
],
[
[448.000, 98.000, 1.000],
[453.000, 84.000, 1.000],
[0.000, 0.000, 2.000],
[489.000, 74.000, 1.000],
[0.000, 0.000, 2.000],
[523.000, 143.000, 1.000],
[480.000, 126.000, 1.000],
[517.000, 231.000, 1.000],
[456.000, 178.000, 1.000],
[442.000, 248.000, 1.000],
[409.000, 191.000, 1.000],
[503.000, 295.000, 1.000],
[472.000, 278.000, 1.000],
[481.000, 392.000, 1.000],
[461.000, 375.000, 1.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000]
],
[
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[61.000, 360.000, 1.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000],
[0.000, 0.000, 2.000]
],
[
[361.000, 92.000, 1.000],
[365.000, 86.000, 1.000],
[0.000, 0.000, 2.000],
[378.000, 86.000, 0.000],
[0.000, 0.000, 2.000],
[383.000, 107.000, 0.000],
[360.000, 105.000, 1.000],
[0.000, 0.000, 2.000],
[351.000, 130.000, 1.000],
[0.000, 0.000, 2.000],
[349.000, 147.000, 1.000],
[380.000, 157.000, 1.000],
[361.000, 158.000, 1.000],
[381.000, 204.000, 0.000],
[363.000, 206.000, 1.000],
[382.000, 241.000, 0.000],
[366.000, 240.000, 0.000]
]
],
"annolist_index": 15.000, # index of this entry in the JSON, float
"people_index": 1.000, # index of the center person, float
"numOtherPeople": 6.000, # number of people besides the center person, float
"scale_provided_other": [1.029, 0.723, 0.239, 1.076, 0.502, 0.479], # other people's bbox height / 368, float
"objpos_other": [ # center coordinates of the other people, float
[92.540, 236.655],
[127.225, 148.285],
[128.435, 122.660],
[468.990, 221.455],
[59.830, 328.830],
[370.745, 161.240]
],
"bbox_other": [ # bounding boxes of the other people: x, y, width, height, float
[0.000, 47.310, 185.080, 378.690],
[45.740, 15.340, 162.970, 265.890],
[111.180, 78.760, 34.510, 87.800],
[381.090, 23.470, 175.800, 395.970],
[0.000, 236.450, 119.660, 184.760],
[345.800, 73.100, 49.890, 176.280]
],
"segment_area_other": [28055.014, 15191.540, 1602.758, 36639.449, 8737.961, 3366.046], # segmentation areas of the other people, float
"num_keypoints_other": [7.000, 9.000, 6.000, 13.000, 1.000, 13.000] # number of labeled keypoints for each of the other people, float
}]
}
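Two of the derived fields can be checked directly against the sample above; a minimal sketch (values copied from the JSON, 368 being the network input size used during training):

# scale_provided = bbox height / 368
bbox = [134.680, 28.650, 133.720, 395.440]  # [x, y, width, height]
print(round(bbox[3] / 368, 3))  # -> 1.075, matches "scale_provided"

# num_keypoints counts the labeled joints, i.e. those with label != 2
joint_self = [
    [209, 82, 1], [217, 74, 1], [198, 73, 1], [220, 64, 1], [180, 68, 1],
    [227, 118, 1], [152, 123, 1], [0, 0, 2], [159, 195, 0], [0, 0, 2],
    [208, 196, 1], [235, 245, 1], [190, 254, 1], [252, 320, 0],
    [195, 346, 1], [250, 381, 1], [200, 422, 1],
]
print(sum(1 for j in joint_self if j[2] != 2))  # -> 15, matches "num_keypoints"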
Visualization program:
import json
import cv2

# Load the annotation produced by genJSON and the corresponding image.
with open('coco1.json', 'r') as f:
    coco_json = json.load(f)
image = cv2.imread("COCO_train2014_000000000308.jpg", 1)

for ann in coco_json["root"]:
    # Bounding box of the center person: [x, y, width, height] (yellow).
    box = ann["bbox"]
    cv2.rectangle(image, (int(box[0]), int(box[1])),
                  (int(box[0] + box[2]), int(box[1] + box[3])), (0, 255, 255), 1)
    # Bounding boxes of the other people (red).
    for b_o in ann["bbox_other"]:
        cv2.rectangle(image, (int(b_o[0]), int(b_o[1])),
                      (int(b_o[0] + b_o[2]), int(b_o[1] + b_o[3])), (0, 0, 255), 1)
    # Keypoints of the center person: blue = visible (label 0),
    # green = occluded (label 1); cropped joints (label 2) are skipped.
    for joint in ann["joint_self"]:
        if joint[2] == 0:
            cv2.circle(image, (int(joint[0]), int(joint[1])), 1, (255, 0, 0), 1)
        elif joint[2] == 1:
            cv2.circle(image, (int(joint[0]), int(joint[1])), 1, (0, 255, 0), 2)
    # Keypoints of the other people, with the same color coding.
    for other in ann["joint_others"]:
        for oth in other:
            if oth[2] == 0:
                cv2.circle(image, (int(oth[0]), int(oth[1])), 1, (255, 0, 0), 1)
            elif oth[2] == 1:
                cv2.circle(image, (int(oth[0]), int(oth[1])), 1, (0, 255, 0), 2)

cv2.imshow("image", image)
cv2.waitKey()
Result:
Pre-trained model link:
https://download.csdn.net/download/qq_14845119/11616965
Reference:
https://github.com/CMU-Perceptual-Computing-Lab/openpose