https://github.com/wufan-tb/yolo_slowfast
This repo is inference-only — no training or evaluation — so it is relatively simple.
Namespace(input='D:/wyh/Pytorch-Action-Recognition-master/k400/test/rYi2_JhxM4E_000011_000021.mp4',
output='output.mp4',
imsize=640,
conf=0.4,  # object confidence threshold
iou=0.4,  # IoU (intersection-over-union) threshold for NMS (non-maximum suppression)
device='cuda',
classes=None,  # restrict detection to these class indices
show=False)
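As a reference, a minimal sketch of how such an argument parser could look (argument names follow the Namespace above; the exact defaults and help strings in the repo may differ):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--input', type=str, help='path of the input video')
parser.add_argument('--output', type=str, default='output.mp4', help='path of the rendered result')
parser.add_argument('--imsize', type=int, default=640, help='inference size used by YOLOv5')
parser.add_argument('--conf', type=float, default=0.4, help='object confidence threshold')
parser.add_argument('--iou', type=float, default=0.4, help='IoU threshold for NMS')
parser.add_argument('--device', default='cuda', help='cuda or cpu')
parser.add_argument('--classes', nargs='+', type=int, default=None, help='only detect these class indices')
parser.add_argument('--show', action='store_true', help='display frames while processing')
config = parser.parse_args()
```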
_aggregate_bboxes_labels: aggregates labels and bounding boxes so that multiple entries for the same box are merged.
from_csv: loads and parses labels and frame paths from CSV files. It takes the frame-path file, the frame-label file, the video path prefix and the label-map file, and returns a list of tuples of (video frame directory, label dict).
load_and_parse_labels_csv:
parses the AVA frame-label CSV. It takes the label file path, a video-name-to-index dict and the set of allowed class IDs, and returns a dict with the label information for every keyframe of every video.
load_image_lists:
loads the image path lists from file. It takes the frame-path file and the video path prefix, and returns the list of absolute frame paths, a video-index-to-name dict and a video-name-to-index dict.
read_label_map:
reads the label-map file, mapping class IDs to class names. It takes the label-map file path and returns a tuple of (mapping dict, set of class IDs).
In short, this class provides methods to load and parse the AVA labels and frame paths for later data processing and training; from_csv yields, per video, the frame directory and its label dict.
Here only read_label_map is needed; it returns the 80 AVA action classes as ava_labelnames (a minimal usage sketch follows the class list below):
1: 'bend/bow (at the waist)',
2: 'crawl',
3: 'crouch/kneel',
4: 'dance',
5: 'fall down',
6: 'get up',
7: 'jump/leap',
8: 'lie/sleep',
9: 'martial art',
10: 'run',
11: 'sit',
12: 'stand',
13: 'swim',
14: 'walk',
15: 'answer phone',
16: 'brush teeth',
17: 'carry/hold (an object)',
18: 'catch (an object)',
19: 'chop',
20: 'climb (e.g., a mountain)',
21: 'clink glass',
22: 'close (e.g., a door, a box)',
23: 'cook',
24: 'cut',
25: 'dig',
26: 'dress/put on clothing',
27: 'drink',
28: 'drive (e.g., a car, a truck)',
29: 'eat',
30: 'enter',
31: 'exit',
32: 'extract',
33: 'fishing',
34: 'hit (an object)',
35: 'kick (an object)',
36: 'lift/pick up',
37: 'listen (e.g., to music)',
38: 'open (e.g., a window, a car door)',
39: 'paint',
40: 'play board game',
41: 'play musical instrument',
42: 'play with pets',
43: 'point to (an object)',
44: 'press',
45: 'pull (an object)',
46: 'push (an object)',
47: 'put down',
48: 'read',
49: 'sit',
50: 'row boat',
51: 'sail boat',
52: 'shoot',
53: 'shovel',
54: 'smoke',
55: 'stir',
56: 'take a photo',
57: 'text on/look at a cellphone',
58: 'throw',
59: 'touch (an object)',
60: 'turn (e.g., a screwdriver)',
61: 'stand',
62: 'work on a computer',
63: 'write',
64: 'fight/hit (a person)',
65: 'give/serve (an object) to (a person)',
66: 'grab (a person)',
67: 'hand clap',
68: 'hand shake',
69: 'hand wave',
70: 'hug (a person)',
71: 'kick (a person)',
72: 'kiss (a person)',
73: 'lift (a person)',
74: 'listen to (a person)',
75: 'play with kids',
76: 'push (another person)',
77: 'sing to (e.g., self, a person, a group)',
78: 'take (an object) from (a person)',
79: 'talk to (e.g., self, a person, a group)',
80: 'stand'
(Oddly enough, 'stand' appears three times in this list.)
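A minimal sketch of how that label map is loaded, assuming pytorchvideo's AvaLabeledVideoFramePaths helper and a local AVA label-map .pbtxt file (the path below is an assumption):

```python
from pytorchvideo.data.ava import AvaLabeledVideoFramePaths

# read_label_map returns ({class_id: class_name}, set of allowed class ids)
ava_labelnames, _ = AvaLabeledVideoFramePaths.read_label_map('selfutils/temp.pbtxt')  # path assumed
print(ava_labelnames[12])  # -> 'stand'
```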
The detector side also distinguishes 80 classes (COCO):
{0: 'person',
1: 'bicycle',
2: 'car',
3: 'motorcycle',
4: 'airplane',
5: 'bus',
6: 'train',
7: 'truck',
8: 'boat',
9: 'traffic light',
10: 'fire hydrant',
11: 'stop sign',
12: 'parking meter',
13: 'bench',
14: 'bird',
15: 'cat',
16: 'dog',
17: 'horse',
18: 'sheep',
19: 'cow',
20: 'elephant',
21: 'bear',
22: 'zebra',
23: 'giraffe',
24: 'backpack',
25: 'umbrella',
26: 'handbag',
27: 'tie',
28: 'suitcase',
29: 'frisbee',
30: 'skis',
31: 'snowboard',
32: 'sports ball',
33: 'kite',
34: 'baseball bat',
35: 'baseball glove',
36: 'skateboard',
37: 'surfboard',
38: 'tennis racket',
39: 'bottle',
40: 'wine glass',
41: 'cup',
42: 'fork',
43: 'knife',
44: 'spoon',
45: 'bowl',
46: 'banana',
47: 'apple',
48: 'sandwich',
49: 'orange',
50: 'broccoli',
51: 'carrot',
52: 'hot dog',
53: 'pizza',
54: 'donut',
55: 'cake',
56: 'chair',
57: 'couch',
58: 'potted plant',
59: 'bed',
60: 'dining table',
61: 'toilet',
62: 'tv',
63: 'laptop',
64: 'mouse',
65: 'remote',
66: 'keyboard',
67: 'cell phone',
68: 'microwave',
69: 'oven',
70: 'toaster',
71: 'sink',
72: 'refrigerator',
73: 'book',
74: 'clock',
75: 'vase',
76: 'scissors',
77: 'teddy bear',
78: 'hair drier',
79: 'toothbrush'
}
Each COCO class label is then associated with a color, used later when drawing the visualization.
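A minimal sketch of such a class-to-color association (a fixed random palette; the repo may generate its colors differently):

```python
import random

random.seed(0)  # fixed seed so every class keeps the same color between runs
coco_color_map = [[random.randint(0, 255) for _ in range(3)] for _ in range(80)]
# later: cv2.rectangle(img, (x1, y1), (x2, y2), coco_color_map[int(cls)], 2)
```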
0: cv2.CAP_PROP_POS_MSEC — current position in milliseconds.
1: cv2.CAP_PROP_POS_FRAMES — index of the next frame.
2: cv2.CAP_PROP_POS_AVI_RATIO — relative position in the file (0 = start, 1 = end).
3: cv2.CAP_PROP_FRAME_WIDTH — frame width.
4: cv2.CAP_PROP_FRAME_HEIGHT — frame height.
5: cv2.CAP_PROP_FPS — frame rate (frames per second).
6: cv2.CAP_PROP_FOURCC — four-character code of the video codec.
7: cv2.CAP_PROP_FRAME_COUNT — number of frames.
Here only the width and height are needed: 1280 x 720.
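Querying those two properties looks like this (equivalent to passing the numeric indices 3 and 4):

```python
import cv2

cap = cv2.VideoCapture('input.mp4')
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))    # 1280 for this clip
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))  # 720 for this clip
```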
The video is then traversed frame by frame.
Each frame is read with cv2.VideoCapture.read() inside the MyVideoCapture class (conceptually like walking a linked list, one next-frame at a time).
img has shape (720, 1280, 3) (OpenCV convention, BGR channel order).
AutoShape(
# this part lives in user\.cache\torch\hub\ultralytics_yolov5_master\models\common.py
# p, of shape [64, 3, 6, 6], holds the parameters of the next level of the network (the first conv's weights)
# after some preprocessing the input image becomes im of shape [1, 3, 384, 640], padded to a multiple of stride=64
(model): DetectMultiBackend(
(model): DetectionModel(
# defined in user\.cache\torch\hub\ultralytics_yolov5_master\models\yolo.py
# in BaseModel._forward_once most modules have m.f = -1; read this together with parse_model:
# i is the index of each module in the model (its position), obtained via enumerate(d['backbone'] + d['head'])
# f is the 'from' index: which layer feeds this module its input, the first element of each entry in enumerate(d['backbone'] + d['head'])
# type is the module type (its class name), obtained via str(m)[8:-2].replace('__main__.', '')
# np is the number of trainable parameters, the sum over the parameter counts of m_.parameters()
(model): Sequential(
(0): Conv(
(conv): Conv2d(3, 64, kernel_size=(6, 6), stride=(2, 2), padding=(2, 2))
(act): SiLU(inplace=True)
)  # output: [1, 64, 192, 320]
(1): Conv(
(conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)  # output: [1, 128, 96, 160]
(2): C3(
(cv1): Conv(
(conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)  # output: [1, 128, 96, 160]
(3): Conv(
(conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)  # output: [1, 256, 48, 80]
(4): C3(
(cv1): Conv(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(3): Bottleneck(
(cv1): Conv(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(4): Bottleneck(
(cv1): Conv(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(5): Bottleneck(
(cv1): Conv(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)  # output: [1, 256, 48, 80]
(5): Conv(
(conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)  # output: [1, 512, 24, 40]
(6): C3(
(cv1): Conv(
(conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(3): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(4): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(5): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(6): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(7): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(8): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)  # output: [1, 512, 24, 40]
(7): Conv(
(conv): Conv2d(512, 768, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)  # output: [1, 768, 12, 20]
(8): C3(
(cv1): Conv(
(conv): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)  # output: [1, 768, 12, 20]
(9): Conv(
(conv): Conv2d(768, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)  # output: [1, 1024, 6, 10]
(10): C3(
(cv1): Conv(
(conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(1024, 1024, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)  # output: [1, 1024, 6, 10]
(11): SPPF(
(cv1): Conv(
(conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(2048, 1024, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): MaxPool2d(kernel_size=5, stride=1, padding=2, dilation=1, ceil_mode=False)
)  # output: [1, 1024, 6, 10]
(12): Conv(
(conv): Conv2d(1024, 768, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)  # output: [1, 768, 6, 10]
(13): Upsample(scale_factor=2.0, mode=nearest)
# output: [1, 768, 12, 20]
# meanwhile, the outputs of the layers whose indices are in self.save=[4, 6, 8, 12, 16, 20, 23, 26, 29, 32] are kept for later reuse
(14): Concat()
# here m.f becomes [-1, 8]: the saved output of layer 8 and the current x are put into one list so that layer 14 can concatenate them
# output: [1, 1536, 12, 20]
(15): C3(
(cv1): Conv(
(conv): Conv2d(1536, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(1536, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)  # output: [1, 768, 12, 20]
(16): Conv(
(conv): Conv2d(768, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)  # output: [1, 512, 12, 20]
(17): Upsample(scale_factor=2.0, mode=nearest)
# output: [1, 512, 24, 40]
(18): Concat()
# here m.f=[-1, 6]
# concatenated output: [1, 1024, 24, 40]
(19): C3(
(cv1): Conv(
(conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)  # output: [1, 512, 24, 40]
(20): Conv(
(conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)  # output: [1, 256, 24, 40]
(21): Upsample(scale_factor=2.0, mode=nearest)
# output: [1, 256, 48, 80]
(22): Concat()
# m.f=[-1,4]
# concatenated output: [1, 512, 48, 80]
(23): C3(
(cv1): Conv(
(conv): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)  # output: [1, 256, 48, 80]
(24): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)  # output: [1, 256, 24, 40]
(25): Concat()
# m.f=[-1,20]
# concatenated output: [1, 512, 24, 40]
(26): C3(
(cv1): Conv(
(conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)  # output: [1, 512, 24, 40]
(27): Conv(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)  # output: [1, 512, 12, 20]
(28): Concat()
# m.f=[-1,16]
# concatenated output: [1, 1024, 12, 20]
(29): C3(
(cv1): Conv(
(conv): Conv2d(1024, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(1024, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)  # output: [1, 768, 12, 20]
(30): Conv(
(conv): Conv2d(768, 768, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)  # output: [1, 768, 6, 10]
(31): Concat()
# m.f=[-1,12]
# concatenated output: [1, 1536, 6, 10]
(32): C3(
(cv1): Conv(
(conv): Conv2d(1536, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(1536, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(1024, 1024, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)  # output: [1, 1024, 6, 10]
(33): Detect(
# m.f=[23, 26, 29, 32]
(m): ModuleList(
(0): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
(1): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
(2): Conv2d(768, 255, kernel_size=(1, 1), stride=(1, 1))
(3): Conv2d(1024, 255, kernel_size=(1, 1), stride=(1, 1))
)  # each head is applied to its feature map and the results are concatenated into [1, 15300, 85]
# i.e. [batch_size, num_anchors, num_classes + 5]
# in the innermost vector the first five values are: bbox center coordinates, bbox width/height, and the objectness confidence
)
)
)
)
)
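The 15300 rows in that output are simply the anchor count summed over the four detection scales (three anchors per grid cell), which can be checked quickly:

```python
# grid sizes of the four Detect inputs for the padded 384x640 image (strides 8, 16, 32, 64)
grids = [(48, 80), (24, 40), (12, 20), (6, 10)]
total_anchors = 3 * sum(h * w for h, w in grids)
print(total_anchors)  # 15300, matching the [1, 15300, 85] prediction tensor
```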
The predictions are then passed to the non_max_suppression function (in general.py) for NMS,
which removes overlapping detection boxes:
candidates are first filtered by objectness confidence;
the objectness confidence is multiplied into the class confidences to correct the class scores;
box = xywh2xyxy(x[:, :4]) converts the first four columns into top-left / bottom-right corner form;
if multi-label is enabled for each bbox,
candidates are filtered again by class confidence;
else
the single best class is taken,
which yields the detection result matrix.
Since agnostic=False, NMS is class-aware: each box is offset by its class index times 7680 (max_wh), so boxes of different classes never suppress each other.
The final output is a (5, 6) tensor: five bboxes, each with its two corner coordinates plus a confidence and a class.
scale_boxes then rescales the bboxes back onto the original image size. ~~(But the return value is never assigned to anything — isn't that call pointless?)~~ (Ah no, the function modifies the tensor in place through its reference, so no return value is needed.)
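For illustration, a minimal class-aware NMS built on torchvision (the repo itself uses YOLOv5's non_max_suppression from general.py; this sketch only shows the class-offset trick described above):

```python
import torchvision

def class_aware_nms(boxes_xyxy, scores, classes, iou_thres=0.4, max_wh=7680):
    # shift each box by class_index * max_wh so boxes of different classes can never overlap,
    # then a single plain NMS pass suppresses only within each class
    offsets = classes.float().unsqueeze(1) * max_wh
    keep = torchvision.ops.nms(boxes_xyxy + offsets, scores, iou_thres)
    return keep  # indices of the surviving boxes
```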
The results are then stored in a Detections object, which is what AutoShape returns. Its main fields are:
- ims: a list of the images, each as a NumPy array. (720, 1280, 3)
- pred: a list of prediction results, where pred[0] is a tensor holding the box coordinates (xyxy), the confidence conf and the class cls. torch.Size([5, 6])
- names: the list of class names.
- files: a list of the image file names.
- times: a tuple with timing information from inference.
- xyxy: a copy of pred, boxes as (xyxy) corner coordinates (in pixels).
- xywh: a list with the boxes converted from (xyxy) to (xywh) format (in pixels).
- xyxyn: a list with the (xyxy) boxes normalized to the image size.
- xywhn: a list with the (xywh) boxes normalized to the image size.
- n: the number of images (batch size).
- t: a tuple of per-stage timings (in milliseconds).
- s: the shape of the inference input tensor (BCHW).
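In the main loop these fields are consumed roughly as follows (variable names are illustrative; the column split follows the layout listed above):

```python
yolo_preds = model([img], size=imsize)      # AutoShape -> Detections
det = yolo_preds.pred[0]                    # torch.Size([5, 6]): x1, y1, x2, y2, conf, cls
boxes_xyxy = det[:, :4]
confs, clss = det[:, 4], det[:, 5]
boxes_cxcywh = yolo_preds.xywh[0][:, :4]    # same boxes as (center x, center y, w, h) in pixels
```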
Next, the first four columns of xywh (the coordinates), the confidences and classes from pred, and the original image are fed into tracker.update. (From this point on the original image gets converted from BGR to RGB, which is why the rendered output has a bluish tint.)
First, since appearance features are needed, the 5 detections are cropped out, resized to a common size and stacked into [5, 3, 128, 64], which is fed to extractor.net for feature extraction:
Net(
(conv): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
)  # output: [5, 64, 64, 32]
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)  # output: [5, 64, 64, 32]
(layer2): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)  # output: [5, 128, 32, 16]
(layer3): Sequential(
(0): BasicBlock(
(conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)  # output: [5, 256, 16, 8]
(layer4): Sequential(
(0): BasicBlock(
(conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)  # output: [5, 512, 8, 4]
(avgpool): AvgPool2d(kernel_size=(8, 4), stride=1, padding=0)
# output: [5, 512, 1, 1], then reshaped with view to [5, 512]
if not self.reid:
(classifier): Sequential(
(0): Linear(in_features=512, out_features=256, bias=True)
(1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Dropout(p=0.5, inplace=False)
(4): Linear(in_features=256, out_features=751, bias=True)
)
)
Otherwise (re-ID mode) the feature is simply L2-normalized and returned:

$$x = \frac{x}{\|x\|_2}$$
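In PyTorch that normalization is a one-liner (dim=1 because the features have shape [5, 512]):

```python
x = x.div(x.norm(p=2, dim=1, keepdim=True))  # scale every 512-d feature vector to unit L2 norm
```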
Next the bboxes are converted from (x, y, w, h) format to tlwh format, (x_min, y_min, w, h).
Then, per box, the tlwh, confidence, class label and feature are wrapped into Detection objects and handed to the tracker update.
Kalman filter
Predict the motion:
use the state-transition matrix _motion_mat to propagate the previous time step's mean vector and obtain the predicted state mean for the next step.
Predict the covariance:
use the state-transition matrix _motion_mat, the previous covariance matrix covariance and the motion-noise covariance motion_cov
to compute the predicted state covariance for the next step.
Return the predicted mean vector and covariance matrix of the target state.
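A condensed sketch of that prediction step (a plain constant-velocity Kalman predict; the real filter additionally builds motion_cov from the current box height):

```python
import numpy as np

def kalman_predict(mean, covariance, motion_mat, motion_cov):
    """x' = F x,  P' = F P F^T + Q."""
    mean = motion_mat @ mean
    covariance = motion_mat @ covariance @ motion_mat.T + motion_cov
    return mean, covariance
```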
Run the matching cascade: the _match method matches the detections against the current tracks and returns the matches, the unmatched tracks and the unmatched detections.
matching_cascade
Loop over levels 0 to cascade_depth-1:
If there are no remaining unmatched detections, break out of the loop.
Collect the track indices for the current level, i.e. those with tracks[k].time_since_update == 1 + level.
If there is nothing to match at this level, i.e. len(track_indices_l) == 0, continue with the next level.
Otherwise call min_cost_matching with the distance metric, the gating threshold, the tracks, the detections, this level's track indices and the still-unmatched detections, to perform the minimum-cost matching.
The distance_metric used to build the cost matrix is gated_metric.
(It takes roughly three frames to actually reach this step: the first two frames have no prior, so detections are only recorded and matched directly; only from the third frame do unmatched detections appear and tracking proper begins.)
1 - A Bᵀ gives the cosine-similarity cost matrix (a sketch follows after this list).
Append the matches to the overall match list and update the list of unmatched detections.
The unmatched tracks are the track index list minus the matched track indices.
Return the matches, the unmatched track list and the unmatched detection list.
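The appearance part of that cost is the cosine distance; a minimal sketch, assuming the track gallery and detection features are already L2-normalized row-wise:

```python
import numpy as np

def cosine_cost_matrix(track_feats, det_feats):
    # rows are unit-length feature vectors, so the matrix product gives cosine similarities
    return 1.0 - track_feats @ det_feats.T   # shape: (num_tracks, num_detections)
```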
The remaining tracks and detections across the previous and current frame are then fed to min_cost_matching again (unlike the call inside matching_cascade, which matches tracks against detections by appearance within this frame, this step matches across the two frames by overlap).
Concretely, the iou_cost function measures how well the detections of the two frames overlap;
for example, 5 objects in the first frame and 3 objects in the second frame yield a similarity matrix that serves as the cost,
and the Hungarian algorithm then decides the assignment.
Although it is usually called the Hungarian algorithm, what is actually used is the modified Jonker-Volgenant algorithm:
scipy.optimize._lsap.linear_sum_assignment solves the cost matrix
and returns the row and column indices of the optimal assignment.
Matches and non-matches are returned.
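A tiny example of that assignment step with scipy (5 tracks vs. 3 detections, as in the scenario above):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[0.1, 0.9, 0.8],
                 [0.7, 0.2, 0.9],
                 [0.8, 0.9, 0.3],
                 [0.5, 0.6, 0.7],
                 [0.9, 0.8, 0.6]])          # rows: tracks, columns: detections
rows, cols = linear_sum_assignment(cost)     # minimizes the total cost
print(rows, cols)                            # [0 1 2] [0 1 2]; tracks 3 and 4 stay unmatched
```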
Update the track set:
For each matched track/detection pair, call the track's update method with the Kalman filter and the detection to update the track state.
By comparing the measurement with the predicted state and applying the Kalman gain, the predicted state distribution is corrected, which improves the accuracy of the state estimate.
1. Project the predicted mean vector and covariance matrix into measurement space via the projection matrix, obtaining the projected mean and covariance:
1. From the fourth state component mean[3], compute the measurement-space standard deviations, weighted by _std_weight_position.
2. Build the innovation covariance innovation_cov from these standard deviations: a diagonal matrix whose entries are the squared standard deviations.
3. Multiply the state mean by the projection matrix _update_mat to obtain the projected mean projected_mean.
4. Multiply the projection matrix, the state covariance and the transposed projection matrix to obtain the projected covariance projected_cov.
5. Add the innovation covariance innovation_cov to projected_cov to account for measurement noise.
6. Finally return the projected mean vector and covariance matrix in measurement space.
2. Cholesky-factorize the projected covariance into a lower-triangular factor (and its transpose).
3. Use the Cholesky factorization and the Kalman-gain formula to compute the Kalman gain matrix.
4. Compute the innovation vector: the difference between the measurement and the projected mean.
5. Weight the innovation with the Kalman gain and add it to the state mean, giving the new mean.
6. Update the covariance with the Kalman gain by subtracting the corresponding matrix product, giving the new covariance.
7. Finally return the corrected mean vector and covariance matrix, which become the updated state estimate. (A condensed sketch of this update follows below.)
For unmatched tracks, call the track's mark_missed method to mark them as missed.
For unmatched detections, call _initiate_track to create and initialize a new track:
1. The detection is converted to (center x, center y, aspect ratio, height), i.e. xyah format.
2. The Kalman filter uses an 8-dimensional state space to describe the target's motion and observation.
3. The state contains the target's center position (x, y), aspect ratio a, height h, and their respective velocities.
4. The mean vector and covariance matrix of the new track are returned.
Finally, tracks that have been marked as deleted are removed.
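A condensed sketch of the project-and-correct step described in points 1–7 above (same Cholesky-based gain computation as the original DeepSORT Kalman filter; argument names follow the description):

```python
import numpy as np
import scipy.linalg

def kalman_update(mean, covariance, measurement, update_mat, innovation_cov):
    # project the state into measurement space
    projected_mean = update_mat @ mean
    projected_cov = update_mat @ covariance @ update_mat.T + innovation_cov

    # Kalman gain via Cholesky factorization of the projected covariance
    chol, lower = scipy.linalg.cho_factor(projected_cov, lower=True, check_finite=False)
    kalman_gain = scipy.linalg.cho_solve((chol, lower),
                                         (covariance @ update_mat.T).T,
                                         check_finite=False).T

    innovation = measurement - projected_mean                 # measurement residual
    new_mean = mean + kalman_gain @ innovation                # corrected mean
    new_cov = covariance - kalman_gain @ projected_cov @ kalman_gain.T
    return new_mean, new_cov
```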
Update the distance metric:
Collect the feature vectors and target IDs of all confirmed tracks.
This effectively stores an appearance template per object, used for later matching and re-identification.
Use these feature vectors and target IDs to update the distance metric.
Each tracked object is described by the following 8 columns:
x1: x coordinate of the bbox top-left corner
y1: y coordinate of the bbox top-left corner
x2: x coordinate of the bbox bottom-right corner
y2: y coordinate of the bbox bottom-right corner
label: label/class of the bbox
track_id: ID of the track the bbox belongs to
Vx: velocity of the bbox in x (computed from the Kalman mean)
Vy: velocity of the bbox in y (computed from the Kalman mean)
Each box is then annotated with:
int(trackid): the track ID, cast to int.
yolo_preds.names[int(cls)]: the class name of the bounding box, looked up via the index int(cls) in the yolo_preds.names list.
ava_label: the AVA action label associated with this track ID, looked up in the id_to_ava_labels dict.
Every 25 frames form one unit; only now does action recognition start.
These 25 frames are concatenated into [3, 25, 720, 1280].
(The slow pathway, which looks at context/scene information, uses a temporal stride of alpha=4; the fast pathway, which learns motion information, uses all sampled frames.)
ava_inference_transform returns a list (torch.Size([3, 8, 640, 1137]) (slow), torch.Size([3, 32, 640, 1137]) (fast)),
the transformed detection boxes as a tensor (torch.Size([7, 4])),
and the original box information. In short, ava_inference_transform resizes the clip, normalizes it by mean/std and splits the channels into the two pathways. A minimal sketch of the pathway split follows.
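A minimal sketch of that pathway split, assuming the usual SlowFast settings of 32 fast-pathway frames and alpha=4 (so the slow pathway keeps every 4th frame):

```python
import torch
from pytorchvideo.transforms.functional import uniform_temporal_subsample

def pack_pathways(clip, num_frames=32, alpha=4):
    # clip: [3, T, H, W] -> [slow [3, num_frames // alpha, H, W], fast [3, num_frames, H, W]]
    fast = uniform_temporal_subsample(clip, num_frames)
    slow_idx = torch.linspace(0, fast.shape[1] - 1, fast.shape[1] // alpha).long()
    slow = torch.index_select(fast, 1, slow_idx)
    return [slow, fast]   # e.g. [3, 8, 640, 1137] and [3, 32, 640, 1137]
```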
DetectionBBoxNetwork(
# inputs to model: torch.Size([1, 3, 8, 640, 1137]) and torch.Size([1, 3, 32, 640, 1137])
(model): Net(
(blocks): ModuleList(
(0): MultiPathWayWithFuse(
(multipathway_blocks): ModuleList(
# the two entries of x each go through their own ResNetBasicStem
(0): ResNetBasicStem(
(conv): Conv3d(3, 64, kernel_size=(1, 7, 7), stride=(1, 2, 2), padding=(0, 3, 3), bias=False)
(norm): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(activation): ReLU()
(pool): MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=[0, 1, 1], dilation=1, ceil_mode=False)
)  # output: torch.Size([1, 64, 8, 160, 285])
(1): ResNetBasicStem(
(conv): Conv3d(3, 8, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3), bias=False)
(norm): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(activation): ReLU()
(pool): MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=[0, 1, 1], dilation=1, ceil_mode=False)
)  # output: torch.Size([1, 8, 32, 160, 285])
)
(multipathway_fusion): FuseFastToSlow(
(conv_fast_to_slow): Conv3d(8, 16, kernel_size=(7, 1, 1), stride=(4, 1, 1), padding=(3, 0, 0), bias=False)
# converts [1, 8, 32, 160, 285] into fuse, [1, 16, 8, 160, 285]
(norm): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(activation): ReLU()
)
# [1, 64, 8, 160, 285] and fuse are concatenated into [1, 80, 8, 160, 285]
# this becomes x[0]; together with [1, 8, 32, 160, 285] it forms the new output list
)
(1): MultiPathWayWithFuse(
(multipathway_blocks): ModuleList(
(0): ResStage(
(res_blocks): ModuleList(
(0): ResBlock(
(branch1_conv): Conv3d(80, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
# [1, 80, 8, 160, 285] becomes [1, 256, 8, 160, 285]
(branch1_norm): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
'''
(branch2): BottleneckBlock(
(conv_a): Conv3d(80, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_a): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(64, 64, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(64, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
'''
(activation): ReLU()
)
(1): ResBlock(
'''
(branch2): BottleneckBlock(
(conv_a): Conv3d(256, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_a): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(64, 64, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(64, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)'''
(activation): ReLU()
)
(2): ResBlock('''
(branch2): BottleneckBlock(
(conv_a): Conv3d(256, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_a): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(64, 64, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(64, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)'''
(activation): ReLU()
)
)
)
(1): ResStage(
(res_blocks): ModuleList(
(0): ResBlock(
(branch1_conv): Conv3d(8, 32, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(branch1_norm): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)'''
(branch2): BottleneckBlock(
(conv_a): Conv3d(8, 8, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(8, 8, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(8, 32, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)'''
(activation): ReLU()
)  # turns [1, 8, 32, 160, 285] into torch.Size([1, 32, 32, 160, 285])
(1): ResBlock('''
(branch2): BottleneckBlock(
(conv_a): Conv3d(32, 8, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(8, 8, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(8, 32, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)'''
(activation): ReLU()
)
(2): ResBlock('''
(branch2): BottleneckBlock(
(conv_a): Conv3d(32, 8, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(8, 8, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(8, 32, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)'''
(activation): ReLU()
)
)
)
)
(multipathway_fusion): FuseFastToSlow(
(conv_fast_to_slow): Conv3d(32, 64, kernel_size=(7, 1, 1), stride=(4, 1, 1), padding=(3, 0, 0), bias=False)
(norm): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(activation): ReLU()
# turns torch.Size([1, 32, 32, 160, 285]) into torch.Size([1, 64, 8, 160, 285])
# concatenated with torch.Size([1, 256, 8, 160, 285]) into torch.Size([1, 320, 8, 160, 285]), then repacked into a list
)
)
(2): MultiPathWayWithFuse(
(multipathway_blocks): ModuleList(
(0): ResStage(
(res_blocks): ModuleList(
(0): ResBlock(
(branch1_conv): Conv3d(320, 512, kernel_size=(1, 1, 1), stride=(1, 2, 2), bias=False)
(branch1_norm): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(branch2): BottleneckBlock(
(conv_a): Conv3d(320, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_a): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(128, 128, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(128, 512, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(1): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(512, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_a): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(128, 128, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(128, 512, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(2): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(512, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_a): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(128, 128, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(128, 512, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(3): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(512, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_a): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(128, 128, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(128, 512, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
)
)
(1): ResStage(
(res_blocks): ModuleList(
(0): ResBlock(
(branch1_conv): Conv3d(32, 64, kernel_size=(1, 1, 1), stride=(1, 2, 2), bias=False)
(branch1_norm): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(branch2): BottleneckBlock(
(conv_a): Conv3d(32, 16, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(16, 16, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(16, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(1): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(64, 16, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(16, 16, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(16, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(2): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(64, 16, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(16, 16, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(16, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(3): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(64, 16, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(16, 16, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(16, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
)
)
)
(multipathway_fusion): FuseFastToSlow(
(conv_fast_to_slow): Conv3d(64, 128, kernel_size=(7, 1, 1), stride=(4, 1, 1), padding=(3, 0, 0), bias=False)
(norm): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(activation): ReLU()
)
)
(3): MultiPathWayWithFuse(
(multipathway_blocks): ModuleList(
(0): ResStage(
(res_blocks): ModuleList(
(0): ResBlock(
(branch1_conv): Conv3d(640, 1024, kernel_size=(1, 1, 1), stride=(1, 2, 2), bias=False)
(branch1_norm): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(branch2): BottleneckBlock(
(conv_a): Conv3d(640, 256, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(256, 256, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(256, 1024, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(1): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(1024, 256, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(256, 256, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(256, 1024, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(2): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(1024, 256, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(256, 256, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(256, 1024, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(3): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(1024, 256, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(256, 256, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(256, 1024, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(4): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(1024, 256, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(256, 256, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(256, 1024, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(5): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(1024, 256, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(256, 256, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(256, 1024, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
)
)
(1): ResStage(
(res_blocks): ModuleList(
(0): ResBlock(
(branch1_conv): Conv3d(64, 128, kernel_size=(1, 1, 1), stride=(1, 2, 2), bias=False)
(branch1_norm): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(branch2): BottleneckBlock(
(conv_a): Conv3d(64, 32, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(32, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(1): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(128, 32, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(32, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(2): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(128, 32, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(32, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(3): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(128, 32, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(32, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(4): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(128, 32, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(32, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(5): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(128, 32, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
(norm_b): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(32, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
)
)
)
(multipathway_fusion): FuseFastToSlow(
(conv_fast_to_slow): Conv3d(128, 256, kernel_size=(7, 1, 1), stride=(4, 1, 1), padding=(3, 0, 0), bias=False)
(norm): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(activation): ReLU()
)
)
(4): MultiPathWayWithFuse(
(multipathway_blocks): ModuleList(
(0): ResStage(
(res_blocks): ModuleList(
(0): ResBlock(
(branch1_conv): Conv3d(1280, 2048, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(branch1_norm): BatchNorm3d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(branch2): BottleneckBlock(
(conv_a): Conv3d(1280, 512, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(512, 512, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 2, 2), dilation=(1, 2, 2), bias=False)
(norm_b): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(512, 2048, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(1): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(2048, 512, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(512, 512, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 2, 2), dilation=(1, 2, 2), bias=False)
(norm_b): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(512, 2048, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(2): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(2048, 512, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(512, 512, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 2, 2), dilation=(1, 2, 2), bias=False)
(norm_b): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(512, 2048, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
)
)
(1): ResStage(
(res_blocks): ModuleList(
(0): ResBlock(
(branch1_conv): Conv3d(128, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(branch1_norm): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(branch2): BottleneckBlock(
(conv_a): Conv3d(128, 64, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(64, 64, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 2, 2), dilation=(1, 2, 2), bias=False)
(norm_b): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(64, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(1): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(256, 64, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(64, 64, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 2, 2), dilation=(1, 2, 2), bias=False)
(norm_b): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(64, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
(2): ResBlock(
(branch2): BottleneckBlock(
(conv_a): Conv3d(256, 64, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
(norm_a): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_a): ReLU()
(conv_b): Conv3d(64, 64, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 2, 2), dilation=(1, 2, 2), bias=False)
(norm_b): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act_b): ReLU()
(conv_c): Conv3d(64, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
(norm_c): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(activation): ReLU()
)
)
)
)
(multipathway_fusion): Identity()
)
# at this point x is a list: (torch.Size([1, 2048, 8, 40, 72]), torch.Size([1, 256, 32, 40, 72]))
(5): PoolConcatPathway(
(pool): ModuleList(
(0): AvgPool3d(kernel_size=(8, 1, 1), stride=(1, 1, 1), padding=(0, 0, 0))  # x[0] becomes torch.Size([1, 2048, 1, 40, 72])
(1): AvgPool3d(kernel_size=(32, 1, 1), stride=(1, 1, 1), padding=(0, 0, 0))  # x[1] becomes torch.Size([1, 256, 1, 40, 72])
# concatenated: torch.Size([1, 2304, 1, 40, 72])
)
)
)
)
(detection_head): ResNetRoIHead(
# inputs: torch.Size([1, 2304, 40, 72]) and bboxes of shape [7, 5]
(roi_layer): RoIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=False)
# output: torch.Size([7, 2304, 7, 7])
(pool_spatial): MaxPool2d(kernel_size=(7, 7), stride=1, padding=0, dilation=1, ceil_mode=False)
# output: [7, 2304, 1, 1], unsqueezed back to torch.Size([7, 2304, 1, 1, 1])
(dropout): Dropout(p=0.5, inplace=False)
# permuted to torch.Size([7, 1, 1, 1, 2304])
(proj): Linear(in_features=2304, out_features=80, bias=True)
# then permuted to torch.Size([7, 80, 1, 1, 1])
(activation): Sigmoid()
)
)  # finally reshaped into torch.Size([7, 80])
The action of each of the 7 objects is then taken as the argmax over the 80 AVA classes. (Note that this step does not seem to distinguish persons from other objects, so actions are also "recognized" for non-person detections, which is a bit redundant.)
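That lookup is roughly the following (slowfaster_preds being the [7, 80] tensor from the ROI head; the surrounding variable names are illustrative, and the +1 accounts for AVA's 1-based class ids):

```python
import torch

action_ids = torch.argmax(slowfaster_preds, dim=1).tolist()
for tid, action_id in zip(track_ids, action_ids):
    id_to_ava_labels[tid] = ava_labelnames[action_id + 1]  # AVA ids are 1-based
```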
——————————————
As a side note, the bbox formats used here:
tlwh: (x_top_left, y_top_left, width, height)
x_top_left: x coordinate of the top-left corner
y_top_left: y coordinate of the top-left corner
width: box width
height: box height
xywh: (x, y, width, height)
x: x coordinate of the box center (this is the center-based format DeepSORT's update expects; see _xywh_to_tlwh)
y: y coordinate of the box center
width: box width
height: box height
xyxy: (x_min, y_min, x_max, y_max)
x_min: x coordinate of the top-left corner
y_min: y coordinate of the top-left corner
x_max: x coordinate of the bottom-right corner
y_max: y coordinate of the bottom-right corner
cxcywh: (center_x, center_y, width, height)
center_x: x coordinate of the box center
center_y: y coordinate of the box center
width: box width
height: box height
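For completeness, minimal converters between these formats (single box, NumPy):

```python
import numpy as np

def cxcywh_to_tlwh(b):
    cx, cy, w, h = b
    return np.array([cx - w / 2, cy - h / 2, w, h])

def tlwh_to_xyxy(b):
    x, y, w, h = b
    return np.array([x, y, x + w, y + h])

def xyxy_to_cxcywh(b):
    x1, y1, x2, y2 = b
    return np.array([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1])
```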