Reading the yolo_slowfast code

https://github.com/wufan-tb/yolo_slowfast
This repo only applies the models (there is no training or evaluation code), so it is relatively simple.

Namespace(input='D:/wyh/Pytorch-Action-Recognition-master/k400/test/rYi2_JhxM4E_000011_000021.mp4', 
output='output.mp4', 
imsize=640, 
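conf=0.4, # object confidence threshold
iou=0.4, # IoU (intersection over union) threshold used by NMS (non-maximum suppression)
device='cuda', 
classes=None, # optional list of class indices to detect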
show=False)
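For reference, an argparse parser that would produce a Namespace like the one above might look like this. It is a sketch: the flag names and defaults are read off the printout, while `required=True` for `--input`, `nargs="+"` for `--classes` and all help strings are my assumptions, not copied from the repo.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a CLI matching the Namespace printed above (assumed, not the repo's code)."""
    p = argparse.ArgumentParser(description="YOLO + DeepSORT + SlowFast demo (sketch)")
    p.add_argument("--input", type=str, required=True, help="input video path")
    p.add_argument("--output", type=str, default="output.mp4", help="output video path")
    p.add_argument("--imsize", type=int, default=640, help="YOLO inference size")
    p.add_argument("--conf", type=float, default=0.4, help="object confidence threshold")
    p.add_argument("--iou", type=float, default=0.4, help="IoU threshold for NMS")
    p.add_argument("--device", type=str, default="cuda", help="'cuda' or 'cpu'")
    # Assumed: a list of COCO class indices to keep; None means no class filtering.
    p.add_argument("--classes", type=int, nargs="+", default=None, help="class indices to keep")
    p.add_argument("--show", action="store_true", help="display frames while running")
    return p
```

Calling `build_parser().parse_args(["--input", "demo.mp4"])` fills in the remaining defaults shown in the printout.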

The AVA label information is obtained with AvaLabeledVideoFramePaths.read_label_map. The methods of AvaLabeledVideoFramePaths are:

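_aggregate_bboxes_labels:
Aggregates labels and bounding boxes, so that several labels attached to the same bounding box are merged into one entry.
from_csv:
Loads and parses labels and frame paths from CSV files. It takes the frame-path file, the frame-label file, the video path prefix and the label-map file, and returns a list of tuples, each holding a video's frame directory and its label dictionary.
load_and_parse_labels_csv:
Parses the AVA frame-label CSV file. It takes the path of the frame-label file, a dictionary mapping video names to indices, and the set of allowed class IDs, and returns a dictionary with the labels of every keyframe in every video.
load_image_lists:
Loads the lists of image paths from a file. It takes the frame-path file and the video path prefix, and returns a list of absolute frame paths, a dictionary mapping video indices to names, and a dictionary mapping video names to indices.
read_label_map:
Reads the label-map file, which maps class IDs to class names. It takes the label-map file path and returns a tuple of (mapping dictionary, set of class IDs).
Overall, the class provides helpers for loading and parsing the labels and frame paths of the AVA dataset for later data processing and training. Calling from_csv gives a list containing each video's frame directory and the corresponding label dictionary.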
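The label map itself is a protobuf text (.pbtxt) file. As a rough illustration of what read_label_map has to do, here is a simplified parser of my own (not pytorchvideo's source). It assumes each entry has a `name: "..."` line followed by an `id:` or `label_id:` line, and it takes an iterable of lines instead of a file path so it is easy to try.

```python
from typing import Dict, Iterable, Set, Tuple


def read_label_map_sketch(lines: Iterable[str]) -> Tuple[Dict[int, str], Set[int]]:
    """Parse pbtxt-style label-map lines into ({class_id: name}, {class_ids})."""
    label_map: Dict[int, str] = {}
    allowed_ids: Set[int] = set()
    name = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("name:"):
            # The class name is the text between the first pair of double quotes.
            name = line.split('"')[1]
        elif line.startswith(("id:", "label_id:")) and name is not None:
            class_id = int(line.split(":", 1)[1])
            label_map[class_id] = name
            allowed_ids.add(class_id)
    return label_map, allowed_ids


SAMPLE = '''
label {
  name: "bend/bow (at the waist)"
  label_id: 1
}
label {
  name: "crawl"
  label_id: 2
}
'''

labels, ids = read_label_map_sketch(SAMPLE.splitlines())
print(labels)  # {1: 'bend/bow (at the waist)', 2: 'crawl'}
```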

read_label_map therefore returns ava_labelnames, the 80 action classes:

1: 'bend/bow (at the waist)', 
2: 'crawl', 
3: 'crouch/kneel', 
4: 'dance', 
5: 'fall down', 
6: 'get up', 
7: 'jump/leap', 
8: 'lie/sleep', 
9: 'martial art', 
10: 'run', 
11: 'sit', 
12: 'stand', 
13: 'swim', 
14: 'walk', 
15: 'answer phone', 
16: 'brush teeth', 
17: 'carry/hold (an object)', 
18: 'catch (an object)', 
19: 'chop', 
20: 'climb (e.g., a mountain)', 
21: 'clink glass', 
22: 'close (e.g., a door, a box)', 
23: 'cook', 
24: 'cut', 
25: 'dig', 
26: 'dress/put on clothing', 
27: 'drink', 
28: 'drive (e.g., a car, a truck)', 
29: 'eat', 
30: 'enter', 
31: 'exit', 
32: 'extract', 
33: 'fishing', 
34: 'hit (an object)', 
35: 'kick (an object)', 
36: 'lift/pick up', 
37: 'listen (e.g., to music)', 
38: 'open (e.g., a window, a car door)', 
39: 'paint', 
40: 'play board game', 
41: 'play musical instrument', 
42: 'play with pets', 
43: 'point to (an object)', 
44: 'press', 
45: 'pull (an object)', 
46: 'push (an object)', 
47: 'put down', 
48: 'read', 
49: 'sit', 
50: 'row boat', 
51: 'sail boat', 
52: 'shoot', 
53: 'shovel', 
54: 'smoke', 
55: 'stir', 
56: 'take a photo', 
57: 'text on/look at a cellphone', 
58: 'throw', 
59: 'touch (an object)', 
60: 'turn (e.g., a screwdriver)', 
61: 'stand', 
62: 'work on a computer', 
63: 'write', 
64: 'fight/hit (a person)', 
65: 'give/serve (an object) to (a person)', 
66: 'grab (a person)', 
67: 'hand clap', 
68: 'hand shake', 
69: 'hand wave', 
70: 'hug (a person)', 
71: 'kick (a person)', 
72: 'kiss (a person)', 
73: 'lift (a person)', 
74: 'listen to (a person)', 
75: 'play with kids', 
76: 'push (another person)', 
77: 'sing to (e.g., self, a person, a group)', 
78: 'take (an object) from (a person)', 
79: 'talk to (e.g., self, a person, a group)',
80: 'stand'

(Oddly, 'stand' appears three times in this list, as IDs 12, 61 and 80, and 'sit' appears twice, as IDs 11 and 49.)
The detector's object classes (COCO) also number 80:

{0: 'person', 
1: 'bicycle', 
2: 'car', 
3: 'motorcycle', 
4: 'airplane', 
5: 'bus', 
6: 'train', 
7: 'truck', 
8: 'boat', 
9: 'traffic light', 
10: 'fire hydrant', 
11: 'stop sign', 
12: 'parking meter', 
13: 'bench', 
14: 'bird', 
15: 'cat', 
16: 'dog', 
17: 'horse', 
18: 'sheep', 
19: 'cow', 
20: 'elephant', 
21: 'bear', 
22: 'zebra', 
23: 'giraffe', 
24: 'backpack', 
25: 'umbrella', 
26: 'handbag', 
27: 'tie', 
28: 'suitcase', 
29: 'frisbee', 
30: 'skis', 
31: 'snowboard', 
32: 'sports ball', 
33: 'kite', 
34: 'baseball bat', 
35: 'baseball glove', 
36: 'skateboard', 
37: 'surfboard', 
38: 'tennis racket', 
39: 'bottle', 
40: 'wine glass', 
41: 'cup', 
42: 'fork', 
43: 'knife', 
44: 'spoon', 
45: 'bowl', 
46: 'banana', 
47: 'apple', 
48: 'sandwich', 
49: 'orange', 
50: 'broccoli', 
51: 'carrot', 
52: 'hot dog', 
53: 'pizza', 
54: 'donut', 
55: 'cake', 
56: 'chair', 
57: 'couch', 
58: 'potted plant', 
59: 'bed', 
60: 'dining table', 
61: 'toilet', 
62: 'tv', 
63: 'laptop', 
64: 'mouse', 
65: 'remote', 
66: 'keyboard', 
67: 'cell phone', 
68: 'microwave', 
69: 'oven', 
70: 'toaster', 
71: 'sink', 
72: 'refrigerator', 
73: 'book', 
74: 'clock', 
75: 'vase', 
76: 'scissors', 
77: 'teddy bear', 
78: 'hair drier', 
79: 'toothbrush'
}

coco_color_map

Associates each COCO class label with a color, used when drawing the visualization.
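A minimal way to build such a table, assuming one random 3-channel color per class index. The `make_color_map` name and the fixed seed (so colors stay stable between runs) are my additions; I am not claiming this is the repo's exact code.

```python
import random
from typing import List


def make_color_map(num_classes: int = 80, seed: int = 0) -> List[List[int]]:
    """One random 3-channel color per class index, reproducible thanks to the seed."""
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(3)] for _ in range(num_classes)]


coco_color_map = make_color_map()
# Color for class 0 ('person'), e.g. to pass as the color argument of cv2.rectangle.
person_color = coco_color_map[0]
```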

Getting video properties

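0: cv2.CAP_PROP_POS_MSEC, current position in milliseconds.
1: cv2.CAP_PROP_POS_FRAMES, 0-based index of the frame to be decoded next.
2: cv2.CAP_PROP_POS_AVI_RATIO, relative position in the video file (0 = start, 1 = end).
3: cv2.CAP_PROP_FRAME_WIDTH, frame width.
4: cv2.CAP_PROP_FRAME_HEIGHT, frame height.
5: cv2.CAP_PROP_FPS, frame rate (frames per second).
6: cv2.CAP_PROP_FOURCC, four-character code of the codec.
7: cv2.CAP_PROP_FRAME_COUNT, number of frames.
Only the width and height are needed here: 1280 and 720.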

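Iterating over the video frame by frame
First, the frame image img is read with cv2.VideoCapture.read() inside the MyVideoCapture class (each call returns the next frame, somewhat like walking a linked list). img has shape (720, 1280, 3) and, by OpenCV convention, BGR channel order.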
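MyVideoCapture is the repo's own class; the generator below is only a hypothetical stand-in showing the read() loop. It works with a cv2.VideoCapture or with any object that offers the same `read() -> (ok, frame)` contract.

```python
from typing import Any, Iterator

import numpy as np


def iter_frames(cap: Any) -> Iterator[np.ndarray]:
    """Yield frames until read() reports failure (end of stream or read error)."""
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame  # H x W x 3, BGR order when cap is a cv2.VideoCapture


# Usage (not executed here):
# cap = cv2.VideoCapture("input.mp4")
# for img in iter_frames(cap):
#     ...  # img.shape == (720, 1280, 3) for the test clip
# cap.release()
```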

AutoShape(
  # This class is defined in user\.cache\torch\hub\ultralytics_yolov5_master\models\common.py
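  # p = next(self.model.parameters()) is simply the first parameter tensor (the first conv weight, shape [64, 3, 6, 6]); it is used to get the model's device and dtype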
  # Preprocessing (resize + letterbox padding) turns the input image into im [1, 3, 384, 640], so both sides are multiples of stride 64
  (model): DetectMultiBackend(
    (model): DetectionModel(
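    # Defined in user\.cache\torch\hub\ultralytics_yolov5_master\models\yolo.py
    # BaseModel._forward_once reads each module's m.f (m.f = -1 means "take the previous layer's output"); read it together with parse_model:
    # i: index of the module in the model (its position), from enumerate(d['backbone'] + d['head'])
    # f: "from" index, i.e. where the module takes its input; it is the first field of each [from, number, module, args] entry in d['backbone'] + d['head']
    # type: module type (its class name), obtained via str(m)[8:-2].replace('__main__.', '')
    # np: number of parameters in the module, the sum of the element counts of m_.parameters()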
      (model): Sequential(
        (0): Conv(
          (conv): Conv2d(3, 64, kernel_size=(6, 6), stride=(2, 2), padding=(2, 2))
          (act): SiLU(inplace=True)
        )  # output [1, 64, 192, 320]
        (1): Conv(
          (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (act): SiLU(inplace=True)
        )  # output [1, 128, 96, 160]
        (2): C3(
          (cv1): Conv(
            (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv2): Conv(
            (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv3): Conv(
            (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (m): Sequential(
            (0): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (1): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (2): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
          )
        )  # output [1, 128, 96, 160]
        (3): Conv(
          (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (act): SiLU(inplace=True)
        )  # output [1, 256, 48, 80]
        (4): C3(
          (cv1): Conv(
            (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv2): Conv(
            (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv3): Conv(
            (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (m): Sequential(
            (0): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (1): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (2): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (3): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (4): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (5): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
          )
        )  # output [1, 256, 48, 80]
        (5): Conv(
          (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (act): SiLU(inplace=True)
        )  # output [1, 512, 24, 40]
        (6): C3(
          (cv1): Conv(
            (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv2): Conv(
            (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv3): Conv(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (m): Sequential(
            (0): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (1): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (2): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (3): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (4): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (5): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (6): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (7): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (8): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
          )
        )  # output [1, 512, 24, 40]
        (7): Conv(
          (conv): Conv2d(512, 768, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (act): SiLU(inplace=True)
        )  # output [1, 768, 12, 20]
        (8): C3(
          (cv1): Conv(
            (conv): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv2): Conv(
            (conv): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv3): Conv(
            (conv): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (m): Sequential(
            (0): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (1): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (2): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
          )
        )  # output [1, 768, 12, 20]
        (9): Conv(
          (conv): Conv2d(768, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (act): SiLU(inplace=True)
        )  # output [1, 1024, 6, 10]
        (10): C3(
          (cv1): Conv(
            (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv2): Conv(
            (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv3): Conv(
            (conv): Conv2d(1024, 1024, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (m): Sequential(
            (0): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (1): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (2): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
          )
        )  # output [1, 1024, 6, 10]
        (11): SPPF(
          (cv1): Conv(
            (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv2): Conv(
            (conv): Conv2d(2048, 1024, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (m): MaxPool2d(kernel_size=5, stride=1, padding=2, dilation=1, ceil_mode=False)
        )  # output [1, 1024, 6, 10]
        (12): Conv(
          (conv): Conv2d(1024, 768, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )  # output [1, 768, 6, 10]
        (13): Upsample(scale_factor=2.0, mode=nearest)
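        # output [1, 768, 12, 20]
        # Meanwhile, the outputs of the layers whose indices are in self.save=[4, 6, 8, 12, 16, 20, 23, 26, 29, 32] are stored in the list y (other entries are None) so later Concat/Detect layers can reuse them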
        (14): Concat()
        # Here m.f becomes [-1, 8]: the current x and the saved y[8] are put into one list, which layer 14 concatenates
        # output [1, 1536, 12, 20]
        (15): C3(
          (cv1): Conv(
            (conv): Conv2d(1536, 384, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv2): Conv(
            (conv): Conv2d(1536, 384, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv3): Conv(
            (conv): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (m): Sequential(
            (0): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (1): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (2): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
          )
        )  # output [1, 768, 12, 20]
        (16): Conv(
          (conv): Conv2d(768, 512, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )  # output [1, 512, 12, 20]
        (17): Upsample(scale_factor=2.0, mode=nearest)
        # output [1, 512, 24, 40]
        (18): Concat()
        # here m.f = [-1, 6]
        # concatenated output [1, 1024, 24, 40]
        (19): C3(
          (cv1): Conv(
            (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv2): Conv(
            (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv3): Conv(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (m): Sequential(
            (0): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (1): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (2): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
          )
        )  # output [1, 512, 24, 40]
        (20): Conv(
          (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )  # output [1, 256, 24, 40]
        (21): Upsample(scale_factor=2.0, mode=nearest)
        # output [1, 256, 48, 80]
        (22): Concat()
        # m.f=[-1,4]
        # output [1, 512, 48, 80]
        (23): C3(
          (cv1): Conv(
            (conv): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv2): Conv(
            (conv): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv3): Conv(
            (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (m): Sequential(
            (0): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (1): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (2): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
          )
        )  # output [1, 256, 48, 80]
        (24): Conv(
          (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (act): SiLU(inplace=True)
        )  # output [1, 256, 24, 40]
        (25): Concat()
        # m.f=[-1,20]
        # output [1, 512, 24, 40]
        (26): C3(
          (cv1): Conv(
            (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv2): Conv(
            (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv3): Conv(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (m): Sequential(
            (0): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (1): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (2): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
          )
        )  # output [1, 512, 24, 40]
        (27): Conv(
          (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (act): SiLU(inplace=True)
        )  # output [1, 512, 12, 20]
        (28): Concat()
        # m.f=[-1,16]
        # output [1, 1024, 12, 20]
        (29): C3(
          (cv1): Conv(
            (conv): Conv2d(1024, 384, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv2): Conv(
            (conv): Conv2d(1024, 384, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv3): Conv(
            (conv): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (m): Sequential(
            (0): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (1): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (2): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
          )
        )  # output [1, 768, 12, 20]
        (30): Conv(
          (conv): Conv2d(768, 768, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (act): SiLU(inplace=True)
        )  # output [1, 768, 6, 10]
        (31): Concat()
        # m.f=[-1,12]
        # output [1, 1536, 6, 10]
        (32): C3(
          (cv1): Conv(
            (conv): Conv2d(1536, 512, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv2): Conv(
            (conv): Conv2d(1536, 512, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (cv3): Conv(
            (conv): Conv2d(1024, 1024, kernel_size=(1, 1), stride=(1, 1))
            (act): SiLU(inplace=True)
          )
          (m): Sequential(
            (0): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (1): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
            (2): Bottleneck(
              (cv1): Conv(
                (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
                (act): SiLU(inplace=True)
              )
              (cv2): Conv(
                (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (act): SiLU(inplace=True)
              )
            )
          )
        )  # output [1, 1024, 6, 10]
        (33): Detect(
          # m.f=[23, 26, 29, 32]
          (m): ModuleList(
            (0): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
            (1): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
            (2): Conv2d(768, 255, kernel_size=(1, 1), stride=(1, 1))
            (3): Conv2d(1024, 255, kernel_size=(1, 1), stride=(1, 1))
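          )  # each of the four heads is decoded separately, then the results are concatenated into [1, 15300, 85]
          # i.e. [batch_size, num_predictions, 5 + num_classes], where num_predictions = 3 anchors x (48*80 + 24*40 + 12*20 + 6*10) = 15300
          # In the innermost vector the first five values are the box center (x, y), the box width and height, and the objectness score (confidence that the box contains an object); the remaining 80 are class scores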
        )
      )
    )
  )
)
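The shape annotations in the printout can be reproduced with a little arithmetic. This sketch assumes what the printout shows: AutoShape scales the longer image side to 640 and pads each side up to a multiple of stride 64, and Detect has four heads at strides 8/16/32/64 with 3 anchors per grid cell and 80 + 5 outputs per anchor. The helper names are mine.

```python
import math
from typing import List, Tuple


def make_divisible(x: float, divisor: int) -> int:
    """Smallest multiple of divisor that is >= x."""
    return math.ceil(x / divisor) * divisor


def autoshape_input_hw(h0: int, w0: int, size: int = 640, stride: int = 64) -> Tuple[int, int]:
    """Network input (H, W) after scaling the long side to `size` and padding to `stride`."""
    g = size / max(h0, w0)
    return make_divisible(h0 * g, stride), make_divisible(w0 * g, stride)


def detect_outputs(h: int, w: int, strides=(8, 16, 32, 64), na: int = 3, nc: int = 80):
    """Grid sizes of the four Detect inputs, total predictions, and channels per head conv."""
    grids: List[Tuple[int, int]] = [(h // s, w // s) for s in strides]
    total = na * sum(gh * gw for gh, gw in grids)
    return grids, total, na * (nc + 5)


h, w = autoshape_input_hw(720, 1280)
grids, total, head_ch = detect_outputs(h, w)
print((h, w))   # (384, 640)
print(grids)    # [(48, 80), (24, 40), (12, 20), (6, 10)]
print(total)    # 15300 -> the 15300 rows of [1, 15300, 85]
print(head_ch)  # 255   -> the output channels of each 1x1 Conv2d in Detect
```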

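Next, the predictions are passed to non_max_suppression (in general.py), which performs NMS (non-maximum suppression) to remove overlapping detection boxes:

First, filter the candidates once by objectness (the confidence that a box contains an object).
Multiply each class score by the objectness score to get the corrected class confidence.
box = xywh2xyxy(x[:, :4]) converts the first four columns from center/width/height to top-left/bottom-right corner coordinates.
if multi_label (each box may receive several labels)
	filter again by class confidence, keeping every (box, class) pair above the threshold
else
	keep only the best-scoring class for each box
This gives the detection matrix, one row [x1, y1, x2, y2, conf, cls] per box.
Because agnostic=False, NMS is class-aware: each box is shifted by cls * max_wh (max_wh = 7680), so boxes of different classes can never overlap and a single NMS pass only suppresses same-class duplicates.
The final output is a (5, 6) tensor: for each of the five boxes, its two corner points, its confidence and its class.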
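To make these steps concrete, here is a simplified NumPy version of the flow (single-label branch only). It is my own sketch, not yolov5's code: yolov5 works on torch tensors and calls torchvision.ops.nms, whereas this uses a plain greedy NMS. Input rows are [cx, cy, w, h, obj, class scores...] as in the [1, 15300, 85] output, and max_wh = 7680 is the class offset described above.

```python
import numpy as np


def xywh2xyxy(x: np.ndarray) -> np.ndarray:
    """[cx, cy, w, h] -> [x1, y1, x2, y2] (top-left, bottom-right)."""
    y = x.copy()
    y[:, 0] = x[:, 0] - x[:, 2] / 2
    y[:, 1] = x[:, 1] - x[:, 3] / 2
    y[:, 2] = x[:, 0] + x[:, 2] / 2
    y[:, 3] = x[:, 1] + x[:, 3] / 2
    return y


def _area(b: np.ndarray) -> np.ndarray:
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def iou_one_to_many(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one xyxy box and each row of an (N, 4) xyxy array."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (_area(box) + _area(boxes) - inter + 1e-9)


def greedy_nms(boxes: np.ndarray, scores: np.ndarray, iou_thres: float) -> np.ndarray:
    """Indices kept by greedy NMS: take the best box, drop its overlaps, repeat."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou_one_to_many(boxes[i], boxes[rest]) <= iou_thres]
    return np.array(keep, dtype=int)


def simple_nms(pred, conf_thres=0.4, iou_thres=0.4, agnostic=False, max_wh=7680):
    """pred: (N, 5 + nc) rows; returns (M, 6) rows of [x1, y1, x2, y2, conf, cls]."""
    x = pred[pred[:, 4] > conf_thres]              # 1) objectness filter
    cls_conf = x[:, 5:] * x[:, 4:5]                # 2) class score *= objectness
    box = xywh2xyxy(x[:, :4])                      # 3) center format -> corners
    conf = cls_conf.max(1)                         # 4) best class per box ...
    cls = cls_conf.argmax(1).astype(float)
    det = np.column_stack([box, conf, cls])[conf > conf_thres]  # ... thresholded again
    offset = det[:, 5:6] * (0 if agnostic else max_wh)          # 5) per-class shift
    keep = greedy_nms(det[:, :4] + offset, det[:, 4], iou_thres)
    return det[keep]
```

Multi-label mode and yolov5's extra caps (maximum detections, maximum boxes fed to NMS) are left out to keep the sketch short.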

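Then scale_boxes rescales the boxes from the network input size back to the original image size. Its return value is not assigned to any variable, which at first looks pointless, but scale_boxes modifies the box tensor it receives in place (and y[i][:, :4] is a view into the predictions), so no return value is needed.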
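The in-place behaviour can be checked with a small NumPy sketch of the same idea (my own simplified version, not yolov5's scale_boxes). It assumes the letterbox geometry of this video: 720x1280 scaled by 0.5 to 360x640, then padded by 12 px at the top and bottom to 384x640. Because `det[:, :4]` is a view of `det`, modifying it inside the function updates `det` even though the return value is ignored.

```python
import numpy as np


def scale_boxes_sketch(img1_shape, boxes, img0_shape):
    """Map xyxy boxes from letterboxed (h1, w1) back to original (h0, w0), in place."""
    gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])
    pad_x = (img1_shape[1] - img0_shape[1] * gain) / 2
    pad_y = (img1_shape[0] - img0_shape[0] * gain) / 2
    boxes[:, [0, 2]] -= pad_x          # undo horizontal padding
    boxes[:, [1, 3]] -= pad_y          # undo vertical padding
    boxes[:, :4] /= gain               # undo the resize
    boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, img0_shape[1])  # clip to image width
    boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, img0_shape[0])  # clip to image height
    return boxes


det = np.array([[0.0, 12.0, 640.0, 372.0, 0.9, 0.0]])   # one detection in 384x640 coords
scale_boxes_sketch((384, 640), det[:, :4], (720, 1280))  # return value ignored on purpose
print(det[0, :4].tolist())  # [0.0, 0.0, 1280.0, 720.0]
```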
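The results are then stored in a Detections object, which is what AutoShape returns. Its attributes are:

  • ims: list of images, each a NumPy array, here (720, 1280, 3).
  • pred: list of predictions; pred[0] is a tensor holding the box coordinates (xyxy), confidence conf and class cls, here torch.Size([5, 6]).
  • names: list of class names.
  • files: list of image file names.
  • times: tuple of timing information collected during inference.
  • xyxy: the same list as pred (not a separate copy), i.e. boxes as (xyxy) in pixels.
  • xywh: list of the boxes converted from (xyxy) to (xywh), in pixels.
  • xyxyn: list of the (xyxy) boxes normalized by the image size.
  • xywhn: list of the (xywh) boxes normalized by the image size.
  • n: number of images (batch size).
  • t: tuple of per-stage timings in milliseconds.
  • s: tuple holding the shape (BCHW) of the input tensor used for inference.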
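The relationship between xyxy, xywh, xyxyn and xywhn can be sketched in NumPy. Rows are [x1, y1, x2, y2, conf, cls] like pred[0]; the helper names are mine, and in this sketch the conf and cls columns are divided by 1 so they pass through normalization unchanged.

```python
import numpy as np


def xyxy2xywh(x: np.ndarray) -> np.ndarray:
    """[x1, y1, x2, y2, ...] -> [cx, cy, w, h, ...]; extra columns are copied as-is."""
    y = x.copy()
    y[:, 0] = (x[:, 0] + x[:, 2]) / 2
    y[:, 1] = (x[:, 1] + x[:, 3]) / 2
    y[:, 2] = x[:, 2] - x[:, 0]
    y[:, 3] = x[:, 3] - x[:, 1]
    return y


def normalize(boxes: np.ndarray, img_h: int, img_w: int) -> np.ndarray:
    """Divide x values by the width and y values by the height; conf/cls are divided by 1."""
    gn = np.array([img_w, img_h, img_w, img_h, 1, 1], dtype=float)
    return boxes / gn


xyxy = np.array([[0.0, 0.0, 1280.0, 720.0, 0.9, 0.0]])  # pixels, like pred[0]
xywh = xyxy2xywh(xyxy)               # [[640, 360, 1280, 720, 0.9, 0]]
xyxyn = normalize(xyxy, 720, 1280)   # [[0, 0, 1, 1, 0.9, 0]]
xywhn = normalize(xywh, 720, 1280)   # [[0.5, 0.5, 1, 1, 0.9, 0]]
```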

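Next, the first four columns of xywh (the box coordinates), the confidences and classes from pred, and the original image are passed to tracker.update. (From here on the original image is converted from BGR to RGB, which is why the whole output picture has a bluish tint.)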
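A small sketch of what gets handed to the tracker and of the channel swap. The three-array split follows the description above, but `tracker_inputs` is a hypothetical helper and I am not reproducing the exact tracker.update signature. The second part shows a likely cause of the color change: reversing the channel axis turns a BGR pixel into RGB, and if an RGB array is later treated as BGR (as OpenCV does when displaying or writing frames), red and blue swap places.

```python
import numpy as np


def tracker_inputs(xywh: np.ndarray, pred: np.ndarray):
    """Boxes [cx, cy, w, h] from xywh, plus the conf and cls columns from pred."""
    return xywh[:, :4], pred[:, 4], pred[:, 5]


def bgr_to_rgb(img: np.ndarray) -> np.ndarray:
    """Reverse the channel axis (same values as cv2.cvtColor(img, cv2.COLOR_BGR2RGB))."""
    return img[..., ::-1]


red_bgr = np.array([[[0, 0, 255]]], dtype=np.uint8)  # pure red in OpenCV's BGR order
as_rgb = bgr_to_rgb(red_bgr)                        # [[[255, 0, 0]]]
# If as_rgb is later interpreted as BGR, channel 0 (blue) is 255, so the pixel shows as blue.
```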

deepsort_tracker

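DeepSORT needs appearance features, so the 5 detected boxes are first cropped out of the image, resized to a common size and normalized, then stacked into a [5, 3, 128, 64] tensor and fed to extractor.net for feature extraction.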
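The crop-and-batch step can be sketched as follows. Assumptions, all mine: boxes are integer xyxy pixel coordinates inside the image, a plain NumPy nearest-neighbour resize stands in for cv2.resize, and normalization uses ImageNet mean/std (which I believe deep_sort_pytorch's Extractor also uses, but check the repo). The output layout [N, 3, 128, 64] matches the shape noted above.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # assumed ImageNet statistics
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an H x W x C array (stand-in for cv2.resize)."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return img[rows[:, None], cols]


def crops_to_batch(img_rgb: np.ndarray, boxes_xyxy: np.ndarray,
                   out_h: int = 128, out_w: int = 64) -> np.ndarray:
    """Crop each box, resize to out_h x out_w, normalize, and stack as N x 3 x H x W."""
    batch = []
    for x1, y1, x2, y2 in boxes_xyxy.astype(int):
        crop = img_rgb[y1:y2, x1:x2].astype(np.float32) / 255.0  # assumes a non-empty box
        crop = (resize_nearest(crop, out_h, out_w) - MEAN) / STD
        batch.append(crop.transpose(2, 0, 1))                   # HWC -> CHW
    return np.stack(batch)


frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
boxes = np.array([[100, 100, 180, 300]] * 5)
print(crops_to_batch(frame, boxes).shape)  # (5, 3, 128, 64)
```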

Net(
  (conv): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  )# output: [5, 64, 64, 32]
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )# output: [5, 64, 64, 32]
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )# output: [5, 128, 32, 16]
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )# output: [5, 256, 16, 8]
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )# output: [5, 512, 8, 4]
  (avgpool): AvgPool2d(kernel_size=(8, 4), stride=1, padding=0)
  # output: [5, 512, 1, 1], then view() to [5, 512]
  if not self.reid:
	  (classifier): Sequential(
	    (0): Linear(in_features=512, out_features=256, bias=True)
	    (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
	    (2): ReLU(inplace=True)
	    (3): Dropout(p=0.5, inplace=False)
	    (4): Linear(in_features=256, out_features=751, bias=True)
	  )
)
Otherwise (when self.reid is True), the feature is L2-normalized with the formula below and returned:

$$x = \frac{x}{\|x\|_2}$$
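A minimal numpy sketch of this normalization, applied row-wise to a random stand-in for the [5, 512] Net output. The eps guard against all-zero rows is my addition and may not be in the repository's Net. After normalization every row has unit length, so the dot product of two rows is exactly their cosine similarity, which the later 1 - AB^T cosine distance relies on.

```python
import numpy as np


def l2_normalize(features, eps=1e-12):
    """Row-wise L2 normalization: x / ||x||_2 for each feature vector."""
    norms = np.linalg.norm(features, ord=2, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)  # eps (assumption) avoids dividing by zero


rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 512))   # stand-in for the [5, 512] appearance features
unit = l2_normalize(feats)
cos_sim = unit @ unit.T                 # pairwise cosine similarities; diagonal is 1
```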

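Next, the bboxes are converted from center format (x, y, w, h) to top-left format (x_min, y_min, w, h), i.e. tlwh.
Then, for each bbox, the tlwh box, confidence, class label and feature are wrapped into a Detection object and handed to the tracker, which runs tracker.predict followed by tracker.update.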
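The tlwh conversion only shifts the center by half the width and height. A sketch, with `xywh_to_tlwh` as a hypothetical name (the repository's tracker wrapper likely has its own private helper for this):

```python
import numpy as np


def xywh_to_tlwh(bbox_xywh):
    """Center format (cx, cy, w, h) -> top-left format (x_min, y_min, w, h)."""
    tlwh = np.asarray(bbox_xywh, dtype=float).copy()
    tlwh[:, 0] -= tlwh[:, 2] / 2.0  # x_min = cx - w / 2
    tlwh[:, 1] -= tlwh[:, 3] / 2.0  # y_min = cy - h / 2
    return tlwh


print(xywh_to_tlwh([[300, 400, 200, 400]]))  # [[200. 200. 200. 400.]]
```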

tracker.predict

Kalman filter

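Predict motion:
	Multiply the previous time step's mean vector by the state transition matrix _motion_mat to get the predicted state mean for the next time step.
Predict covariance:
	From _motion_mat, the previous covariance matrix covariance and the motion noise covariance motion_cov,
	compute the predicted covariance for the next time step: _motion_mat @ covariance @ _motion_mat.T + motion_cov.
Return the predicted mean vector and covariance matrix of the target state.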
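A numpy sketch of the predict step for a constant-velocity model with dt = 1. The noise weights 1/20 and 1/160 and the 1e-2 / 1e-5 aspect-ratio terms follow the original deep_sort KalmanFilter as I remember it; treat them as assumptions and check the repository's kalman_filter.py.

```python
import numpy as np

NDIM, DT = 4, 1.0
STD_WEIGHT_POSITION = 1.0 / 20   # assumed default
STD_WEIGHT_VELOCITY = 1.0 / 160  # assumed default

# Constant-velocity transition matrix F: position += velocity * dt
motion_mat = np.eye(2 * NDIM)
for i in range(NDIM):
    motion_mat[i, NDIM + i] = DT


def kf_predict(mean, covariance):
    """One predict step for the state [x, y, a, h, vx, vy, va, vh]."""
    h = mean[3]
    std_pos = [STD_WEIGHT_POSITION * h, STD_WEIGHT_POSITION * h, 1e-2, STD_WEIGHT_POSITION * h]
    std_vel = [STD_WEIGHT_VELOCITY * h, STD_WEIGHT_VELOCITY * h, 1e-5, STD_WEIGHT_VELOCITY * h]
    motion_cov = np.diag(np.square(np.r_[std_pos, std_vel]))          # Q, scaled by box height
    mean = motion_mat @ mean                                          # x' = F x
    covariance = motion_mat @ covariance @ motion_mat.T + motion_cov  # P' = F P F^T + Q
    return mean, covariance


mean = np.array([100.0, 50.0, 0.5, 80.0, 2.0, -1.0, 0.0, 0.0])
new_mean, new_cov = kf_predict(mean, np.eye(8))
# the center moves by its velocity: x 100 -> 102, y 50 -> 49; a and h are unchanged
```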

tracker.update

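Run the matching cascade: the _match method matches the detections against the current tracks and returns the matches, the unmatched tracks and the unmatched detections.
	matching_cascade
		Loop over levels from 0 to cascade_depth - 1:
			If there are no unmatched detections left, break out of the loop.
			Collect the indices of the tracks at this level, i.e. those with tracks[k].time_since_update == 1 + level.
			If no tracks are at this level (len(track_indices_l) == 0), continue with the next level.
			Otherwise call min_cost_matching with the distance function, the matching threshold, the tracks, the detections, this level's track indices and the unmatched detections, to solve the minimum-cost assignment.
				The distance_metric used to build the cost matrix is gated_metric: the appearance (cosine) distance, with pairs that fail the Kalman (Mahalanobis) motion gate set to a very large cost.
				(While stepping through the code, this step was only reached after about three frames. The reason is that the cascade only considers confirmed tracks, and a new track stays tentative until it has been matched n_init times in a row (typically 3), so in the first frames every track is tentative and is matched by IoU only.)
				The cosine distance matrix is computed as 1 - A B^T, where A and B hold the L2-normalized features, i.e. one minus the cosine similarity.
			Append this level's matches to the overall match list and update the list of unmatched detections.
		Compute the unmatched tracks as the track indices minus the matched track indices.
		Return the matches, the unmatched tracks and the unmatched detections.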
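The cascade above can be sketched with a precomputed cosine-distance matrix. This is a simplified illustration, not the deep_sort code: the Mahalanobis gating inside gated_metric is omitted, each track has a single feature instead of a gallery, and all function names are hypothetical. Like deep_sort, it clamps costs above the threshold before solving the assignment and then rejects those pairs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cosine_distance(a, b):
    """1 - A B^T after L2-normalizing the rows of A (n, d) and B (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def min_cost_matching(cost, track_idx, det_idx, max_distance):
    """Optimal assignment on the sub-matrix; pairs above max_distance stay unmatched."""
    if not track_idx or not det_idx:
        return [], list(track_idx), list(det_idx)
    sub = np.minimum(cost[np.ix_(track_idx, det_idx)], max_distance + 1e-5)
    rows, cols = linear_sum_assignment(sub)
    matches = [(track_idx[r], det_idx[c]) for r, c in zip(rows, cols) if sub[r, c] <= max_distance]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    return (matches,
            [t for t in track_idx if t not in matched_t],
            [d for d in det_idx if d not in matched_d])


def matching_cascade(cost, time_since_update, det_idx, cascade_depth, max_distance):
    """Match recently updated tracks first: level 0 is time_since_update == 1."""
    unmatched_dets = list(det_idx)
    matches = []
    for level in range(cascade_depth):
        if not unmatched_dets:
            break
        tracks_l = [k for k, tsu in enumerate(time_since_update) if tsu == 1 + level]
        if not tracks_l:
            continue
        m, _, unmatched_dets = min_cost_matching(cost, tracks_l, unmatched_dets, max_distance)
        matches += m
    matched_t = {t for t, _ in matches}
    unmatched_tracks = [k for k in range(len(time_since_update)) if k not in matched_t]
    return matches, unmatched_tracks, unmatched_dets
```

Giving priority to tracks that were seen most recently means a track that has been occluded for several frames (larger time_since_update) cannot steal a detection from a track that was visible in the previous frame.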

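Next, the remaining candidates are passed to min_cost_matching once more, this time with an IoU cost: the unconfirmed tracks, plus the confirmed tracks that were updated in the previous frame but not matched by the cascade, are matched against the detections that are still unmatched. (Unlike matching_cascade, which compares appearance features, this step compares the tracks' boxes predicted from the previous frame with the boxes detected in the current frame.)
	Concretely, iou_cost computes 1 - IoU between each track box and each detection box,
	e.g. 5 tracks from the previous frame and 3 detections in the current frame give a 5x3 cost matrix
	The assignment is then decided by the Hungarian algorithm.
	Although it is called the Hungarian algorithm, what actually runs is a modified Jonker-Volgenant algorithm:
	scipy.optimize.linear_sum_assignment (implemented in scipy.optimize._lsap) solves the assignment problem for the cost matrix
		and returns the row and column indices of the optimal assignment
	Pairs whose cost exceeds the threshold are discarded, and the function returns the matches plus the unmatched tracks and detections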
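A sketch of the IoU stage with boxes in tlwh format. The threshold max_iou_distance = 0.7 is an assumed default (commonly used in deep_sort configs), and `iou_tlwh` / `iou_matching` are hypothetical names; the real code additionally works on index lists into the track and detection collections.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_tlwh(box, candidates):
    """IoU between one tlwh box and an (N, 4) array of tlwh candidates."""
    tl, br = box[:2], box[:2] + box[2:]
    c_tl, c_br = candidates[:, :2], candidates[:, :2] + candidates[:, 2:]
    wh = np.maximum(0.0, np.minimum(br, c_br) - np.maximum(tl, c_tl))  # overlap width/height
    inter = wh.prod(axis=1)
    return inter / (box[2:].prod() + candidates[:, 2:].prod(axis=1) - inter)


def iou_matching(track_boxes, det_boxes, max_iou_distance=0.7):
    """Cost = 1 - IoU, solved with linear_sum_assignment, gated by max_iou_distance."""
    cost = np.stack([1.0 - iou_tlwh(t, det_boxes) for t in track_boxes])
    cost = np.minimum(cost, max_iou_distance + 1e-5)  # clamp impossible pairs
    rows, cols = linear_sum_assignment(cost)
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_iou_distance]
    unmatched_tracks = sorted(set(range(len(track_boxes))) - {r for r, _ in matches})
    unmatched_dets = sorted(set(range(len(det_boxes))) - {c for _, c in matches})
    return matches, unmatched_tracks, unmatched_dets


tracks = np.array([[0, 0, 10, 10], [100, 100, 10, 10]], dtype=float)
dets = np.array([[1, 1, 10, 10], [300, 300, 5, 5], [101, 99, 10, 10]], dtype=float)
matches, unmatched_tracks, unmatched_dets = iou_matching(tracks, dets)
```

Here each track overlaps one slightly shifted detection (IoU = 81/119), so both are matched, and the far-away detection 1 is left unmatched and would start a new track.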

		
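Update the set of tracks:
For each matched track and detection, call the track's update method with the Kalman filter and the detection to update the track's state.
	The measurement is compared with the predicted state, and the Kalman gain is used to correct the predicted state distribution, which makes the state estimate more accurate.
	1. Project the predicted mean and covariance into measurement space with the measurement matrix _update_mat, giving the projected mean and covariance:
		1. Compute the measurement-space standard deviations from the fourth state component mean[3] (the box height h), using the weight _std_weight_position (the aspect-ratio entry is a fixed 1e-1).
		2. Build the innovation covariance innovation_cov from these standard deviations: a diagonal matrix whose entries are their squares.
		3. Multiply the state mean by the projection matrix _update_mat to get the projected mean projected_mean.
		4. Project the state covariance as _update_mat @ covariance @ _update_mat.T to get projected_cov.
		5. Add innovation_cov to projected_cov to account for the measurement noise.
		6. Return the projected mean and covariance in measurement space.
	2. Cholesky-factorize the projected covariance, giving a lower-triangular factor (plus a flag marking it as lower-triangular).
	3. Use the Cholesky factor in the Kalman gain formula K = P H^T S^-1 (P: predicted covariance, H: _update_mat, S: projected_cov), which avoids forming the inverse explicitly.
	4. Compute the innovation vector, i.e. the difference between the measurement and the projected mean.
	5. Weight the innovation with the Kalman gain and add it to the state mean to get the new mean.
	6. Update the covariance by subtracting the corresponding matrix product: new_covariance = covariance - K @ projected_cov @ K.T.
	7. Return the corrected mean and covariance, which become the new state estimate.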
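The projection and correction steps above, sketched with numpy and scipy. The weight 1/20 and the fixed 1e-1 aspect-ratio noise follow the original deep_sort KalmanFilter as I remember it (assumption); the function names are hypothetical.

```python
import numpy as np
import scipy.linalg

NDIM = 4
STD_WEIGHT_POSITION = 1.0 / 20       # assumed default
update_mat = np.eye(NDIM, 2 * NDIM)  # H: picks (x, y, a, h) out of the 8-dim state


def kf_project(mean, covariance):
    """Step 1: project the state distribution into measurement space."""
    h = mean[3]
    std = [STD_WEIGHT_POSITION * h, STD_WEIGHT_POSITION * h, 1e-1, STD_WEIGHT_POSITION * h]
    innovation_cov = np.diag(np.square(std))                                  # R
    projected_mean = update_mat @ mean                                        # H x
    projected_cov = update_mat @ covariance @ update_mat.T + innovation_cov   # S = H P H^T + R
    return projected_mean, projected_cov


def kf_update(mean, covariance, measurement):
    """Steps 2-7: correct the predicted state with an (x, y, a, h) measurement."""
    projected_mean, projected_cov = kf_project(mean, covariance)
    chol_factor, lower = scipy.linalg.cho_factor(projected_cov, lower=True)
    # K = P H^T S^-1, obtained by solving S K^T = (P H^T)^T with the Cholesky factor
    kalman_gain = scipy.linalg.cho_solve((chol_factor, lower), (covariance @ update_mat.T).T).T
    innovation = measurement - projected_mean
    new_mean = mean + innovation @ kalman_gain.T
    new_covariance = covariance - kalman_gain @ projected_cov @ kalman_gain.T
    return new_mean, new_covariance


mean = np.array([100.0, 50.0, 0.5, 80.0, 0.0, 0.0, 0.0, 0.0])
new_mean, new_cov = kf_update(mean, np.eye(8), np.array([110.0, 50.0, 0.5, 80.0]))
```

With P = I and a measurement noise variance of 16 on x, the gain for x is 1/17, so the estimate moves only 10/17 of the way from 100 toward the measured 110: the filter trusts its prediction more than the noisy detection.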
	
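For unmatched tracks, call the track's mark_missed method; this deletes a track that is still tentative, or one that has not been updated for more than max_age frames.
For unmatched detections, call _initiate_track to create and initialize a new track:
	1. Convert the detection to (center x, center y, aspect ratio, height), i.e. xyah format.
	2. The Kalman filter describes the target's motion and observation with an 8-dimensional state space.
	3. The state consists of the center position (x, y), the aspect ratio a, the height h, and their respective velocities (initialized to zero).
	4. Return the new track's mean vector and covariance matrix.
Finally, remove the tracks marked as deleted.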
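A sketch of track initiation: the xyah conversion plus the initial 8-dim mean and covariance. The initial standard deviations (2x the position weight, 10x the velocity weight, all scaled by the height) mirror the original deep_sort KalmanFilter as I remember it and are assumptions here; the function names are hypothetical.

```python
import numpy as np

STD_WEIGHT_POSITION = 1.0 / 20   # assumed default
STD_WEIGHT_VELOCITY = 1.0 / 160  # assumed default


def tlwh_to_xyah(tlwh):
    """(x_min, y_min, w, h) -> (center x, center y, aspect ratio w / h, h)."""
    ret = np.asarray(tlwh, dtype=float).copy()
    ret[:2] += ret[2:] / 2   # top-left corner -> center
    ret[2] /= ret[3]         # width -> aspect ratio
    return ret


def kf_initiate(measurement):
    """Build the state [x, y, a, h, vx, vy, va, vh]; velocities start at 0."""
    mean = np.r_[measurement, np.zeros(4)]
    h = measurement[3]
    std = [2 * STD_WEIGHT_POSITION * h, 2 * STD_WEIGHT_POSITION * h, 1e-2, 2 * STD_WEIGHT_POSITION * h,
           10 * STD_WEIGHT_VELOCITY * h, 10 * STD_WEIGHT_VELOCITY * h, 1e-5, 10 * STD_WEIGHT_VELOCITY * h]
    return mean, np.diag(np.square(std))  # large, height-scaled initial uncertainty


xyah = tlwh_to_xyah([200, 200, 200, 400])   # -> [300, 400, 0.5, 400]
mean, cov = kf_initiate(xyah)
```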

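Update the distance metric:
Collect the feature vectors and track IDs of all confirmed tracks.
	In effect this keeps a small gallery of appearance features for each object, used later for matching and re-identification.
Update the distance metric with these feature vectors and track IDs.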
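A sketch of such a per-track feature gallery, loosely modeled on deep_sort's nearest-neighbour metric: keep at most `budget` recent features per track ID, forget IDs that are no longer active, and score a detection by its smallest cosine distance to any stored feature. The class name and the budget value are assumptions.

```python
import numpy as np


class FeatureGallery:
    """Hypothetical per-track feature store with a nearest-neighbour cosine distance."""

    def __init__(self, budget=100):
        self.budget = budget   # max features kept per track (assumed default)
        self.samples = {}      # track ID -> list of feature vectors

    def partial_fit(self, features, targets, active_targets):
        for feature, target in zip(features, targets):
            self.samples.setdefault(target, []).append(feature)
            if self.budget is not None:
                self.samples[target] = self.samples[target][-self.budget:]  # keep the newest
        # forget tracks that are no longer confirmed
        self.samples = {k: self.samples[k] for k in active_targets if k in self.samples}

    def distance(self, features, targets):
        """Cost[i, j] = min over track i's gallery of the cosine distance to detection j."""
        feats = features / np.linalg.norm(features, axis=1, keepdims=True)
        cost = np.zeros((len(targets), len(feats)))
        for i, target in enumerate(targets):
            gallery = np.asarray(self.samples[target], dtype=float)
            gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
            cost[i] = (1.0 - gallery @ feats.T).min(axis=0)
        return cost
```

Taking the minimum over several stored features makes matching robust to pose changes: a detection only has to resemble one recent appearance of the track.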

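Update bbox attributes

Each object has the following 8 columns:

x1: x coordinate of the bbox's top-left corner
y1: y coordinate of the bbox's top-left corner
x2: x coordinate of the bbox's bottom-right corner
y2: y coordinate of the bbox's bottom-right corner
label: the bbox's label (class)
track_id: ID of the track the bbox belongs to
Vx: velocity of the bbox in the x direction (taken from the Kalman mean)
Vy: velocity of the bbox in the y direction (taken from the Kalman mean)

Annotation text for each bbox:

int(trackid): the track ID, converted to an integer.
yolo_preds.names[int(cls)]: the class name of the box, looked up in the yolo_preds.names list with the index int(cls).
ava_label: the AVA action label associated with this track ID, looked up in the id_to_ava_labels dict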

video_model

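Every 25 frames form one unit, and action recognition finally starts here
These 25 frames are concatenated into a [3, 25, 720, 1280] tensor (C, T, H, W)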
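A numpy sketch of how a list of frames becomes a (C, T, H, W) clip: stack along a new time axis, then move channels to the front. Small frames stand in for 720x1280 to keep it light, and the repository may do the same with torch.cat and permute; the exact calls there are not reproduced here.

```python
import numpy as np

# Stand-in for 25 RGB frames of shape (H, W, C); frame t is filled with the value t
frames = [np.full((72, 128, 3), t, dtype=np.uint8) for t in range(25)]

clip = np.stack(frames, axis=0)    # (T, H, W, C) = (25, 72, 128, 3)
clip = clip.transpose(3, 0, 1, 2)  # (C, T, H, W) = (3, 25, 72, 128)
print(clip.shape)                  # (3, 25, 72, 128)
```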

ava_inference_transform

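  1. Convert the input detection boxes to an array.
  2. Uniformly subsample the clip in time to the specified num_frames (32 here, judging by the fast-pathway shape below). The slow pathway, which captures scene context, later keeps 1 of every slow_fast_alpha = 4 of these frames (8 frames); the fast pathway, which captures motion, keeps all 32.
  3. Convert the clip to float and scale the pixel values to [0, 1].
  4. Get the clip's height and width and clip the box coordinates to the image boundaries.
  5. Rescale the clip and the boxes so that the short side equals the specified crop_size.
  6. Normalize the clip with the given data_mean and data_std.
  7. Split the clip into pathways according to slow_fast_alpha: if slow_fast_alpha is not None, the clip is split into a slow and a fast pathway; otherwise only the slow pathway is used.
  8. Return the processed clip (a list: torch.Size([3, 8, 640, 1137]) for the slow pathway and torch.Size([3, 32, 640, 1137]) for the fast pathway), the transformed boxes as a tensor (torch.Size([7, 4])) and the original box information.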
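Steps 2, 5 and 7 can be sketched in numpy. The function names echo pytorchvideo's helpers, but these are simplified re-implementations (floats converted with astype instead of torch .long(), only the output size computed for short-side scaling), and num_frames = 32, alpha = 4 and crop_size = 640 are inferred from the shapes above.

```python
import math
import numpy as np


def uniform_temporal_subsample(clip, num_samples):
    """Pick num_samples evenly spaced frames along dim 1 of a (C, T, H, W) clip."""
    t = clip.shape[1]
    idx = np.clip(np.linspace(0, t - 1, num_samples), 0, t - 1).astype(np.int64)
    return clip[:, idx]   # repeats frames when num_samples > t


def short_side_size(h, w, size):
    """Output (h, w) when the shorter side is scaled to `size`."""
    if w < h:
        return int(math.floor(h / w * size)), size
    return size, int(math.floor(w / h * size))


def pack_pathway(clip, alpha=4):
    """Fast pathway = all frames; slow pathway = T // alpha evenly spaced frames."""
    t = clip.shape[1]
    slow_idx = np.linspace(0, t - 1, t // alpha).astype(np.int64)
    return [clip[:, slow_idx], clip]


clip = np.zeros((3, 25, 8, 8), dtype=np.float32)   # tiny stand-in for [3, 25, 720, 1280]
clip = uniform_temporal_subsample(clip, 32)        # 25 frames -> 32
slow, fast = pack_pathway(clip, alpha=4)
print(slow.shape, fast.shape)                      # (3, 8, 8, 8) (3, 32, 8, 8)
print(short_side_size(720, 1280, 640))             # (640, 1137)
```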

Overall, ava_inference_transform preprocesses the input clip by resizing it, normalizing it with the given mean and std, and splitting it into pathways.
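The spatial and temporal sizes annotated in the network dump that follows can be checked with the usual conv/pool output-size formula, floor((n + 2p - k) / s) + 1, with kernel, stride and padding read off the printed layers (MaxPool3d uses ceil_mode=False, so floor applies).

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a conv/pool along one dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# Stem, spatial dims: Conv3d k=7, s=2, p=3, then MaxPool3d k=3, s=2, p=1
h = conv_out(conv_out(640, 7, 2, 3), 3, 2, 1)
w = conv_out(conv_out(1137, 7, 2, 3), 3, 2, 1)
print(h, w)                    # 160 285

# Fast-pathway stem, temporal dim: k=5, s=1, p=2 keeps 32 frames
print(conv_out(32, 5, 1, 2))   # 32

# FuseFastToSlow, temporal dim: Conv3d k=7, s=4, p=3 maps 32 fast frames onto 8 slow ones
print(conv_out(32, 7, 4, 3))   # 8
```

The temporal stride of 4 in FuseFastToSlow is what lets the fused fast features line up frame-by-frame with the 8-frame slow pathway before the channel-wise concatenation.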

DetectionBBoxNetwork(
  # input to model: torch.Size([1, 3, 8, 640, 1137]) and torch.Size([1, 3, 32, 640, 1137])
  (model): Net(
    (blocks): ModuleList(
      (0): MultiPathWayWithFuse(
        (multipathway_blocks): ModuleList(
          # the two elements of x go into their own ResNetBasicStem
          (0): ResNetBasicStem(
            (conv): Conv3d(3, 64, kernel_size=(1, 7, 7), stride=(1, 2, 2), padding=(0, 3, 3), bias=False)
            (norm): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (activation): ReLU()
            (pool): MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=[0, 1, 1], dilation=1, ceil_mode=False)
          )# output: torch.Size([1, 64, 8, 160, 285])
          (1): ResNetBasicStem(
            (conv): Conv3d(3, 8, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3), bias=False)
            (norm): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (activation): ReLU()
            (pool): MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=[0, 1, 1], dilation=1, ceil_mode=False)
          )# output: torch.Size([1, 8, 32, 160, 285])
        )
        (multipathway_fusion): FuseFastToSlow(
          (conv_fast_to_slow): Conv3d(8, 16, kernel_size=(7, 1, 1), stride=(4, 1, 1), padding=(3, 0, 0), bias=False)
          # turns [1, 8, 32, 160, 285] into fuse: [1, 16, 8, 160, 285]
          (norm): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
        )
        # concatenate [1, 64, 8, 160, 285] with fuse to get [1, 80, 8, 160, 285]
        # this becomes the new x[0] and is returned as a new list together with [1, 8, 32, 160, 285]
      )
      (1): MultiPathWayWithFuse(
        (multipathway_blocks): ModuleList(
          (0): ResStage(
            (res_blocks): ModuleList(
              (0): ResBlock(
                (branch1_conv): Conv3d(80, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                # [1, 80, 8, 160, 285] becomes [1, 256, 8, 160, 285]
                (branch1_norm): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(80, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_a): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(64, 64, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(64, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (1): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(256, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_a): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(64, 64, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(64, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (2): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(256, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_a): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(64, 64, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(64, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
            )
          )
          (1): ResStage(
            (res_blocks): ModuleList(
              (0): ResBlock(
                (branch1_conv): Conv3d(8, 32, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                (branch1_norm): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(8, 8, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(8, 8, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(8, 32, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )# turns [1, 8, 32, 160, 285] into torch.Size([1, 32, 32, 160, 285])
              (1): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(32, 8, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(8, 8, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(8, 32, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (2): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(32, 8, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(8, 8, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(8, 32, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
            )
          )
        )
        (multipathway_fusion): FuseFastToSlow(
          (conv_fast_to_slow): Conv3d(32, 64, kernel_size=(7, 1, 1), stride=(4, 1, 1), padding=(3, 0, 0), bias=False)
          (norm): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          # turns torch.Size([1, 32, 32, 160, 285]) into torch.Size([1, 64, 8, 160, 285])
          # which is concatenated with torch.Size([1, 256, 8, 160, 285]) into torch.Size([1, 320, 8, 160, 285]) and packed into a list again
        )
      )
      (2): MultiPathWayWithFuse(
        (multipathway_blocks): ModuleList(
          (0): ResStage(
            (res_blocks): ModuleList(
              (0): ResBlock(
                (branch1_conv): Conv3d(320, 512, kernel_size=(1, 1, 1), stride=(1, 2, 2), bias=False)
                (branch1_norm): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(320, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_a): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(128, 128, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(128, 512, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (1): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(512, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_a): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(128, 128, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(128, 512, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (2): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(512, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_a): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(128, 128, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(128, 512, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (3): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(512, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_a): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(128, 128, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(128, 512, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
            )
          )
          (1): ResStage(
            (res_blocks): ModuleList(
              (0): ResBlock(
                (branch1_conv): Conv3d(32, 64, kernel_size=(1, 1, 1), stride=(1, 2, 2), bias=False)
                (branch1_norm): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(32, 16, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(16, 16, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(16, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (1): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(64, 16, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(16, 16, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(16, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (2): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(64, 16, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(16, 16, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(16, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (3): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(64, 16, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(16, 16, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(16, 64, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
            )
          )
        )
        (multipathway_fusion): FuseFastToSlow(
          (conv_fast_to_slow): Conv3d(64, 128, kernel_size=(7, 1, 1), stride=(4, 1, 1), padding=(3, 0, 0), bias=False)
          (norm): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
        )
      )
      (3): MultiPathWayWithFuse(
        (multipathway_blocks): ModuleList(
          (0): ResStage(
            (res_blocks): ModuleList(
              (0): ResBlock(
                (branch1_conv): Conv3d(640, 1024, kernel_size=(1, 1, 1), stride=(1, 2, 2), bias=False)
                (branch1_norm): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(640, 256, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(256, 256, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(256, 1024, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (1): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(1024, 256, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(256, 256, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(256, 1024, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (2): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(1024, 256, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(256, 256, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(256, 1024, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (3): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(1024, 256, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(256, 256, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(256, 1024, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (4): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(1024, 256, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(256, 256, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(256, 1024, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (5): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(1024, 256, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(256, 256, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(256, 1024, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
            )
          )
          (1): ResStage(
            (res_blocks): ModuleList(
              (0): ResBlock(
                (branch1_conv): Conv3d(64, 128, kernel_size=(1, 1, 1), stride=(1, 2, 2), bias=False)
                (branch1_norm): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(64, 32, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(32, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (1): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(128, 32, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(32, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (2): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(128, 32, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(32, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (3): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(128, 32, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(32, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (4): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(128, 32, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(32, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (5): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(128, 32, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1), bias=False)
                  (norm_b): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(32, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
            )
          )
        )
        (multipathway_fusion): FuseFastToSlow(
          (conv_fast_to_slow): Conv3d(128, 256, kernel_size=(7, 1, 1), stride=(4, 1, 1), padding=(3, 0, 0), bias=False)
          (norm): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
        )
      )
      (4): MultiPathWayWithFuse(
        (multipathway_blocks): ModuleList(
          (0): ResStage(
            (res_blocks): ModuleList(
              (0): ResBlock(
                (branch1_conv): Conv3d(1280, 2048, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                (branch1_norm): BatchNorm3d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(1280, 512, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(512, 512, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 2, 2), dilation=(1, 2, 2), bias=False)
                  (norm_b): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(512, 2048, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (1): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(2048, 512, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(512, 512, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 2, 2), dilation=(1, 2, 2), bias=False)
                  (norm_b): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(512, 2048, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (2): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(2048, 512, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(512, 512, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 2, 2), dilation=(1, 2, 2), bias=False)
                  (norm_b): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(512, 2048, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
            )
          )
          (1): ResStage(
            (res_blocks): ModuleList(
              (0): ResBlock(
                (branch1_conv): Conv3d(128, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                (branch1_norm): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(128, 64, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(64, 64, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 2, 2), dilation=(1, 2, 2), bias=False)
                  (norm_b): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(64, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (1): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(256, 64, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(64, 64, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 2, 2), dilation=(1, 2, 2), bias=False)
                  (norm_b): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(64, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
              (2): ResBlock(
                (branch2): BottleneckBlock(
                  (conv_a): Conv3d(256, 64, kernel_size=(3, 1, 1), stride=(1, 1, 1), padding=(1, 0, 0), bias=False)
                  (norm_a): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_a): ReLU()
                  (conv_b): Conv3d(64, 64, kernel_size=(1, 3, 3), stride=(1, 1, 1), padding=(0, 2, 2), dilation=(1, 2, 2), bias=False)
                  (norm_b): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (act_b): ReLU()
                  (conv_c): Conv3d(64, 256, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
                  (norm_c): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                )
                (activation): ReLU()
              )
            )
          )
        )
        (multipathway_fusion): Identity()
      )
      # At this point x is a list: (torch.Size([1, 2048, 8, 40, 72]), torch.Size([1, 256, 32, 40, 72])), i.e. (slow pathway, fast pathway)
      (5): PoolConcatPathway(
        (pool): ModuleList(
          (0): AvgPool3d(kernel_size=(8, 1, 1), stride=(1, 1, 1), padding=(0, 0, 0))  # x[0] becomes torch.Size([1, 2048, 1, 40, 72])
          (1): AvgPool3d(kernel_size=(32, 1, 1), stride=(1, 1, 1), padding=(0, 0, 0))  # x[1] becomes torch.Size([1, 256, 1, 40, 72])
          # concatenated along channels (2048 + 256): torch.Size([1, 2304, 1, 40, 72])
        )
      )
    )
  )
  (detection_head): ResNetRoIHead(
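    # inputs: torch.Size([1, 2304, 40, 72]) (the length-1 time axis is squeezed out) and bboxes of shape [7, 5], one row (batch_index, x1, y1, x2, y2) per detected box
    (roi_layer): RoIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=False)
    # gives torch.Size([7, 2304, 7, 7]); spatial_scale=0.0625 (= 1/16) maps the pixel-space boxes onto the feature map
    (pool_spatial): MaxPool2d(kernel_size=(7, 7), stride=1, padding=0, dilation=1, ceil_mode=False)
    # gives [7, 2304, 1, 1], then unsqueezed to torch.Size([7, 2304, 1, 1, 1])
    (dropout): Dropout(p=0.5, inplace=False)
    # permuted to torch.Size([7, 1, 1, 1, 2304]) so the channel axis is last for the Linear layer
    (proj): Linear(in_features=2304, out_features=80, bias=True)
    # Linear gives [7, 1, 1, 1, 80], which is permuted back to torch.Size([7, 80, 1, 1, 1])
    (activation): Sigmoid()
  )
)  # finally reshaped to torch.Size([7, 80])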

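Argmax over the 80 scores then picks one action for each of the 7 objects (using the AVA dataset's 80 action labels). (That said, this step does not seem to distinguish people from other objects: non-person detections apparently get an action predicted as well, which is somewhat redundant.)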
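The argmax step, plus a person-only filter that would avoid the redundancy noted above, can be sketched with NumPy as follows. These are hypothetical helpers, not the repo's code. The (x1, y1, x2, y2, conf, cls) row layout, the 0.5 threshold, and the `j + 1` column-to-id offset (inferred from the AVA label map starting at 1 while the head has 80 columns) are assumptions. Class 0 is "person" in the COCO label set used by YOLOv5's default weights.

```python
import numpy as np

PERSON_CLS = 0  # "person" in the COCO classes of YOLOv5's default weights

def person_rows(dets):
    """Keep detections whose class column is PERSON_CLS.
    dets is assumed to be an [N, 6] array of (x1, y1, x2, y2, conf, cls)."""
    return dets[dets[:, 5] == PERSON_CLS]

def top1_actions(probs, labelnames):
    """One action per box: argmax over the [N, 80] sigmoid scores.
    Column j is assumed to map to AVA label id j + 1 (the label map starts at 1)."""
    return [labelnames[int(j) + 1] for j in np.argmax(probs, axis=1)]

def thresholded_actions(probs, labelnames, thr=0.5):
    """Sigmoid scores are independent, so several actions can pass the threshold."""
    return [[labelnames[int(j) + 1] for j in np.flatnonzero(row >= thr)] for row in probs]

# Placeholder label map: only a few real AVA names are filled in for this demo
labelnames = {i: f"class_{i}" for i in range(1, 81)}
labelnames.update({11: "sit", 12: "stand", 14: "walk"})

dets = np.array([[10, 20, 200, 400, 0.9, 0],       # person
                 [300, 40, 420, 380, 0.85, 0],     # person
                 [50, 60, 120, 160, 0.8, 56]])     # some non-person class
people = person_rows(dets)                         # only these would go to SlowFast

scores = np.zeros((len(people), 80))               # stand-in for the head's output
scores[0, 11], scores[0, 13] = 0.9, 0.6            # ids 12 ('stand') and 14 ('walk')
scores[1, 10] = 0.7                                # id 11 ('sit')

print(people.shape)                                # (2, 6)
print(top1_actions(scores, labelnames))            # ['stand', 'sit']
print(thresholded_actions(scores, labelnames))     # [['stand', 'walk'], ['sit']]
```

Filtering before the SlowFast forward pass also skips RoIAlign and the head for boxes that are not people. The thresholded variant reflects that AVA actions are multi-label (someone can stand and talk at once), which a single argmax discards.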
——————————————
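For reference, the common bbox formats:
tlwh: (x_top_left, y_top_left, width, height)

x_top_left: x coordinate of the top-left corner
y_top_left: y coordinate of the top-left corner
width: box width
height: box height
xywh: (x, y, width, height)

x, y: which point this is depends on the library. In COCO annotations, [x, y, width, height] uses the top-left corner (the same as tlwh); in YOLOv5 (e.g. its xyxy2xywh / xywh2xyxy helpers), x, y is the box center (the same as cxcywh).
width: box width
height: box height
xyxy: (x_min, y_min, x_max, y_max)

x_min: x coordinate of the top-left corner
y_min: y coordinate of the top-left corner
x_max: x coordinate of the bottom-right corner
y_max: y coordinate of the bottom-right corner
cxcywh: (center_x, center_y, width, height)

center_x: x coordinate of the box center
center_y: y coordinate of the box center
width: box width
height: box height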
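Converting between these formats is simple arithmetic. Below is a minimal pure-Python sketch. The function names are made up for illustration, and "cxcywh" here is the center-based form that YOLOv5 calls xywh.

```python
def tlwh_to_xyxy(box):
    """(x_top_left, y_top_left, w, h) -> (x_min, y_min, x_max, y_max)"""
    x, y, w, h = box
    return (x, y, x + w, y + h)

def xyxy_to_tlwh(box):
    """(x_min, y_min, x_max, y_max) -> (x_top_left, y_top_left, w, h)"""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)

def cxcywh_to_xyxy(box):
    """(center_x, center_y, w, h) -> (x_min, y_min, x_max, y_max)"""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def xyxy_to_cxcywh(box):
    """(x_min, y_min, x_max, y_max) -> (center_x, center_y, w, h)"""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

box = (10, 20, 110, 220)                    # xyxy
print(xyxy_to_tlwh(box))                    # (10, 20, 100, 200)
print(xyxy_to_cxcywh(box))                  # (60.0, 120.0, 100, 200)
print(cxcywh_to_xyxy(xyxy_to_cxcywh(box)))  # (10.0, 20.0, 110.0, 220.0)
```

xyxy is also the layout torchvision's RoIAlign expects after the batch-index column, which is why the detection head above takes its boxes as [7, 5].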
