qq_41627642

深度学习之目标检测（Swin Transformer for Object Detection）

1、MMdetection系列版本编辑

2、 MMDetection和MMCV兼容版本

3、Installation（Linux系统环境安装）

3.1 搭建基本环境

3.2 安装mmcv-full

3.3 安装其他必要的Python包

3.4 安装 MMDetection

3.5、apex安装

2、 Windows 的mmcv-full安装

windows的本地编译安装

4、Swin Transform 训练自己的数据集

4.1 准备coco数据集

4.1.1 案例1 txt转coco数据集

1、visdrone数据

1. 将annotations中的txt标签转化为voc需要的xml文件

1. VOC Annotations文件夹

2.xml2json

4.2 配置修改工程

1、设置类别数（configs/base/models/mask_rcnn_swin_fpn.py）：

2、修改配置信息（间隔和加载预训练模型configs/base/default_runtime.py）

3、修改训练尺寸大小、max_epochs按需修改（configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py）

4、配置数据集路径、img_scale、samples_per_gpu、workers_per_gpu和增加数据增强（configs/base/datasets/coco_detection.py）

configs/base/datasets/coco_instance.py 文件的最上面指定了数据集的路径，因此在项目下新建 data/coco目录，下面四个子目录 annotations和test2017，train2017，val2017。路径/configs/base/datasets/coco_detection.py，第2行的data_root数据集根目录路径，第8行的img_scale可以根据需要修改，下面train、test、val数据集的具体路径ann_file根据自己数据集修改

5、修改分类数组：mmdet/datasets/coco.py

6、可以修改学习率（mmdetection/configs/_base_/schedules/schedule_20e.py ）

这里是调整学习率的schedule的位置，可以设置warmup schedule和衰减策略。 1x, 2x分别对应12epochs和24epochs，20e对应20epochs，这里注意配置都是默认8块gpu的训练，如果用一块gpu训练，需要在lr/8

7、载入修改好的配置文件

8、使用Tensorboard进行可视化，查看训练

9、训练过程中进行模型验证

4.3 开始训练执行图下命令

编辑

上面的.log、.log.json文件就是训练的日志文件，每训练完一个epoch后目录下还会有对应的以epoch_x.pth的模型文件，最新训练的模型文件命名为latest.pth。

4.4、禁用mask

训练后的模型进行推理测试、评估

将mmdetection产生的coco(json)格式的测试结果转化成VisDrone19官网所需要的txt格式文件（还包括无标签测试图片产生mmdet所需要的test.json方法）

4、遇到的问题及解决办法

5、测试训练好的模型

1、MMdetection系列版本编辑

2、 MMDetection和MMCV兼容版本

3、Installation（Linux系统环境安装）

3.1 搭建基本环境

3.2 安装mmcv-full

3.3 安装其他必要的Python包

3.4 安装 MMDetection

3.5、apex安装

2、 Windows 的mmcv-full安装

windows的本地编译安装

4、Swin Transform 训练自己的数据集

4.1 准备coco数据集

4.1.1 案例1 txt转coco数据集

1、visdrone数据

1. 将annotations中的txt标签转化为voc需要的xml文件

1. VOC Annotations文件夹

2.xml2json

4.2 配置修改工程

1、设置类别数（configs/base/models/mask_rcnn_swin_fpn.py）：

2、修改配置信息（间隔和加载预训练模型configs/base/default_runtime.py）

3、修改训练尺寸大小、max_epochs按需修改（configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py）

4、配置数据集路径、img_scale、samples_per_gpu、workers_per_gpu和增加数据增强（configs/base/datasets/coco_detection.py）

5、修改分类数组：mmdet/datasets/coco.py

6、可以修改学习率（mmdetection/configs/_base_/schedules/schedule_20e.py ）

7、载入修改好的配置文件

8、使用Tensorboard进行可视化，查看训练

9、训练过程中进行模型验证

4.3 开始训练执行图下命令

4.4、禁用mask

4、遇到的问题及解决办法

5、测试训练好的模型

1、MMdetection系列版本

2、 MMDetection和MMCV兼容版本

3、Installation（Linux系统环境安装）

3.1 搭建基本环境

3.2 安装mmcv-full

1.3 安装其他必要的Python包

1.3 安装 MMDetection

2、 Windows 的mmcv-full安装

windows的本地编译安装

4、Swin Transform 训练自己的数据集

4.1 准备coco数据集

4.2 配置修改工程

1、设置类别数（configs/base/models/mask_rcnn_swin_fpn.py）：

2、修改配置信息（间隔和加载预训练模型configs/base/default_runtime.py）

3、修改训练尺寸大小、max_epochs按需修改（configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py）

4、配置数据集路径、img_scale、samples_per_gpu、workers_per_gpu和增加数据增强（configs/base/datasets/coco_detection.py）

5、修改分类数组：mmdet/datasets/coco.py

6、可以修改学习率（mmdetection/configs/_base_/schedules/schedule_20e.py ）

7、载入修改好的配置文件

8、使用Tensorboard进行可视化

4.3 开始训练执行图下命令

4.4、禁用mask

4、遇到的问题及解决办法

5、测试训练好的模型

1、MMdetection系列版本

2、 MMDetection和MMCV兼容版本

MMDetection version	MMCV version
master	mmcv-full>=1.3.17, <1.6.0
2.24.1	mmcv-full>=1.3.17, <1.6.0
2.24.0	mmcv-full>=1.3.17, <1.6.0
2.23.0	mmcv-full>=1.3.17, <1.5.0
2.22.0	mmcv-full>=1.3.17, <1.5.0
2.21.0	mmcv-full>=1.3.17, <1.5.0
2.20.0	mmcv-full>=1.3.17, <1.5.0
2.19.1	mmcv-full>=1.3.17, <1.5.0
2.19.0	mmcv-full>=1.3.17, <1.5.0
2.18.0	mmcv-full>=1.3.17, <1.4.0
2.17.0	mmcv-full>=1.3.14, <1.4.0
2.16.0	mmcv-full>=1.3.8, <1.4.0
2.15.1	mmcv-full>=1.3.8, <1.4.0
2.15.0	mmcv-full>=1.3.8, <1.4.0
2.14.0	mmcv-full>=1.3.8, <1.4.0
2.13.0	mmcv-full>=1.3.3, <1.4.0
2.12.0	mmcv-full>=1.3.3, <1.4.0
2.11.0	mmcv-full>=1.2.4, <1.4.0
2.10.0	mmcv-full>=1.2.4, <1.4.0
2.9.0	mmcv-full>=1.2.4, <1.4.0
2.8.0	mmcv-full>=1.2.4, <1.4.0
2.7.0	mmcv-full>=1.1.5, <1.4.0
2.6.0	mmcv-full>=1.1.5, <1.4.0
2.5.0	mmcv-full>=1.1.5, <1.4.0
2.4.0	mmcv-full>=1.1.1, <1.4.0
2.3.0	mmcv-full==1.0.5
2.3.0rc0	mmcv-full>=1.0.2
2.2.1	mmcv==0.6.2
2.2.0	mmcv==0.6.2
2.1.0	mmcv>=0.5.9, <=0.6.1
2.0.0	mmcv>=0.5.1, <=0.5.8

3、Installation（Linux系统环境安装）

Windows10系统下swin-transformer目标检测环境搭建

3.1 搭建基本环境

cuda与pytorch版本

conda create -n mmdetection python=3.7 -y   #创建环境
conda activate mmdetection                  #激活环境
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch  #安装 PyTorch and torchvision 

#或者这样安装
pip3 install torch==1.8.2+cu102 torchvision==0.9.2+cu102 torchaudio===0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html  -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

验证是否安装成功

>>> import torchvision
>>> import torch
>>> import.__version__
  File "", line 1
    import.__version__
          ^
SyntaxError: invalid syntax
>>> torch.__version__
'1.8.2+cu102'

3.2 安装mmcv-full

CUDA	torch 1.11	torch 1.10	torch 1.9	torch 1.8	torch 1.7	torch 1.6	torch 1.5
11.5	install
11.3	install	install
11.1		install	install	install
11.0					install
10.2	install	install	install	install	install	install	install
10.1				install	install	install	install
9.2					install	install	install
cpu	install	install	install	install	install	install	install

#Install mmcv-full. 安装mmcv-full
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html


Please replace {cu_version} and {torch_version} in the url to your desired one. For example, to install the latest mmcv-full with CUDA 11.0 and PyTorch 1.7.0, use the following command:

pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html

pip install mmcv-full==1.3.9 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html #明确mmcv-full的版本号


pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html


pip install mmcv-full==1.3.17 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

验证是否安装成功
import mmcv


如果出现
>>> import mmcv
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-10.2'

我们去看看驱动：
nvidia-smi

如果返回NVIDIA驱动失效简单解决方案：NVIDIA-SMI has failed because it couldn‘t communicate with the NVIDIA driver.

这种情况是由于重启服务器，linux内核升级导致的，由于linux内核升级，之前的Nvidia驱动就不匹配连接了，但是此时Nvidia驱动还在，可以通过命令 nvcc -V 找到答案。



解决方法：

查看已安装驱动的版本信息
ls /usr/src | grep nvidia
(mmdetection) lhy@thales-Super-Server:~$ ls /usr/src | grep nvidia
nvidia-440.33.01


进行下列操作
sudo apt-get install dkms
sudo dkms install -m nvidia -v 440.33.01


然后进行验证：
(mmdetection) lhy@thales-Super-Server:~$ nvidia-smi
Fri May  6 00:56:02 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN RTX           Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   47C    P0    54W / 280W |      0MiB / 24220MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN RTX           Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   47C    P0    65W / 280W |      0MiB / 24220MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN RTX           Off  | 00000000:82:00.0 Off |                  N/A |
|  0%   48C    P0    63W / 280W |      0MiB / 24220MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN RTX           Off  | 00000000:83:00.0 Off |                  N/A |
|  0%   46C    P0    42W / 280W |      0MiB / 24220MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(mmdetection) lhy@thales-Super-Server:~$ python
Python 3.7.13 (default, Mar 29 2022, 02:18:16) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mmcv

https://download.openmmlab.com/mmcv/dist/cu102/torch1.6.0/index.html

根据这个网址可以查看torch1.6.0支持的mmcv-full的版本

注意:上面提供的预构建包不包括所有版本的mmcv-full，您可以单击相应的链接来查看支持的版本。例如，您可以单击cu102-torch1.8.0，可以看到cu102-torch1.8.0只提供1.3.0及以上版本的mmcv-full。此外，从v1.3.17开始，我们不再提供使用PyTorch 1.3和1.4编译的完整的mmcv预构建包。你可以在这里找到用PyTorch 1.3和1.4编译的以前版本。在我们的Cl中，兼容性仍然得到保证，但我们将在明年放弃对PyTorch 1.3和1.4的支持。

3.3 安装其他必要的Python包

pip install cython matplotlib opencv-python timm -i [http://mirrors.aliyun.com/pypi/simple/]

3.4 安装 MMDetection

# These must be installed before building mmdetection
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com 

pip install cython matplotlib opencv-python 
cython
numpy
matplotlib


You can simply install mmdetection with the following command:
你可以使用下面的命令简单地安装mmdetection:

pip install mmdet

或者克隆存储库然后安装:
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -r requirements/build.txt
pip install -v -e .  # or "python setup.py develop"

安装完成
Using /home/lhy/anaconda3/envs/mmdetection/lib/python3.7/site-packages
Finished processing dependencies for mmdet==2.24.1


a.当指定-e或develop时，MMDetection被安装在dev模式下，对代码所做的任何本地修改都将生效，无需重新安装
b.如果你想使用opencv-python-headless而不是opencv-python，你可以在安装MMCV之前安装它。


安装额外依赖Instaboost, Panoptic Segmentation, LVIS数据集，或Albumentations。

# for instaboost
pip install instaboostfast
# for panoptic segmentation
pip install git+https://github.com/cocodataset/panopticapi.git
# for LVIS dataset
pip install git+https://github.com/lvis-dataset/lvis-api.git
# for albumentations
pip install -r requirements/albu.txt


d.如果你想使用albumentations，我们建议使用pip install -r requirements/ albumentations或pip install -U albumentations——nobinary qudida, albumentations。如果您简单地使用pip install albumentations>=0.3.2，它将同时安装opencv-python-headless(即使您已经安装了opencv-python)。我们建议在安装albumentation的产品后检查环境，以确保opencv-python和opencv-python-headless没有被同时安装，因为如果同时安装可能会导致意想不到的问题。请参阅官方文件了解更多细节。

3.5、apex安装

git clone https://github.com/NVIDIA/apex

进入 apex 文件夹
执行：python setup.py install

git clone https://github.com/NVIDIA/apex
cd apex
python3 setup.py install

pip list 能看见 apex （0.1版本，只有这一个版本）
注：安装的apex会在训练模型时候有一个警告内容如下：（但实际没啥影响）
fused_weight_gradient_mlp_cuda module not found. gradient accumulation fusion with weight gradient computation disabled.

2、 Windows 的mmcv-full安装

1、打开Anaconda Powershell prompt命令窗口

conda create -n swim python=3.8 -y

# These must be installed before building mmdetection
pip install cython matplotlib opencv-python 
cython
numpy
matplotlib

通过查询可知按照官方的方式只能安装2.7.0版本的mmdetection

windows的本地编译安装

目标检测学习笔记——mmdet的mmcv安装

准备 MMCV 源代码

+https://github.com/open-mmlab/mmcv/graphs/contributors

git clone https://github.com/open-mmlab/mmcv.git
git checkout v1.2.0 # based on target version
cd mmcv

安装所需 Python 依赖包

置 MSVC 编译器
训练自己的数据集

4、Swin Transform 训练自己的数据集

Swin-Transformer

Swin-Transformer-Object-Detection

Swin Transformer Object Detection 目标检测-2——训练自己的数据集

Swin-transformer纯目标检测训练自己的数据集

4.1 准备coco数据集

4.1.1 案例1 txt转coco数据集

将visdrone数据集转化为coco格式并在mmdetection上训练,附上转好的json文件

1、visdrone数据

配备摄像头的无人机(或通用无人机)已被快速部署到广泛的应用领域，包括农业、航空摄影、快速交付和监视。因此，从这些平台上收集的视觉数据的自动理解要求越来越高，这使得计算机视觉与无人机的关系越来越密切。我们很高兴为各种重要的计算机视觉任务展示一个大型基准，并仔细注释了地面真相，命名为VisDrone，使视觉与无人机相遇。VisDrone2019数据集由天津大学机器学习和数据挖掘实验室AISKYEYE团队收集。基准数据集包括288个视频片段，由261908帧和10209幅静态图像组成，由各种无人机摄像头捕获，覆盖范围广泛，包括位置(来自中国相隔数千公里的14个不同城市)、环境(城市和农村)、物体(行人、车辆、自行车、等)和密度(稀疏和拥挤的场景)。请注意，数据集是在不同的场景、不同的天气和光照条件下使用不同的无人机平台(即不同型号的无人机)收集的。这些框架用超过260万个经常感兴趣的目标框手工标注，比如行人、汽车、自行车和三轮车。一些重要的属性，包括场景可见性，对象类和遮挡，也提供了更好的数据利用。

isdrone是一个无人机的目标检测数据集，在很多目标检测的论文中都能看到它的身影。
标签从0到11分别为’ignored regions’,‘pedestrian’,‘people’,‘bicycle’,‘car’,‘van’,
‘truck’,‘tricycle’,‘awning-tricycle’,‘bus’,‘motor’,‘others’

数据下载

下载完成后我们可以在Anootations看到txt文件，其中相关的数字解释如下：

,,,,,,,

Name Description
-------------------------------------------------------------------------------------------------------------------------------
    The x coordinate of the top-left corner of the predicted bounding box

    The y coordinate of the top-left corner of the predicted object bounding box

    The width in pixels of the predicted object bounding box

   The height in pixels of the predicted object bounding box

    The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing
an object instance.
The score in GROUNDTRUTH file is set to 1 or 0. 1 indicates the bounding box is considered in evaluation,
while 0 indicates the bounding box will be ignored.

The object category indicates the type of annotated object, (i.e., ignored regions(0), pedestrian(1),
people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10),
others(11))

   The score in the DETECTION result file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the degree of object parts appears outside a frame
(i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).

   The score in the DETECTION file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the fraction of objects being occluded (i.e., no occlusion = 0
(occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2
(occlusion ratio 50% ~ 100%)).

其中：两种有用的注释：truncation截断率,occlusion遮挡率。

被遮挡的对象比例来定义遮挡率。

截断率用于指示对象部分出现在框架外部的程度。

如果目标的截断率大于50％，则会在评估过程中将其跳过。

现在先要用mmdetection自己训练一下这个数据集，需要把他转化为coco数据集格式

分两步走：

1. 将annotations中的txt标签转化为voc需要的xml文件

1. VOC Annotations文件夹

该文件下存放的是xml格式的标签文件，每个xml文件都对应于JPEGImages文件夹的一张图片, 其中对xml的解析如下:


  
    VOC2007                             
    2007_000392.jpg                               //文件名  
                                                               //图像来源（不重要）  
        The VOC2007 Database  
        PASCAL VOC2007  
        flickr  
      
                                                   //图像尺寸（长宽以及通道数）                        
        500  
        332  
        3  
      
    1                                   //是否用于分割（在图像物体识别中01无所谓）

下面是visDrone2019的txt注释文件转换为voc xml的代码，visDrone2019_txt2xml_voc.py

需要改的地方有注释，就是几个路径改一下即可

'''
Author: 刘鸿燕 [email protected]
Date: 2022-05-09 10:17:40
LastEditors: 刘鸿燕 [email protected]
LastEditTime: 2022-05-09 11:17:20
FilePath: \VisDrone2019\data_process\visDrone2019_txt2xml.py
Description: 这是默认设置,请设置`customMade`, 打开koroFileHeader查看配置 进行设置: https://github.com/OBKoro1/koro1FileHeader/wiki/%E9%85%8D%E7%BD%AE
'''
import os
import datetime
from PIL import Image
from pathlib import Path
FILE = Path(__file__).resolve()
# print("FILE",FILE)
ROOT = FILE.parent.parents[0] # root directory
print("ROOT",ROOT)

def check_dir(path):
    if os.path.isdir(path):
        print("{}文件路径存在！".format(path))
        pass
    else:
        os.makedirs(path)
        print("{}文件路径创建成功！".format(path))

#把下面的root_dir路径改成你自己的路径即可
root_dir = ROOT / 'VisDrone2019-DET-train'
annotations_dir = root_dir / "annotations/" 
image_dir = root_dir / "images/"
xml_dir = root_dir / "Annotations_XML/"   #在工作目录下创建Annotations_XML文件夹保存xml文件
check_dir(xml_dir)
# print("annotation_dir",annotations_dir)
# print("image_dir",image_dir)
# print("xml_dir",xml_dir)

# root_dir = r"D:\object_detection_data\datacovert\VisDrone2019-DET-val/"   
# annotations_dir = root_dir+"annotations/"
# image_dir = root_dir + "images/"
# xml_dir = root_dir+"Annotations_XML/"   #在工作目录下创建Annotations_XML文件夹保存xml文件

# 下面的类别也换成你自己数据类别，也可适用于其他的数据集转换
class_name = ['ignored regions','pedestrian','people','bicycle','car','van',
    'truck','tricycle','awning-tricycle','bus','motor','others']

for filename in os.listdir(annotations_dir):
    fin = open(annotations_dir/ filename, 'r')
    image_name = filename.split('.')[0]
    image_path=Path(image_dir).joinpath(image_name+".jpg")# 若图像数据是“png”转换成“.png”即可
    img = Image.open(image_path) # 若图像数据是“png”转换成“.png”即可
    xml_name = Path(xml_dir).joinpath(image_name+'.xml')
    with open(xml_name, 'w') as fout:
        #写入的xml基本信息
        fout.write(''+'\n')
        fout.write('\t'+'VOC2007'+'\n')
        fout.write('\t'+''+image_name+'.jpg'+''+'\n')
        
        fout.write('\t'+''+'\n')
        fout.write('\t\t'+''+'VisDrone2019-DET'+''+'\n')
        fout.write('\t\t'+''+'VisDrone2019-DET'+''+'\n')
        fout.write('\t\t'+''+'flickr'+''+'\n')
        fout.write('\t\t'+''+'Unspecified'+''+'\n')
        fout.write('\t'+''+'\n')
        
        fout.write('\t'+''+'\n')
        fout.write('\t\t'+''+'LJ'+''+'\n')
        fout.write('\t\t'+''+'LJ'+''+'\n')
        fout.write('\t'+''+'\n')
        
        fout.write('\t'+''+'\n')
        fout.write('\t\t'+''+str(img.size[0])+''+'\n')
        fout.write('\t\t'+''+str(img.size[1])+''+'\n')
        fout.write('\t\t'+''+'3'+''+'\n')
        fout.write('\t'+''+'\n')
        
        fout.write('\t'+''+'0'+''+'\n')

        for line in fin.readlines():
            line = line.split(',')
            fout.write('\t'+''+'\n')
             
        fin.close()
        fout.write('')

2.xml2json

#!/usr/bin/python
# xml是voc的格式
# json是coco的格式
import sys, os, json, glob
import xml.etree.ElementTree as ET

INITIAL_BBOXIds = 1
# PREDEF_CLASSE = {}
PREDEF_CLASSE = { 'pedestrian': 1, 'people': 2,
    'bicycle': 3, 'car': 4, 'van': 5, 'truck': 6, 'tricycle': 7,
    'awning-tricycle': 8, 'bus': 9, 'motor': 10}
    #我这里只想检测这十个类， 0和11没有加入转化。

# function
def get(root, name):
    return root.findall(name)

def get_and_check(root, name, length):
    vars = root.findall(name)
    if len(vars) == 0:
        raise NotImplementedError('Can not find %s in %s.'%(name, root.tag))
    if length > 0 and len(vars) != length:
        raise NotImplementedError('The size of %s is supposed to be %d, but is %d.'%(name, length, len(vars)))
    if length == 1:
        vars = vars[0]
    return vars

def convert(xml_paths, out_json):
    json_dict = {'images': [], 'type': 'instances', 
        'categories': [], 'annotations': []}
    categories = PREDEF_CLASSE
    bbox_id = INITIAL_BBOXIds
    for image_id, xml_f in enumerate(xml_paths):

        # 进度输出
        sys.stdout.write('\r>> Converting image %d/%d' % (
            image_id + 1, len(xml_paths)))
        sys.stdout.flush()

        tree = ET.parse(xml_f)
        root = tree.getroot()
        filename = get_and_check(root, 'filename', 1).text
        size = get_and_check(root, 'size', 1)
        width = int(get_and_check(size, 'width', 1).text)
        height = int(get_and_check(size, 'height', 1).text)
        image = {'file_name': filename, 'height': height, 
                'width': width, 'id': image_id + 1}
        json_dict['images'].append(image)
        ## Cruuently we do not support segmentation
        #segmented = get_and_check(root, 'segmented', 1).text
        #assert segmented == '0'

        for obj in get(root, 'object'):
            category = get_and_check(obj, 'name', 1).text
            if category not in categories:
                new_id = max(categories.values()) + 1
                categories[category] = new_id
            category_id = categories[category]
            bbox = get_and_check(obj, 'bndbox', 1)
            xmin = int(get_and_check(bbox, 'xmin', 1).text) - 1
            ymin = int(get_and_check(bbox, 'ymin', 1).text) - 1
            xmax = int(get_and_check(bbox, 'xmax', 1).text)
            ymax = int(get_and_check(bbox, 'ymax', 1).text)
            if xmax <= xmin or ymax <= ymin:
                continue
            o_width = abs(xmax - xmin)
            o_height = abs(ymax - ymin)
            ann = {'area': o_width * o_height, 'iscrowd': 0, 'image_id': image_id + 1,
                'bbox': [xmin, ymin, o_width, o_height], 'category_id': category_id, 
                'id': bbox_id, 'ignore': 0, 'segmentation': []}
            json_dict['annotations'].append(ann)
            bbox_id = bbox_id + 1

    for cate, cid in categories.items():
        cat = {'supercategory': 'none', 'id': cid, 'name': cate}
        json_dict['categories'].append(cat)
        
    # json_file = open(out_json, 'w')
    # json_str = json.dumps(json_dict)
    # json_file.write(json_str)
    # json_file.close() # 快
    json.dump(json_dict, open(out_json, 'w'), indent=4)  # indent=4 更加美观显示 慢

if __name__ == '__main__':
    xml_path = r'D:\object_detection_data\datacovert\VisDrone2019-DET-val/Annotations_XML/'   #改一下读取xml文件位置
    xml_file = glob.glob(os.path.join(xml_path, '*.xml'))
    convert(xml_file, r'D:\object_detection_data\datacovert\VisDrone2019-DET-val/NEW_val.json')  #这里是生成的json保存位置，改一下

4.2 配置修改工程

1、设置类别数（configs/base/models/mask_rcnn_swin_fpn.py）：

修改 configs/base/models/mask_rcnn_swin_fpn.py 中 num_classes 为自己数据集的类别（有两处需要修改）。两处大概在第54行和73行，修改为自己数据集的类别数量，示例如下。

# model settings
model = dict(
    type='MaskRCNN',
    pretrained=None,
    backbone=dict(
        type='SwinTransformer',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4.,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.2,
        ape=False,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        use_checkpoint=False),
    neck=dict(
        type='FPN',
        in_channels=[96, 192, 384, 768],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=80,  #修改为自己的类别，注意这里不需要加BG类（+1）
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=80,  #修改为自己的类别，注意这里不需要加BG类（+1）
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
    # model training and testing settings
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            mask_size=28,
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100,
            mask_thr_binary=0.5)))

2、修改配置信息（间隔和加载预训练模型configs/base/default_runtime.py）

修改 configs/base/default_runtime.py 中的 interval，loadfrom

interval：dict(interval=1) # 表示多少个 epoch 验证一次，然后保存一次权重信息，

第1行interval=1表示每1个epoch保存一次权重信息，表示多少个 epoch 验证一次，然后保存一次权重信息，
第4行interval=50表示每50次打印一次日志信息
loadfrom：表示加载哪一个训练好的权重，可以直接写绝对路径如： load_from = r"E:\workspace\Python\Pytorch\Swin-Transformer-Object-Detection\mask_rcnn_swin_tiny_patch4_window7.pth"

3、修改训练尺寸大小、max_epochs按需修改（configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py）

如果显存够的话可以不改（基本都运行不起来），文件位置为：configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py
修改所有的 img_scale 为：img_scale = [(224, 224)] 或者 img_scale = [(256, 256)] 或者 480，512等。
同时 configs/base/datasets/coco_instance.py 或者configs/base/datasets/coco_detection.py中的 img_scale 也要改成 img_scale = [(224, 224)] 或者其他值

第3行’…/base/datasets/coco_instance.py’修改为’…/base/datasets/coco_detection.py’

第69行的max_epochs按需修改

_base_ = [
    '../_base_/models/mask_rcnn_swin_fpn.py',
    '../_base_/datasets/coco_instance.py', #做目标检测，修改为coco_detection.py
    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]

model = dict(
    backbone=dict(
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        ape=False,
        drop_path_rate=0.2,
        patch_norm=True,
        use_checkpoint=False
    ),
    neck=dict(in_channels=[96, 192, 384, 768]))

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# augmentation strategy originates from DETR / Sparse RCNN
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='AutoAugment',
         policies=[
             [
                 dict(type='Resize',
                      #这里可以根据自己的硬件设置进行修改
                      img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                                 (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                                 (736, 1333), (768, 1333), (800, 1333)],
                      multiscale_mode='value',
                      keep_ratio=True)
             ],
             [
                 dict(type='Resize',
                      img_scale=[(400, 1333), (500, 1333), (600, 1333)],
                      multiscale_mode='value',
                      keep_ratio=True),
                 dict(type='RandomCrop',
                      crop_type='absolute_range',
                      crop_size=(384, 600),
                      allow_negative_crop=True),
                 dict(type='Resize',
                      #这里可以根据自己的硬件设置进行修改
                      img_scale=[(480, 1333), (512, 1333), (544, 1333),
                                 (576, 1333), (608, 1333), (640, 1333),
                                 (672, 1333), (704, 1333), (736, 1333),
                                 (768, 1333), (800, 1333)],
                      multiscale_mode='value',
                      override=True,
                      keep_ratio=True)
             ]
         ]),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
data = dict(train=dict(pipeline=train_pipeline))

optimizer = dict(_delete_=True, type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05,
                 paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
                                                 'relative_position_bias_table': dict(decay_mult=0.),
                                                 'norm': dict(decay_mult=0.)}))
lr_config = dict(step=[27, 33])
runner = dict(type='EpochBasedRunnerAmp', max_epochs=36) #训练的epoch可以根据需要修改

# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)

4、配置数据集路径、img_scale、samples_per_gpu、workers_per_gpu和增加数据增强（configs/base/datasets/coco_detection.py）

configs/base/datasets/coco_instance.py 文件的最上面指定了数据集的路径，因此在项目下新建 data/coco目录，下面四个子目录 annotations和test2017，train2017，val2017。路径/configs/base/datasets/coco_detection.py，第2行的data_root数据集根目录路径，第8行的img_scale可以根据需要修改，下面train、test、val数据集的具体路径ann_file根据自己数据集修改

第31行的samples_per_gpu表示batch size大小，太大会内存溢出
第32行的workers_per_gpu表示每个GPU对应线程数，2、4、6、8按需修改
修改 batch size 和线程数：根据自己的显存和CPU来设置

dataset_type = 'CocoDataset'
data_root = 'data/coco/' #数据的根目录
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), #img_scale修改
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800), #img_scale修改
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2, #batch size大小
    workers_per_gpu=2, #每个GPU对应线程数
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline))
evaluation = dict(metric=['bbox', 'segm'])

configs/_base_/datasets/coco_detection.py 在train pipeline修改Data Augmentation在train

dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
# 在这里加albumentation的aug
albu_train_transforms = [
    dict(
        type='ShiftScaleRotate',
        shift_limit=0.0625,
        scale_limit=0.0,
        rotate_limit=0,
        interpolation=1,
        p=0.5),
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=[0.1, 0.3],
        contrast_limit=[0.1, 0.3],
        p=0.2),
    dict(
        type='OneOf',
        transforms=[
            dict(
                type='RGBShift',
                r_shift_limit=10,
                g_shift_limit=10,
                b_shift_limit=10,
                p=1.0),
            dict(
                type='HueSaturationValue',
                hue_shift_limit=20,
                sat_shift_limit=30,
                val_shift_limit=20,
                p=1.0)
        ],
        p=0.1),
    dict(type='JpegCompression', quality_lower=85, quality_upper=95, p=0.2),
    dict(type='ChannelShuffle', p=0.1),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Blur', blur_limit=3, p=1.0),
            dict(type='MedianBlur', blur_limit=3, p=1.0)
        ],
        p=0.1),
]
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    #据说这里改img_scale即可多尺度训练，但是实际运行报错。
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='Pad', size_divisor=32),
    dict(
        type='Albu',
        transforms=albu_train_transforms,
        bbox_params=dict(
            type='BboxParams',
            format='pascal_voc',
            label_fields=['gt_labels'],
            min_visibility=0.0,
            filter_lost_elements=True),
        keymap={
            'img': 'image',
            'gt_masks': 'masks',
            'gt_bboxes': 'bboxes'
        },
]
# 测试的pipeline
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        # 多尺度测试 TTA在这里修改，注意有些模型不支持多尺度TTA，比如cascade_mask_rcnn，若不支持会提示
        # Unimplemented Error
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
# 包含batch_size, workers和路径。
# 路径如果按照上面的设置好就不需要更改
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox')

目录

1、MMdetection系列版本

2、 MMDetection和MMCV兼容版本

3、Installation（Linux系统环境安装）

3.1 搭建基本环境

3.2 安装mmcv-full

1.3 安装其他必要的Python包

1.3 安装 MMDetection

2、 Windows 的mmcv-full安装

windows的本地编译安装

4、Swin Transform 训练自己的数据集

4.1 准备coco数据集

4.2 配置修改工程

4.3 开始训练执行图下命令

4.4、禁用mask

4、遇到的问题及解决办法

5、测试训练好的模型

5、修改分类数组：mmdet/datasets/coco.py



one：路径/mmdet/datasets/coco.py的第23行CLASSES
two：路径/mmdet/core/evaluation/class_names.py的第67行coco_classes
修改为自己数据集的类别


CLASSES中填写自己的分类：CLASSES = ('person', 'bicycle', 'car')


one:
@DATASETS.register_module()
class CocoDataset(CustomDataset):

    CLASSES = ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
               'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
               'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
               'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
               'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
               'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
               'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
               'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
               'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
               'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
               'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
               'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
               'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock',
               'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush')#修改为自己的类别数

#two

def coco_classes():
    return [
        'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
        'truck', 'boat', 'traffic_light', 'fire_hydrant', 'stop_sign',
        'parking_meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep',
        'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella',
        'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard',
        'sports_ball', 'kite', 'baseball_bat', 'baseball_glove', 'skateboard',
        'surfboard', 'tennis_racket', 'bottle', 'wine_glass', 'cup', 'fork',
        'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
        'broccoli', 'carrot', 'hot_dog', 'pizza', 'donut', 'cake', 'chair',
        'couch', 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv',
        'laptop', 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave',
        'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
        'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'
    ] #修改为自己的数据集名称

6、可以修改学习率（mmdetection/configs/_base_/schedules/schedule_20e.py ）

这里是调整学习率的schedule的位置，可以设置warmup schedule和衰减策略。 1x, 2x分别对应12epochs和24epochs，20e对应20epochs，这里注意配置都是默认8块gpu的训练，如果用一块gpu训练，需要在lr/8
```
# optimizer
optimizer = dict(type='SGD', lr=0.02/8, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[16, 19])
total_epochs = 20
```
目录

1、MMdetection系列版本

2、 MMDetection和MMCV兼容版本

3、Installation（Linux系统环境安装）

3.1 搭建基本环境

3.2 安装mmcv-full

1.3 安装其他必要的Python包

1.3 安装 MMDetection

2、 Windows 的mmcv-full安装

windows的本地编译安装

4、Swin Transform 训练自己的数据集

4.1 准备coco数据集

4.2 配置修改工程

1、设置类别数（configs/base/models/mask_rcnn_swin_fpn.py）：

2、修改配置信息（间隔和加载预训练模型configs/base/default_runtime.py）

3、修改训练尺寸大小、max_epochs按需修改（configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py）

4、配置数据集路径、img_scale、samples_per_gpu、workers_per_gpu和增加数据增强（configs/base/datasets/coco_detection.py）

configs/base/datasets/coco_instance.py 文件的最上面指定了数据集的路径，因此在项目下新建 data/coco目录，下面四个子目录 annotations和test2017，train2017，val2017。路径/configs/base/datasets/coco_detection.py，第2行的data_root数据集根目录路径，第8行的img_scale可以根据需要修改，下面train、test、val数据集的具体路径ann_file根据自己数据集修改

5、修改分类数组：mmdet/datasets/coco.py

6、可以修改学习率（mmdetection/configs/_base_/schedules/schedule_20e.py ）

这里是调整学习率的schedule的位置，可以设置warmup schedule和衰减策略。 1x, 2x分别对应12epochs和24epochs，20e对应20epochs，这里注意配置都是默认8块gpu的训练，如果用一块gpu训练，需要在lr/8

4.3 开始训练执行图下命令

4.4、禁用mask

4、遇到的问题及解决办法

5、测试训练好的模型

7、载入修改好的配置文件

from mmcv import Config
import albumentations as albu
cfg = Config.fromfile('./configs/dcn/cascade_rcnn_r101_fpn_dconv_c3-c5_20e_coco.py')

可以使用以下的命令检查几个重要参数：
cfg.data.train
cfg.total_epochs
cfg.data.samples_per_gpu
cfg.resume_from
cfg.load_from
cfg.data
...

改变config中某些参数
from mmdet.apis import set_random_seed

# Modify dataset type and path

# cfg.dataset_type = 'Xray'
# cfg.data_root = 'Xray'

cfg.data.samples_per_gpu = 4
cfg.data.workers_per_gpu = 4

# cfg.data.test.type = 'Xray'
cfg.data.test.data_root = '../mmdetection_torch_1.5'
# cfg.data.test.img_prefix = '../mmdetection_torch_1.5'

# cfg.data.train.type = 'Xray'
cfg.data.train.data_root = '../mmdetection_torch_1.5'
# cfg.data.train.ann_file = 'instances_train2014.json'
# # cfg.data.train.classes = classes
# cfg.data.train.img_prefix = '../mmdetection_torch_1.5'

# cfg.data.val.type = 'Xray'
cfg.data.val.data_root = '../mmdetection_torch_1.5'
# cfg.data.val.ann_file = 'instances_val2014.json'
# # cfg.data.train.classes = classes
# cfg.data.val.img_prefix = '../mmdetection_torch_1.5'

# modify neck classes number
# cfg.model.neck.num_outs
# modify num classes of the model in box head
# for i in range(len(cfg.model.roi_head.bbox_head)):
#     cfg.model.roi_head.bbox_head[i].num_classes = 10


# cfg.data.train.pipeline[2].img_scale = (1333,800)

cfg.load_from = '../mmdetection_torch_1.5/coco_exps/latest.pth'
# cfg.resume_from = './coco_exps_v3/latest.pth'

# Set up working dir to save files and logs.
cfg.work_dir = './coco_exps_v4'

# The original learning rate (LR) is set for 8-GPU training.
# We divide it by 8 since we only use one GPU.
cfg.optimizer.lr = 0.02 / 8
# cfg.lr_config.warmup = None
# cfg.lr_config = dict(
#     policy='step',
#     warmup='linear',
#     warmup_iters=500,
#     warmup_ratio=0.001,
#     # [7] yields higher performance than [6]
#     step=[7])
# cfg.lr_config = dict(
#     policy='step',
#     warmup='linear',
#     warmup_iters=500,
#     warmup_ratio=0.001,
#     step=[36,39])
cfg.log_config.interval = 10

# # Change the evaluation metric since we use customized dataset.
# cfg.evaluation.metric = 'mAP'
# # We can set the evaluation interval to reduce the evaluation times
# cfg.evaluation.interval = 12
# # We can set the checkpoint saving interval to reduce the storage cost
# cfg.checkpoint_config.interval = 12

# # Set seed thus the results are more reproducible
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)
# cfg.total_epochs = 40

# # We can initialize the logger for training and have a look
# # at the final config used for training
print(f'Config:\n{cfg.pretty_text}')

给定一个在COCO数据集上训练Faster R-CNN的配置，我们需要修改一些值来使用它在KITTI数据集上训练Faster R-CNN。

from mmdet.apis import set_random_seed

# Modify dataset type and path
cfg.dataset_type = 'KittiTinyDataset'
cfg.data_root = 'kitti_tiny/'

cfg.data.test.type = 'KittiTinyDataset'
cfg.data.test.data_root = 'kitti_tiny/'
cfg.data.test.ann_file = 'train.txt'
cfg.data.test.img_prefix = 'training/image_2'

cfg.data.train.type = 'KittiTinyDataset'
cfg.data.train.data_root = 'kitti_tiny/'
cfg.data.train.ann_file = 'train.txt'
cfg.data.train.img_prefix = 'training/image_2'

cfg.data.val.type = 'KittiTinyDataset'
cfg.data.val.data_root = 'kitti_tiny/'
cfg.data.val.ann_file = 'val.txt'
cfg.data.val.img_prefix = 'training/image_2'

# modify num classes of the model in box head
cfg.model.roi_head.bbox_head.num_classes = 3
# We can still use the pre-trained Mask RCNN model though we do not need to
# use the mask branch
cfg.load_from = 'checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'

# Set up working dir to save files and logs.
cfg.work_dir = './tutorial_exps'

# The original learning rate (LR) is set for 8-GPU training.
# We divide it by 8 since we only use one GPU.
cfg.optimizer.lr = 0.02 / 8
cfg.lr_config.warmup = None
cfg.log_config.interval = 10

# Change the evaluation metric since we use customized dataset.
cfg.evaluation.metric = 'mAP'
# We can set the evaluation interval to reduce the evaluation times
cfg.evaluation.interval = 12
# We can set the checkpoint saving interval to reduce the storage cost
cfg.checkpoint_config.interval = 12

# Set seed thus the results are more reproducible
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)


# We can initialize the logger for training and have a look
# at the final config used for training
print(f'Config:\n{cfg.pretty_text}')

训练一个新的探测器
最后，初始化数据集和检测器，然后训练一个新的检测器!

from mmdet.datasets import build_dataset
from mmdet.models import build_detector
from mmdet.apis import train_detector


# Build dataset
datasets = [build_dataset(cfg.data.train)]

# Build the detector
model = build_detector(
    cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
# Add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES

# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_detector(model, datasets, cfg, distributed=False, validate=True)

8、使用Tensorboard进行可视化，查看训练
参考博客1
如果有在default_runtime中解除注释tensorboard，键入下面的命令可以开启实时更新的tensorboard可视化模块

在config文件修改如下

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook') #生成Tensorboard 日志
    ])


设置之后，会在work_dir目录下生成一个tf_logs目录,使用Tensorboard打开日志

    cd /path/to/tf_logs
    tensorboard --logdir . --host 服务器IP地址 --port 6006

tensorboard 默认端口号是6006，在浏览器中输入http://:6006即可打开tensorboard界面

# Load the TensorBoard notebook extension
%load_ext tensorboard
# logdir需要填入你的work_dir/+tf_logs
%tensorboard --logdir=coco_exps_v4/tf_logs

9、训练过程中进行模型验证

在config文件中设置val数据集和评估指标

# coco支持的metric有['bbox', 'segm', 'proposal', 'proposal_fast']
# voc数据集支持的 metric 有['mAP', 'recall']
evaluation = dict(interval=1, metric=['mAP']) # len(metric) == 1, metric 只能设置一个指标
————————————————

4.3 开始训练执行图下命令

python tools/train.py configs\swin\mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py

实际命令根据自己使用的修改，可以看到已经可以训练了，但是这样还是训练的带mask的，还不是真正意义上的目标检测模型。

上面的.log、.log.json文件就是训练的日志文件，每训练完一个epoch后目录下还会有对应的以epoch_x.pth的模型文件，最新训练的模型文件命名为latest.pth。
上面的文件内容大同小异，有当前时间、epoch次数，迭代次数（配置文件中默认设置50个batch输出一次log信息），学习率、损失函数loss、准确率等信息，可以根据上面的训练信息进行模型的评估与测试，另外可以通过读取.log.json文件进行可视化展示，方便调试。
4.4、禁用mask

1.路径./configs/base/models/mask_rcnn_swin_fpn.py中第75行use_mask=True 修改为use_mask=False
还需要删除mask_roi_extractor和mask_head两个变量，大概在第63行和68行，这里删除之后注意末尾的逗号和小括号的格式匹配问题
2.路径/configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py中：
第26行dict(type=‘LoadAnnotations’, with_bbox=True, with_mask=True)修改为dict(type=‘LoadAnnotations’, with_bbox=True, with_mask=False)
第60行删掉’gt_masks’

训练时使用下面命令训练：
bash tools/dist_train.sh 'configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py' 1 --cfg-options model.pretrained='checkpoints/swin_tiny_patch4_window7_224.pth'

其中1为GPU数量，按需修改，预训练模型model.pretrained可选

训练后的模型进行推理测试、评估

python3 tools/test.py ./config/retinanet/retinanet_r50_fpn_1x_coco.py ./work_dirs/retinanet_r50_fpn_1x_coco/epoch_12.pth --out ./result/result.pkl --eval bbox


#保存测试结果和测试结果图片，并且对测试结果进行评估
python tools/test.py  work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py  work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco/latest.pth  --out inference_dirs/mask_rcnn_swin_tiny_coco_result.pkl  --eval bbox --show-dir ./inference_dirs/

#生成其他格式的结果文件
python tools/test.py  work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py  work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco/latest.pth  
--format-only  --options "txtfile_prefix=./inference_dirs/mask_rcnn_swin_tiny_coco_result"
 --eval bbox --show-dir ./inference_dirs/




./tools/dist_test.sh \
    configs/cityscapes/mask_rcnn_r50_fpn_1x_cityscapes.py \
    checkpoints/mask_rcnn_r50_fpn_1x_cityscapes_20200227-afe51d5a.pth \
    8 \
    --format-only \
    --options "txtfile_prefix=./mask_rcnn_cityscapes_test_results"

将mmdetection产生的coco(json)格式的测试结果转化成VisDrone19官网所需要的txt格式文件（还包括无标签测试图片产生mmdet所需要的test.json方法）

最近VisDrone19开放测试服务器了，自己尝试在它的test的图片上给出标签然后提交到官网上去。
使用了mmdetection工具，这个工具出来的结果是coco(json)格式的。
我们需要把它转化成官网需要的visdrone的txt格式才可以提交到服务器上，查看自己的score。
这个转化脚本目前找不到，所以只能自己写了。

import json
import os
import argparse


# wangzhifneg write in 2021-6-30
def coco2vistxt(json_path, out_folder):
    labels = json.load(open(json_path, 'r', encoding='utf-8'))
    for i in range(len(labels)):
    # print(len(labels)) 158000
    # print(labels[i]['image_id'])
    #     print(labels[i]['bbox'])

        file_name = labels[i]['image_id'] + '.txt'
        with open(os.path.join(args.o, file_name), 'a+', encoding='utf-8') as f:
            l = labels[i]['bbox']
            s = [round(i) for i in l]
            line =str(s)[1:-1].replace(' ','')+ ',' + str(labels[i]['score'])[:6] + ',' + str(labels[i]['category_id']) + ',' + str('-1') + ',' + str('-1')
            f.write(line+'\n')


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='coco test json to visdrone txt file.')
    parser.add_argument('-j', help='JSON file', default='D:/07_codeD/datasets/visdrone/cascadevis10test.bbox.json')
    parser.add_argument('-o', help='path to output folder', default='D:/07_codeD/datasets/visdrone/testout')

    args = parser.parse_args()

    coco2vistxt(json_path=args.j,out_folder=args.o)

简单说一下：
cascadevis10test.bbox.json这个是我们mmdetection产生的json文件，里面有我们对测试图片产生的coco格式的结果。
testout10cascade是传出txt存放的目录。
具体如下图：

PS:
mmdetection需要把测试集（完全是图片，没有标签）转化成一个带有图片信息的json文件才可以进行coco格式的test。
这部分代码官方没有，要自己写，我找了好久才找到可以参考的，自己改动了下。具体代码如下。

import os
import PIL.Image as Image
import json

root = 'D:/07_codeD/datasets/visdrone/VisDrone2019-DET-test-challenge'

test_json = os.path.join(root, 'images')  # test image root
out_file = os.path.join(root, 'test.json')  # test json output path 生成json的位置

data = {}
# 这部分如果不同数据集可以替换
data['categories'] = [{"id": 1, "name": "Pedestrain", "supercategory": "none"},
                      {"id": 2, "name": "People", "supercategory": "none"},
                      {"id": 3, "name": "Bicycle", "supercategory": "none"},
                      {"id": 4, "name": "Car", "supercategory": "none"},
                      {"id": 5, "name": "Van", "supercategory": "none"},
                      {"id": 6, "name": "Truck", "supercategory": "none"},
                      {"id": 7, "name": "Tricycle", "supercategory": "none"},
                      {"id": 8, "name": "Awning-tricycle", "supercategory": "none"},
                      {"id": 9, "name": "Bus", "supercategory": "none"},
                      {"id": 10, "name": "Motor", "supercategory": "none"}]  # 数据集的类别

images = []
for name in os.listdir(test_json):
    file_path = os.path.join(test_json, name)
    file = Image.open(file_path)
    tmp = dict()
    tmp['id'] = name[:-4]

    # idx += 1
    tmp['width'] = file.size[0]
    tmp['height'] = file.size[1]
    tmp['file_name'] = name
    images.append(tmp)

data['images'] = images
with open(out_file, 'w') as f:
    json.dump(data, f)

# with open(out_file, 'r') as f:
#     test = json.load(f)
#     for i in test['categories']:
#         print(i['id'])
print('finish')

这里注意VisDrone是12个类，其中1和11我们是不需要的，大家建立自己的coco(json)和voc(xml)标签的时候就要去除。
我们上面这个代码可以将我们的test(完全是图片)，生成相应的test.json（这里面没有标签，只有图片信息），然后将test.json写入到我们的mmdetection的config中的test部分就好了。
（这部分代码mmdetection官方居然没有。。。资料也好少，之前搞得我恼火得很，这里给后面想做这个数据集的给点参考）

4、遇到的问题及解决办法

1.AssertionError: Incompatible version of pycocotools is installed. Run pip uninstall pycocotools first. Then run pip install mmpycocotools to install open-mmlab forked pycocotools.
解决办法已经给出了，命令行中：

pip uninstall pycocotools

pip install mmpycocotools

2.KeyError: "CascadeRCNN: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'"

预训练模型加载错误，应该使用imagenet预训练的模型，而不是在coco上微调的模型，这个错误我也很无奈啊，跟我预想的使用coco模型预训练不一样，官方github也有人提出相同问题，解决办法就是不加载预训练模型从头训练，或者在https://github.com/microsoft/Swin-Transformer上下载分类的模型。
3.import pycocotools._mask as _mask
File "pycocotools/_mask.pyx", line 1, in init pycocotools._mask
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
numpy版本问题，使用pip install --upgrade numpy升级numpy版本

5、测试训练好的模型

添加一个自己的图片在demo目录下，执行：

python demo/image_demo.py demo/000019.jpg configs\swin\mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco/latest.pth

latest.pth 就是自己训练好的最新的权重文件，默认会放在workdir下。

不输出实例分割图
```
demo/image_demo.py 做如下修改：

    # test a single image
    result = inference_detector(model, args.img)
    new_result = result[0]
    # show the results
    show_result_pyplot(model, args.img, new_result, score_thr=args.score_thr)
```
三、训练 cascade_mask_rcnn_swin
与之前训练mask_rcnn_swin_一样，但是如果是单卡多修改如下部分
configs/swin/cascade_mask_rcnn_swin_small_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 文件中，所有的 SyncBN 改为 BN。

你可能感兴趣的:(深度学习,深度学习,transformer,目标检测)

transformer中seq_len参数的设置 yuweififi transformer 深度学习人工智能
在Transformer模型中，seq_len（序列长度）是一个关键的超参数，下面从不同方面详细介绍它的具体含义和作用：一、基本定义seq_len表示输入到Transformer模型中的序列所允许的最大长度。在自然语言处理任务里，文本会被拆分成一个个的单词、子词或者字符，这些元素构成了一个序列。seq_len就是对这个序列中元素数量的上限规定，它决定了模型输入和输出的维度。二、具体使用输入处理文本
pytorch基础 nn.embedding yuweififi pytorch 人工智能 nlp
nn.Embedding是PyTorch中的一个模块，用于创建嵌入层（embeddinglayer），它将离散的索引（例如词汇表中的单词索引）映射为固定大小的稠密向量。这是许多NLP模型（包括Transformer）中的基本组件。示例用法：importtorchimporttorch.nnasnn#定义一个嵌入层vocab_size=10000#词汇表大小embedding_dim=512#嵌入向
pytorch基础-layernormal 与 batchnormal yuweififi pytorch 人工智能 python
nn.LayerNorm（层归一化）和nn.BatchNorm（批量归一化）是深度学习中常用的两种归一化方法，都有助于提高模型的训练效率和稳定性，但它们在归一化维度、应用场景、计算方式等方面存在明显区别，以下为你详细介绍：1、归一化维度nn.LayerNorm：对单个样本的特征维度进行归一化。无论输入数据的形状如何，它会计算每个样本在特征维度上的均值和方差，然后进行归一化。例如，对于一个形状为(b
通过TensorFlow实现简单深度学习模型（2） yyc_audio 人工智能深度学习 python 机器学习
前文我们已经实现了对每批数据的训练，下面继续实现一轮完整的训练。完整的训练循环一轮训练就是对训练数据的每个批量都重复上述训练步骤，而完整的训练循环就是重复多轮训练。deffit(model,images,labels,epochs,batch_size=128):forepoch_counterinrange(epochs):print(f"Epoch{epoch_counter}")batch_
Transformer 代码剖析2 - 模型训练（pytorch实现） lczdyx Transformer代码剖析 transformer pytorch 深度学习人工智能 python
一、模型初始化模块参考：项目代码1.1参数统计函数defcount_parameters(model):returnsum(p.numel()forpinmodel.parameters()ifp.requires_grad)遍历模型参数筛选可训练参数统计参数数量返回总数技术解析：numel()方法计算张量元素总数requires_grad筛选需要梯度更新的参数统计结果反映模型复杂度，典型Tran
阿里巴巴DIN模型原理与Python实现 eso1983 python 开发语言算法推荐算法
阿里巴巴的DeepInterestNetwork(DIN)是一种用于点击率预测（CTR）的深度学习模型，特别针对电商场景中用户兴趣多样化和动态变化的特性设计。其核心思想是通过注意力机制动态捕捉用户历史行为中与当前候选商品相关的兴趣。1.DIN模型原理1.核心问题传统推荐模型（如Embedding+MLP）将用户历史行为视为固定长度的向量，忽略了用户兴趣的多样性。例如，用户历史行为中可能包含多个互不
月之暗面改进并开源了 Muon 优化算法，对行业有哪些影响？互联网之路. 知识点开源算法
互联网各领域资料分享专区(不定期更新)：Sheet正文月之暗面团队改进并开源的Muon优化算法在深度学习和大模型训练领域引发了广泛关注，其核心创新在于显著降低算力需求（相比AdamW减少48%的FLOPs）并提升训练效率，同时通过开源推动技术生态的共建。1.显著降低大模型训练成本，推动技术普惠算力需求锐减：Muon通过引入权重衰减和一致的RMS更新，解决了原始Muon在大规模训练中的稳定性问题，使
Spring Boot 动态配置数据源全解析 ♢.＊ spring boot 后端 java
亲爱的小伙伴们，在求知的漫漫旅途中，若你对深度学习的奥秘、Java与Python的奇妙世界，亦或是读研论文的撰写攻略有所探寻，那不妨给我一个小小的关注吧。我会精心筹备，在未来的日子里不定期地为大家呈上这些领域的知识宝藏与实用经验分享。每一个点赞，都如同春日里的一缕阳光，给予我满满的动力与温暖，让我们在学习成长的道路上相伴而行，共同进步✨。期待你的关注与点赞哟！引言在企业级应用开发中，单一数据源往往
深入解析：如何编写 Mapper 文件 ♢.＊ oracle 数据库 mybatis
亲爱的小伙伴们，在求知的漫漫旅途中，若你对深度学习的奥秘、Java与Python的奇妙世界，亦或是读研论文的撰写攻略有所探寻，那不妨给我一个小小的关注吧。我会精心筹备，在未来的日子里不定期地为大家呈上这些领域的知识宝藏与实用经验分享。每一个点赞，都如同春日里的一缕阳光，给予我满满的动力与温暖，让我们在学习成长的道路上相伴而行，共同进步✨。期待你的关注与点赞哟！在软件开发尤其是涉及数据库交互的项目中
Spring Boot 中 @Transactional 注解全面解析 ♢.＊ spring boot 数据库 sql
亲爱的小伙伴们，在求知的漫漫旅途中，若你对深度学习的奥秘、Java与Python的奇妙世界，亦或是读研论文的撰写攻略有所探寻，那不妨给我一个小小的关注吧。我会精心筹备，在未来的日子里不定期地为大家呈上这些领域的知识宝藏与实用经验分享。每一个点赞，都如同春日里的一缕阳光，给予我满满的动力与温暖，让我们在学习成长的道路上相伴而行，共同进步✨。期待你的关注与点赞哟！引言在企业级应用开发中，数据的一致性和
大模型专栏博文汇总和索引 Donvink 大模型 transformer 深度学习人工智能语言模型
大模型专栏主要是汇总了我在学习大模型相关技术期间所做的一些总结和笔记，主要包括以下几个子专栏：DeepSeek-R1AIGC大模型实践Transformer多模态系统视频理解对比学习目标检测目标跟踪图神经网络大模型专栏汇总了以上所有子专栏的论文，目前暂时先按照不同的技术领域划分子专栏，子专栏之间的内容可能会有交集，不完全是独立的。为了方便查阅相关模块的内容，故以此文章进行汇总与索引。一、DeepS
深度学习模型优化与医疗诊断应用突破智能计算研究中心其他
内容概要近年来，深度学习技术的迭代演进正在重塑医疗诊断领域的实践范式。随着PyTorch与TensorFlow等开源框架的持续优化，模型开发效率显著提升，为医疗场景下的复杂数据处理提供了技术基座。当前研究聚焦于迁移学习与模型压缩算法的协同创新，通过复用预训练模型的泛化能力与降低计算负载，有效解决了医疗数据样本稀缺与硬件资源受限的痛点问题。与此同时，自适应学习机制通过动态调整网络参数更新策略，在病理
阿里云服务器的作用腾云服务器阿里云服务器云计算
使用阿里云服务器能做什么？大家都知道可以用来搭建网站、数据库、机器学习、Python爬虫、大数据分析等应用，阿里云服务器网来详细说下使用阿里云服务器常见的玩法以及企业或个人用户常见的使用场景：玩转阿里云服务器使用阿里云服务器最常见的应用就是用来搭建网站，例如个人博客、企业网站等；除了搭建网站还可以利用阿里云GPU服务器搭建机器学习和深度学习等AI应用；使用阿里云大数据类型云服务器做数据分析；利用云
阿里云人工智能与机器学习 HaoHao_010 阿里云云服务器云计算服务器
阿里云的人工智能（AI）与机器学习（ML）服务为企业提供了全面的AI解决方案，帮助用户在多个行业实现数据智能化，提升决策效率，推动业务创新。阿里云通过先进的技术和丰富的工具，支持用户开发、部署和管理AI应用。以下是阿里云在人工智能和机器学习方面的主要产品与服务：1.云上机器学习平台—PaaS服务PAI(PlatformforAI)PAI是阿里云推出的人工智能平台，提供一系列机器学习与深度学习工具和
AI探索笔记：浅谈人工智能算法分类安意诚Matrix 机器学习笔记人工智能笔记
人工智能算法分类这是一张经典的图片，基本概况了人工智能算法的现状。这张图片通过三个同心圆展示了人工智能、机器学习和深度学习之间的包含关系，其中人工智能是最广泛的范畴，机器学习是其子集，专注于数据驱动的算法改进，而深度学习则是机器学习中利用多层神经网络进行学习的特定方法。但是随着时代的发展，这张图片表达得也不是太全面了。我更喜欢把人工智能算法做如下的分类：传统机器学习算法-线性回归、逻辑回归、支持向
YOLOv11改进 | 检测头改进篇 | 利用ASFF改进YOLOv11检测头，自适应空间特征融合模块，在所有的目标检测上均有大幅度的涨点效果 Ai缝合怪YOLO涨点改进 YOLO 目标检测计算机视觉深度学习 YOLOv11 YOLOv8 YOLOv10
YOLOv8v10v11专栏限时199元订阅链接:限时199元去b站关注：AI缝合怪订阅YOLOv8v10v11创新改进高效涨点+持续改进500多篇（订阅的小伙伴，终身免费享有后续YOLOv12或是其他版本的改进专栏）目录一、ASFF模块介绍ASFF网络结构图：ASFF的创新点主要包括：作用原理优势二、核心代码三、手把手教你添加v11Detect_ASFFHead检测头模块1.首先在ultraly
ASFF改进YOLOv8检测头：提升目标检测精度与效率的创新方法【YOLOv8】步入烟尘 YOLO系列创新涨点超专栏 YOLO 目标检测目标跟踪 ASFF YOLOv8
本专栏专为AI视觉领域的爱好者和从业者打造。涵盖分类、检测、分割、追踪等多项技术，带你从入门到精通！后续更有实战项目，助你轻松应对面试挑战！立即订阅，开启你的YOLOv8之旅！专栏订阅地址：https://blog.csdn.net/mrdeam/category_12804295.html文章目录ASFF改进YOLOv8检测头：提升目标检测精度与效率的创新方法【YOLOv8】1.背景介绍1.1Y
VQ-Diffusion 深度解析与实战指南晏灵昀Odette
VQ-Diffusion深度解析与实战指南VQ-Diffusion项目地址:https://gitcode.com/gh_mirrors/vqd/VQ-Diffusion1.项目介绍VQ-Diffusion是一个用于文本到图像合成的深度学习模型，基于矢量量化变分自编码器（VQ-VAE）和去噪扩散概率模型（DenoisingDiffusionProbabilisticModel）。该模型通过将DDP
实现红外触感按键扫描函数平凡灵感码头 stm32项目实现 stm32
函数目标检测GPIOC第8号引脚的电平状态（假设低电平触发），实现按键消抖和状态锁定，返回键值5表示按键被按下，未按下时返回0xff。代码逐行解析1.变量定义u8ir_value=0xff;//默认返回未按下状态（0xff）staticu8ir_flag=1;//状态锁存标志，初始为1（允许检测）ir_value：存储按键返回值，初始化0xff表示未按下。ir_flag：静态变量（保持状态跨函数调
【模块】AKConv卷积模块 dearr__ 扒网络模块深度学习人工智能
论文《AKConv:ConvolutionalKernelwithArbitrarySampledShapesandArbitraryNumberofParameters》1、作用AKConv旨在解决深度学习中标准卷积操作的两个固有限制：限定在局部窗口内，限制了从其他位置捕获信息的能力；卷积核固定大小，限制了对不同目标形状和大小的适应能力。这种新方法允许卷积核具有任意参数和采样形状，提供了一种灵活
ELMo ，LM：一串词序列的概率分布probability distribution over sequences of words 强化学习曾小健 NLP自然语言处理 #预训练语言模型
语言模型（LanguageModel），语言模型简单来说就是一串词序列的概率分布。Languagemodelisaprobabilitydistributionoversequencesofwords.GPT与ELMo当成特征的做法不同，OpenAIGPT不需要再重新对任务构建新的模型结构，而是直接在transformer这个语言模型上的最后一层接上softmax作为任务输出层，然后再对这整个模型
yolo格式 ZHOU_WUYI ultralytics YOLO 人工智能
目录yolo格式yolo格式与coco格式的区别1.数据结构2.标注内容3.文件格式4.扩展性5.应用场景总结：yolo格式YOLO（YouOnlyLookOnce）格式通常用于目标检测任务中的标注数据格式。YOLO的标注格式包括每个目标的类别和其在图像中的位置（boundingbox）。YOLO格式的标注文件是一个文本文件，每一行表示一个目标，内容包括目标类别的编号和该目标在图像中的位置（相对于
【保姆级视频教程（二）】YOLOv12训练数据集构建：标签格式转换-划分-YAML 配置避坑指南 | 小白也能轻松玩转目标检测！一只云卷云舒 YOLOv12保姆级通关教程 YOLO 目标检测人工智能 Ultralytics 数据集 YOLOv12 小白教程
【2025全站首发】YOLOv12训练数据集构建：标签格式转换-划分-YAML配置避坑指南|小白也能轻松玩转目标检测！文章目录1.数据集准备1.1标签格式转换1.2数据集划分1.3yaml配置文件创建2.训练验证1.数据集准备示例数据集下载链接：PKU-Market-PCB数据集1.1标签格式转换cursorprompt请撰写一个py脚本。将@Annotations文件夹下的所有类别的xml格式的
DCMNet一种用于目标检测的轻量级骨干结构模型详解及代码复现清风AI 深度学习算法详解及代码复现深度学习机器学习计算机视觉人工智能算法目标检测
模型背景在深度学习技术快速发展的背景下，目标检测领域取得了显著进展。早期的手工特征提取方法如Viola-Jones和HOG逐渐被卷积神经网络（CNN）取代，其中AlexNet在2012年的ILSVRC比赛中表现突出，推动了CNN在计算机视觉中的广泛应用。然而，这些早期模型在精度和效率方面仍存在不足，尤其是在处理复杂场景和小目标时表现不佳。这为DCMNet等新型轻量化目标检测模型的出现提供了契机，旨
DeepSeek应用领域全景解析：驱动产业智能化升级的六大核心方向量子纠缠BUG DeepSeek部署 AI DeepSeek 人工智能 AI编程深度学习
一、引言：DeepSeek为何成为产业智能化首选？作为国产大模型的标杆产品，DeepSeek凭借其万亿级参数规模、MoE混合专家架构和多模态交互能力，正在重构产业智能化升级的技术路径。本文基于官方技术文档与行业实践案例，深入剖析DeepSeek在六大核心领域的应用突破与商业价值实现二、技术底座：支撑多领域落地的三大创新架构1.Transformer-XL增强架构通过引入Multi-HeadLate
DeepSeek全栈接入指南：从零到生产环境的深度实践量子纠缠BUG DeepSeek部署 AI DeepSeek 人工智能深度学习机器学习
第一章：DeepSeek技术体系全景解析1.1认知DeepSeek技术生态DeepSeek作为新一代人工智能技术平台，构建了覆盖算法开发、模型训练、服务部署的全链路技术栈。其核心能力体现在：1.1.1多模态智能引擎自然语言处理：支持文本生成（NLG）、语义理解（NLU）、情感分析等计算机视觉：提供图像分类、目标检测、OCR识别等CV能力语音交互：包含语音识别（ASR）、语音合成（TTS）及声纹识别
Ollama本地私有化部署通义千问大模型Qwen2.5 ErbaoLiu 数据分析&大模型机器学习&大模型自然语言处理&大模型大模型 LLM Qwen2.5 Qwen2 Ollama
目录Qwen2.5介绍Qwen2.5新闻Ollama介绍Linux安装Ollama一键安装Ollama手工安装Ollama卸载OllamaOllama运行Qwen2基于Transformers进行推理本文复现环境：Python3.12.6+Windows8.1+LinuxCentOS7+PyCharmCommunityEdition2022.3.3。Qwen2.5github地址如下：GitHub
注意力机制（Attention Mechanism）详细分类与介绍 Jason_Orton 分类数据挖掘人工智能
注意力机制（AttentionMechanism）是近年来在深度学习中非常流行的一种技术，特别是在自然语言处理（NLP）、计算机视觉等任务中，具有显著的效果。它的核心思想是模仿人类在处理信息时的注意力分配方式，根据不同部分的重要性给予不同的关注程度。1.注意力机制的背景与动机在传统的深度学习模型（如RNN、CNN等）中，信息处理通常是按照固定的规则和结构进行的，模型对输入的各个部分给予相同的关注。
图神经网络：拓扑数据分析的新时代 Jason_Orton 神经网络数据分析人工智能
随着图数据的广泛应用，图神经网络（GraphNeuralNetwork,GNN）作为一种强大的深度学习工具，逐渐成为机器学习领域中的一颗新星。图数据在许多现实世界问题中无处不在，诸如社交网络、交通网络、分子结构、推荐系统等都可以被建模为图结构。图神经网络通过直接处理图结构数据，能够更好地捕捉节点之间的关系信息，从而在众多任务中展现出了优异的性能。本文将深入探讨图神经网络的基本原理、常见的算法、应用
智算中心的核心硬件是什么？ Imagination官方博客
本文来源：游方AI智算中心，作为人工智能时代的关键基础设施，其核心硬件的构成与性能直接影响着智能计算的效率与质量。以下是对智算中心核心硬件的详细阐述：一、AI芯片AI芯片是专门为加速人工智能计算而设计的硬件，能够与各种AI算法协同工作，满足对算力的极高需求。当前主流的AI加速计算芯片包括：1、GPU（图形处理器）GPU是智算中心的算力担当，其强大的并行计算能力使其在深度学习领域大放异彩。GPU芯片
xml解析小猪猪08 xml
1、DOM解析的步奏准备工作： 1.创建DocumentBuilderFactory的对象 2.创建DocumentBuilder对象 3.通过DocumentBuilder对象的parse(String fileName)方法解析xml文件 4.通过Document的getElem
每个开发人员都需要了解的一个SQL技巧 brotherlamp linux linux视频 linux教程 linux自学 linux资料
对于数据过滤而言CHECK约束已经算是相当不错了。然而它仍存在一些缺陷，比如说它们是应用到表上面的，但有的时候你可能希望指定一条约束，而它只在特定条件下才生效。使用SQL标准的WITH CHECK OPTION子句就能完成这点，至少Oracle和SQL Server都实现了这个功能。下面是实现方式： CREATE TABLE books ( id &
Quartz——CronTrigger触发器 eksliang quartz CronTrigger
转载请出自出处：http://eksliang.iteye.com/blog/2208295 一.概述 CronTrigger 能够提供比 SimpleTrigger 更有具体实际意义的调度方案，调度规则基于 Cron 表达式，CronTrigger 支持日历相关的重复时间间隔（比如每月第一个周一执行），而不是简单的周期时间间隔。二.Cron表达式介绍 1）Cron表达式规则表 Quartz
Informatica基础 18289753290 Informatica Monitor manager workflow Designer
1. 1）PowerCenter Designer：设计开发环境，定义源及目标数据结构；设计转换规则，生成ETL映射。 2）Workflow Manager：合理地实现复杂的ETL工作流，基于时间，事件的作业调度 3）Workflow Monitor：监控Workflow和Session运行情况，生成日志和报告 4）Repository Manager：
linux下为程序创建启动和关闭的的sh文件，scrapyd为例酷的飞上天空 scrapy
对于一些未提供service管理的程序每次启动和关闭都要加上全部路径，想到可以做一个简单的启动和关闭控制的文件下面以scrapy启动server为例，文件名为run.sh： #端口号，根据此端口号确定PID PORT=6800 #启动命令所在目录 HOME='/home/jmscra/scrapy/' #查询出监听了PORT端口
人--自私与无私永夜-极光
今天上毛概课,老师提出一个问题--人是自私的还是无私的,根源是什么? 从客观的角度来看,人有自私的行为,也有无私的
Ubuntu安装NS-3 环境脚本随便小屋 ubuntu
将附件下载下来之后解压，将解压后的文件ns3environment.sh复制到下载目录下（其实放在哪里都可以，就是为了和我下面的命令相统一）。输入命令： sudo ./ns3environment.sh >>result 这样系统就自动安装ns3的环境，运行的结果在result文件中，如果提示 com
创业的简单感受 aijuans 创业的简单感受
2009年11月9日我进入a公司实习，2012年4月26日，我离开a公司，开始自己的创业之旅。今天是2012年5月30日，我忽然很想谈谈自己创业一个月的感受。当初离开边锋时，我就对自己说：“自己选择的路，就是跪着也要把他走完”，我也做好了心理准备，准备迎接一次次的困难。我这次走出来，不管成败
如何经营自己的独立人脉 aoyouzi 如何经营自己的独立人脉
独立人脉不是父母、亲戚的人脉，而是自己主动投入构造的人脉圈。“放长线，钓大鱼”，先行投入才能产生后续产出。现在几乎做所有的事情都需要人脉。以银行柜员为例，需要拉储户，而其本质就是社会人脉，就是社交！很多人都说，人脉我不行，因为我爸不行、我妈不行、我姨不行、我舅不行……我谁谁谁都不行，怎么能建立人脉？我这里说的人脉，是你的独立人脉。以一个普通的银行柜员
JSP基础百合不是茶 jsp 注释隐式对象
1,JSP语句的声明 <%! 声明 %> 　　声明：这个就是提供java代码声明变量、方法等的场所。表达式 <%= 表达式 %> 　　这个相当于赋值，可以在页面上显示表达式的结果，程序代码段/小型指令　<% 程序代码片段 %> 2,JSP的注释
web.xml之session-config、mime-mapping bijian1013 java web.xml servlet session-config mime-mapping
session-config 1.定义： <session-config> <session-timeout>20</session-timeout> </session-config> 2.作用：用于定义整个WEB站点session的有效期限，单位是分钟。 mime-mapping 1.定义： <mime-m
互联网开放平台（1） Bill_chen 互联网 qq 新浪微博百度腾讯
现在各互联网公司都推出了自己的开放平台供用户创造自己的应用，互联网的开放技术欣欣向荣，自己总结如下： 1.淘宝开放平台(TOP) 网址：http://open.taobao.com/ 依赖淘宝强大的电子商务数据，将淘宝内部业务数据作为API开放出去，同时将外部ISV的应用引入进来。目前TOP的三条主线： TOP访问网站：open.taobao.com ISV后台：my.open.ta
【MongoDB学习笔记九】MongoDB索引 bit1129 mongodb
索引可以在任意列上建立索引索引的构造和使用与传统关系型数据库几乎一样,适用于Oracle的索引优化技巧也适用于Mongodb 使用索引可以加快查询,但同时会降低修改,插入等的性能内嵌文档照样可以建立使用索引测试数据 var p1 = { "name":"Jack", "age&q
JDBC常用API之外的总结白糖_ jdbc
做JAVA的人玩JDBC肯定已经很熟练了，像DriverManager、Connection、ResultSet、Statement这些基本类大家肯定很常用啦，我不赘述那些诸如注册JDBC驱动、创建连接、获取数据集的API了，在这我介绍一些写框架时常用的API，大家共同学习吧。 ResultSetMetaData获取ResultSet对象的元数据信息
apache VelocityEngine使用记录 bozch VelocityEngine
VelocityEngine是一个模板引擎，能够基于模板生成指定的文件代码。使用方法如下： VelocityEngine engine = new VelocityEngine();// 定义模板引擎 Properties properties = new Properties();// 模板引擎属
编程之美-快速找出故障机器 bylijinnan 编程之美
package beautyOfCoding; import java.util.Arrays; public class TheLostID { /*编程之美假设一个机器仅存储一个标号为ID的记录，假设机器总量在10亿以下且ID是小于10亿的整数，假设每份数据保存两个备份，这样就有两个机器存储了同样的数据。 1.假设在某个时间得到一个数据文件ID的列表，是
关于Java中redirect与forward的区别 chenbowen00 java servlet
在Servlet中两种实现： forward方式：request.getRequestDispatcher(“/somePage.jsp”).forward(request, response); redirect方式：response.sendRedirect(“/somePage.jsp”); forward是服务器内部重定向，程序收到请求后重新定向到另一个程序，客户机并不知
[信号与系统]人体最关键的两个信号节点 comsci 系统
如果把人体看做是一个带生物磁场的导体,那么这个导体有两个很重要的节点,第一个在头部,中医的名称叫做百汇穴, 另外一个节点在腰部,中医的名称叫做命门如果要保护自己的脑部磁场不受到外界有害信号的攻击,最简单的
oracle 存储过程执行权限 daizj oracle 存储过程权限执行者调用者
在数据库系统中存储过程是必不可少的利器，存储过程是预先编译好的为实现一个复杂功能的一段Sql语句集合。它的优点我就不多说了，说一下我碰到的问题吧。我在项目开发的过程中需要用存储过程来实现一个功能，其中涉及到判断一张表是否已经建立，没有建立就由存储过程来建立这张表。 CREATE OR REPLACE PROCEDURE TestProc IS fla
为mysql数据库建立索引 dengkane mysql 性能索引
前些时候，一位颇高级的程序员居然问我什么叫做索引，令我感到十分的惊奇，我想这绝不会是沧海一粟，因为有成千上万的开发者（可能大部分是使用MySQL的）都没有受过有关数据库的正规培训，尽管他们都为客户做过一些开发，但却对如何为数据库建立适当的索引所知较少，因此我起了写一篇相关文章的念头。最普通的情况，是为出现在where子句的字段建一个索引。为方便讲述，我们先建立一个如下的表。
学习C语言常见误区如何看懂一个程序如何掌握一个程序以及几个小题目示例 dcj3sjt126com c 算法
如果看懂一个程序，分三步 1、流程 2、每个语句的功能 3、试数如何学习一些小算法的程序尝试自己去编程解决它，大部分人都自己无法解决如果解决不了就看答案关键是把答案看懂，这个是要花很大的精力，也是我们学习的重点看懂之后尝试自己去修改程序，并且知道修改之后程序的不同输出结果的含义照着答案去敲调试错误
centos6.3安装php5.4报错 dcj3sjt126com centos6
报错内容如下: Resolving Dependencies --> Running transaction check ---> Package php54w.x86_64 0:5.4.38-1.w6 will be installed --> Processing Dependency: php54w-common(x86-64) = 5.4.38-1.w6 for
JSONP请求 flyer0126 jsonp
使用jsonp不能发起POST请求。 It is not possible to make a JSONP POST request. JSONP works by creating a <script> tag that executes Javascript from a different domain; it is not pos
Spring Security（03）——核心类简介 234390216 Authentication
核心类简介目录 1.1 Authentication 1.2 SecurityContextHolder 1.3 AuthenticationManager和AuthenticationProvider 1.3.1 &nb
在CentOS上部署JAVA服务 java--hhf java jdk centos Java服务
本文将介绍如何在CentOS上运行Java Web服务，其中将包括如何搭建JAVA运行环境、如何开启端口号、如何使得服务在命令执行窗口关闭后依旧运行第一步：卸载旧Linux自带的JDK ①查看本机JDK版本 java -version 结果如下 java version "1.6.0"
oracle、sqlserver、mysql常用函数对比[to_char、to_number、to_date] ldzyz007 oracle mysql SQL Server
oracle &n
记Protocol Oriented Programming in Swift of WWDC 2015 ningandjin protocol WWDC 2015 Swift2.0
其实最先朋友让我就这个题目写篇文章的时候，我是拒绝的，因为觉得苹果就是在炒冷饭，把已经流行了数十年的OOP中的“面向接口编程”还拿来讲，看完整个Session之后呢，虽然还是觉得在炒冷饭，但是毕竟还是加了蛋的，有些东西还是值得说说的。通常谈到面向接口编程，其主要作用是把系统设计和具体实现分离开，让系统的每个部分都可以在不影响别的部分的情况下，改变自身的具体实现。接口的设计就反映了系统
搭建 CentOS 6 服务器(15) - Keepalived、HAProxy、LVS rensanning keepalived
（一）Keepalived （1）安装 # cd /usr/local/src # wget http://www.keepalived.org/software/keepalived-1.2.15.tar.gz # tar zxvf keepalived-1.2.15.tar.gz # cd keepalived-1.2.15 # ./configure # make &a
ORACLE数据库SCN和时间的互相转换 tomcat_oracle oracle sql
SCN（System Change Number 简称 SCN）是当Oracle数据库更新后，由DBMS自动维护去累积递增的一个数字，可以理解成ORACLE数据库的时间戳，从ORACLE 10G开始，提供了函数可以实现SCN和时间进行相互转换；　　用途：在进行数据库的还原和利用数据库的闪回功能时，进行SCN和时间的转换就变的非常必要了；　　操作方法：　　1、通过dbms_f
Spring MVC 方法注解拦截器 xp9802 spring mvc
应用场景，在方法级别对本次调用进行鉴权，如api接口中有个用户唯一标示accessToken,对于有accessToken的每次请求可以在方法加一个拦截器，获得本次请求的用户，存放到request或者session域。 python中，之前在python flask中可以使用装饰器来对方法进行预处理，进行权限处理先看一个实例,使用@access_required拦截： ?