Camera-equipped drones (or general UAVs) have been rapidly deployed across a wide range of applications, including agriculture, aerial photography, fast delivery, and surveillance. As a result, automatic understanding of the visual data collected from these platforms is in ever-higher demand, bringing computer vision and drones increasingly close together. We are excited to present a large-scale benchmark with carefully annotated ground truth for various important computer vision tasks, named VisDrone, to make vision meet drones. The VisDrone2019 dataset was collected by the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University. The benchmark dataset consists of 288 video clips formed by 261,908 frames and 10,209 static images, captured by various drone-mounted cameras, covering a wide range of aspects including location (14 different cities in China separated by thousands of kilometers), environment (urban and rural), objects (pedestrians, vehicles, bicycles, etc.), and density (sparse and crowded scenes). Note that the data were collected with various drone platforms (i.e., drones of different models), in different scenarios, and under various weather and lighting conditions. The frames are manually annotated with more than 2.6 million bounding boxes of frequent targets of interest, such as pedestrians, cars, bicycles, and tricycles. Some important attributes, including scene visibility, object class, and occlusion, are also provided for better data utilization.
The challenge focuses on five tasks:
(1) Task 1: object detection in images. This task aims to detect objects of predefined categories (e.g., cars and pedestrians) from individual images taken by drones.
(2) Task 2: object detection in videos. This task is similar to Task 1, except that objects are to be detected from videos.
(3) Task 3: single-object tracking.
(4) Task 4: multi-object tracking.
(5) Task 5: crowd counting. This task aims to count the number of people in each video frame.
We are pleased to announce the VisDrone2021 object detection in images challenge (Task 1). The competition aims to push the state of the art in object detection from drone platforms. Teams are required to predict bounding boxes of objects in ten predefined classes (i.e., pedestrian, person, car, van, bus, truck, motor, bicycle, awning-tricycle, and tricycle) with real-valued confidences. Some rarely occurring special vehicles (e.g., machineshop trucks, forklifts, tankers) are ignored in evaluation.
According to the DeepBlueAI team, although the competition has been held for several editions, several difficulties remain.
The labels, indexed 0 to 11, are: 'ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others'.
Each annotation line consists of eight comma-separated fields:

<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>

Name                Description
<bbox_left>         The x coordinate of the top-left corner of the predicted bounding box.
<bbox_top>          The y coordinate of the top-left corner of the predicted object bounding box.
<bbox_width>        The width in pixels of the predicted object bounding box.
<bbox_height>       The height in pixels of the predicted object bounding box.
<score>             The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing an object instance. The score in the GROUNDTRUTH file is set to 1 or 0: 1 indicates the bounding box is considered in evaluation, while 0 indicates the bounding box will be ignored.
<object_category>   The object category indicates the type of annotated object, i.e., ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11).
<truncation>        The score in the DETECTION result file should be set to the constant -1. The score in the GROUNDTRUTH file indicates the degree of object parts appearing outside a frame (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1%~50%)).
<occlusion>         The score in the DETECTION file should be set to the constant -1. The score in the GROUNDTRUTH file indicates the fraction of the object being occluded (i.e., no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1%~50%), and heavy occlusion = 2 (occlusion ratio 50%~100%)).
Two of these annotation attributes are particularly useful: the truncation ratio and the occlusion ratio. The occlusion ratio is defined as the fraction of the object that is occluded. The truncation ratio indicates the degree to which parts of the object appear outside the frame. If an object's truncation ratio is larger than 50%, it is skipped during evaluation.
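Putting the fields together, a single ground-truth line could look like this (values made up for illustration):

684,8,273,116,1,1,0,1

i.e., a pedestrian (category 1) whose box starts at (684, 8) and is 273x116 pixels, is considered in evaluation (score 1), is not truncated, and is partially occluded.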
We require each evaluated algorithm to output, for every test image, a list of detected bounding boxes with confidence scores in a predefined format (see the results format for more details). Similar to the evaluation protocol of MS COCO [1], we use the AP, AP_IoU=0.50, AP_IoU=0.75, AR_max=1, AR_max=10, AR_max=100, and AR_max=500 metrics to evaluate the results of detection algorithms. Unless otherwise specified, the AP and AR metrics are averaged over multiple intersection-over-union (IoU) thresholds; specifically, we use the ten IoU thresholds [0.50:0.05:0.95]. All metrics are computed allowing at most 500 top-scoring detections per image (across all categories). These criteria penalize both missed detections and duplicate detections (two detection results for the same object instance). The AP metric is used as the primary metric for ranking algorithms. The table below describes these metrics.
Measure      Perfect  Description
AP           100%     The average precision over all 10 IoU thresholds (i.e., [0.5:0.05:0.95]) of all object categories
AP_IoU=0.50  100%     The average precision over all object categories when the IoU overlap with ground truth is larger than 0.50
AP_IoU=0.75  100%     The average precision over all object categories when the IoU overlap with ground truth is larger than 0.75
AR_max=1     100%     The maximum recall given 1 detection per image
AR_max=10    100%     The maximum recall given 10 detections per image
AR_max=100   100%     The maximum recall given 100 detections per image
AR_max=500   100%     The maximum recall given 500 detections per image
The metrics above are computed over the ten object categories of interest. For a comprehensive evaluation, the performance of each individual object category will also be reported. The evaluation code for object detection in images is available on the VisDrone GitHub.
evalDET.m is the main function used to evaluate your detector: modify the dataset path and result path, and use "isImgDisplay" to display the ground truth and detections.
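For a Python workflow, roughly the same numbers can be reproduced with pycocotools once the ground truth and your detections are in COCO format (conversion scripts are given later in this post). A minimal sketch; the two json file names are placeholders:

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('visdrone_val_coco.json')          # ground truth converted to COCO format
coco_dt = coco_gt.loadRes('detections_val.json')  # your detections in COCO result format
coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.params.maxDets = [1, 10, 100, 500]      # VisDrone allows up to 500 detections per image
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # note: the stock summarize() prints its default metric set;
                       # the official VisDrone toolkit is authoritative for AR_max=500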
A YOLO dataset folder contains two subfolders, images and labels, holding the images and the txt label files respectively. The directory structures of images and labels must mirror each other, because YOLO first reads the image paths from images and then simply replaces images with labels in each path to locate the label files, as shown below:
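dataset
–images
----0000002_00005_d_0000014.jpg
----...
–labels
----0000002_00005_d_0000014.txt
----...

(The file name here is illustrative, following the VisDrone naming scheme; every image must have a same-named txt file under labels.)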
In the txt file for each image, every line has the format cls_id x y w h, where (x, y) is the center of the box, and all four coordinates are ratios relative to the image width and height, not absolute pixel coordinates.
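For instance, one line of such a txt file could read (illustrative values):

3 0.514062 0.201852 0.026562 0.040741

i.e., class id 3, box center at about (51%, 20%) of the image, box size roughly 2.7% x 4.1% of the image.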
Recent versions of YOLOv5 already ship with a configuration for training on the VisDrone dataset, including the dataset preprocessing, mainly the generation of the labels. You can create a file named visDrone2019_txt2txt_yolo.py:
'''
Author: 刘鸿燕 13752614153@163.com
Date: 2022-05-09 14:05:05
LastEditors: 刘鸿燕 13752614153@163.com
LastEditTime: 2022-05-09 15:38:09
FilePath: \VisDrone2019\data_process\visDrone2019_txt2txt_yolo.py
Description: Convert VisDrone2019 txt annotations to YOLO-format label files.
'''
import os
from pathlib import Path

from PIL import Image
from tqdm import tqdm


def visdrone2yolo(dir):
    def convert_box(size, box):
        # Convert a VisDrone box (left, top, width, height) to a normalized
        # YOLO box (center_x, center_y, width, height)
        dw = 1. / size[0]
        dh = 1. / size[1]
        return (box[0] + box[2] / 2) * dw, (box[1] + box[3] / 2) * dh, box[2] * dw, box[3] * dh

    (dir / 'Annotations_YOLO').mkdir(parents=True, exist_ok=True)  # output label directory (YOLOv5's own script uses 'labels' here)
    pbar = tqdm((dir / 'annotations').glob('*.txt'), desc=f'Converting {dir}')
    for f in pbar:
        img_size = Image.open((dir / 'images' / f.name).with_suffix('.jpg')).size
        lines = []
        with open(f, 'r') as file:  # read annotation.txt
            for row in [x.split(',') for x in file.read().strip().splitlines()]:
                if row[4] == '0':  # VisDrone 'ignored regions' class 0
                    continue
                cls = int(row[5]) - 1  # shift class ids down by 1, so 'pedestrian' becomes 0
                box = convert_box(img_size, tuple(map(int, row[:4])))
                lines.append(f"{cls} {' '.join(f'{x:.6f}' for x in box)}\n")
        with open(str(f).replace(os.sep + 'annotations' + os.sep,
                                 os.sep + 'Annotations_YOLO' + os.sep), 'w') as fl:
            fl.writelines(lines)  # write label.txt


dir = Path(r'E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019')  # path to the VisDrone2019 folder under dataset

# Convert each split
for d in 'VisDrone2019-DET-train', 'VisDrone2019-DET-val', 'VisDrone2019-DET-test-dev':
    visdrone2yolo(dir / d)  # convert VisDrone annotations to YOLO labels
After the script runs correctly, a new Annotations_YOLO folder is created inside each of the VisDrone2019-DET-train, VisDrone2019-DET-val, and VisDrone2019-DET-test-dev folders, holding the VisDrone labels converted to YOLOv5 format. Since YOLOv5 locates labels by replacing images with labels in each image path (see above), rename or symlink Annotations_YOLO to labels before training.
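With the labels in place, YOLOv5 still needs a dataset yaml (the repo ships data/VisDrone.yaml). A minimal sketch along those lines, assuming the dataset root used in this post and the shifted class ids from the conversion above:

path: E:/DPL/DeepLearnData/目标检测/航空目标检测数据VisDrone/VisDrone2019  # dataset root
train: VisDrone2019-DET-train/images
val: VisDrone2019-DET-val/images
test: VisDrone2019-DET-test-dev/images
nc: 10  # 'others' (converted id 10) excluded; use 11 if you keep it
names: [pedestrian, people, bicycle, car, van, truck, tricycle, awning-tricycle, bus, motor]

The next script goes in the opposite direction: it converts YOLO-format txt labels into VOC-format xml files.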
from xml.dom.minidom import Document
import os
import cv2


def makexml(picPath, txtPath, xmlPath):
    """Convert YOLO-format txt label files into VOC-format xml label files.

    picPath: folder holding the images
    txtPath: folder holding the txt labels
    xmlPath: folder where the xml files are saved
    """
    dic = {'0': "hat",     # class-id -> class-name map used for the conversion;
           '1': "person",  # it must match your own classes.txt, in the same order
           }
    for name in os.listdir(txtPath):
        xmlBuilder = Document()
        annotation = xmlBuilder.createElement("annotation")  # <annotation> root
        xmlBuilder.appendChild(annotation)
        with open(txtPath + name) as txtFile:
            txtList = txtFile.readlines()
        img = cv2.imread(picPath + name[0:-4] + ".jpg")
        Pheight, Pwidth, Pdepth = img.shape

        folder = xmlBuilder.createElement("folder")  # <folder>
        folder.appendChild(xmlBuilder.createTextNode("driving_annotation_dataset"))
        annotation.appendChild(folder)

        filename = xmlBuilder.createElement("filename")  # <filename>
        filename.appendChild(xmlBuilder.createTextNode(name[0:-4] + ".jpg"))
        annotation.appendChild(filename)

        size = xmlBuilder.createElement("size")  # <size>: width / height / depth
        for tag, value in (("width", Pwidth), ("height", Pheight), ("depth", Pdepth)):
            node = xmlBuilder.createElement(tag)
            node.appendChild(xmlBuilder.createTextNode(str(value)))
            size.appendChild(node)
        annotation.appendChild(size)

        for j in txtList:
            oneline = j.strip().split(" ")
            obj = xmlBuilder.createElement("object")  # one <object> per label line
            for tag, value in (("name", dic[oneline[0]]), ("pose", "Unspecified"),
                               ("truncated", "0"), ("difficult", "0")):
                node = xmlBuilder.createElement(tag)
                node.appendChild(xmlBuilder.createTextNode(value))
                obj.appendChild(node)
            # Convert normalized center/size back to 1-based pixel corners
            xmin = int(float(oneline[1]) * Pwidth + 1 - float(oneline[3]) * 0.5 * Pwidth)
            ymin = int(float(oneline[2]) * Pheight + 1 - float(oneline[4]) * 0.5 * Pheight)
            xmax = int(float(oneline[1]) * Pwidth + 1 + float(oneline[3]) * 0.5 * Pwidth)
            ymax = int(float(oneline[2]) * Pheight + 1 + float(oneline[4]) * 0.5 * Pheight)
            bndbox = xmlBuilder.createElement("bndbox")  # <bndbox>
            for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
                node = xmlBuilder.createElement(tag)
                node.appendChild(xmlBuilder.createTextNode(str(value)))
                bndbox.appendChild(node)
            obj.appendChild(bndbox)
            annotation.appendChild(obj)

        with open(xmlPath + name[0:-4] + ".xml", 'w') as f:
            xmlBuilder.writexml(f, indent='\t', newl='\n', addindent='\t', encoding='utf-8')


if __name__ == "__main__":
    picPath = "VOCdevkit/VOC2007/JPEGImages/"   # image folder; the trailing / is required
    txtPath = "VOCdevkit/VOC2007/YOLO/"         # txt label folder; the trailing / is required
    xmlPath = "VOCdevkit/VOC2007/Annotations/"  # xml output folder; the trailing / is required
    makexml(picPath, txtPath, xmlPath)
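To reuse this converter on the YOLO labels produced by the VisDrone script above, dic would instead have to list the shifted VisDrone classes (a sketch; id 10 only occurs if you kept the 'others' class):

dic = {'0': "pedestrian", '1': "people", '2': "bicycle", '3': "car", '4': "van",
       '5': "truck", '6': "tricycle", '7': "awning-tricycle", '8': "bus",
       '9': "motor", '10': "others"}  # VisDrone ids shifted down by 1 ('ignored regions' dropped)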
VOCdevkit
–VOC2007
----Annotations
----ImageSets
------Main
----JPEGImages
The Annotations directory holds the .xml files and JPEGImages holds the training images; the dataset is split with the code given further below. In the VOC Annotations folder, each xml-format label file corresponds to one image in the JPEGImages folder. An xml file is parsed as follows:
<annotation>
    <folder>VOC2007</folder>
    <filename>2007_000392.jpg</filename>       // file name
    <source>                                   // image source (not important)
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
    </source>
    <size>                                     // image size (width, height, channels)
        <width>500</width>
        <height>332</height>
        <depth>3</depth>
    </size>
    <segmented>1</segmented>                   // used for segmentation or not (0/1 is irrelevant for detection)
    <object>                                   // a detected object
        <name>horse</name>                     // object class
        <pose>Right</pose>                     // shooting angle
        <truncated>0</truncated>               // truncated or not (0 means complete)
        <difficult>0</difficult>               // hard to recognize or not (0 means easy)
        <bndbox>                               // bounding box (top-left and bottom-right x/y coordinates)
            <xmin>100</xmin>
            <ymin>96</ymin>
            <xmax>355</xmax>
            <ymax>324</ymax>
        </bndbox>
    </object>
    <object>                                   // multiple objects can be detected
        <name>person</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>198</xmin>
            <ymin>58</ymin>
            <xmax>286</xmax>
            <ymax>197</ymax>
        </bndbox>
    </object>
</annotation>
Below is the code that converts visDrone2019 txt annotation files to VOC xml, visDrone2019_txt2xml_voc.py. The places that need changing are commented; only a few paths have to be adjusted.
'''
Author: 刘鸿燕 13752614153@163.com
Date: 2022-05-09 10:17:40
LastEditors: 刘鸿燕 13752614153@163.com
LastEditTime: 2022-05-09 11:17:20
FilePath: \VisDrone2019\data_process\visDrone2019_txt2xml.py
Description: Convert VisDrone2019 txt annotations to VOC-format xml files.
'''
import os
from PIL import Image
from pathlib import Path

FILE = Path(__file__).resolve()
ROOT = FILE.parent.parents[0]  # root directory
print("ROOT", ROOT)


def check_dir(path):
    if os.path.isdir(path):
        print("{} already exists!".format(path))
    else:
        os.makedirs(path)
        print("{} created successfully!".format(path))


# Change root_dir below to your own path
root_dir = ROOT / 'VisDrone2019-DET-train'
annotations_dir = root_dir / "annotations"
image_dir = root_dir / "images"
xml_dir = root_dir / "Annotations_XML"  # xml files are saved in Annotations_XML under the working directory
check_dir(xml_dir)
# Alternatively, hard-code an absolute path:
# root_dir = r"D:\object_detection_data\datacovert\VisDrone2019-DET-val/"

# Replace the class list below with your own classes; this also works for other datasets
class_name = ['ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van',
              'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others']

for filename in os.listdir(annotations_dir):
    fin = open(annotations_dir / filename, 'r')
    image_name = filename.split('.')[0]
    image_path = Path(image_dir).joinpath(image_name + ".jpg")  # change ".jpg" to ".png" if your images are png
    img = Image.open(image_path)
    xml_name = Path(xml_dir).joinpath(image_name + '.xml')
    with open(xml_name, 'w') as fout:
        # basic header information of the xml
        fout.write('<annotation>' + '\n')
        fout.write('\t' + '<folder>' + 'VOC2007' + '</folder>' + '\n')
        fout.write('\t' + '<filename>' + image_name + '.jpg' + '</filename>' + '\n')
        fout.write('\t' + '<source>' + '\n')
        fout.write('\t\t' + '<database>' + 'VisDrone2019-DET' + '</database>' + '\n')
        fout.write('\t\t' + '<annotation>' + 'VisDrone2019-DET' + '</annotation>' + '\n')
        fout.write('\t\t' + '<image>' + 'flickr' + '</image>' + '\n')
        fout.write('\t\t' + '<flickrid>' + 'Unspecified' + '</flickrid>' + '\n')
        fout.write('\t' + '</source>' + '\n')
        fout.write('\t' + '<owner>' + '\n')
        fout.write('\t\t' + '<flickrid>' + 'LJ' + '</flickrid>' + '\n')
        fout.write('\t\t' + '<name>' + 'LJ' + '</name>' + '\n')
        fout.write('\t' + '</owner>' + '\n')
        fout.write('\t' + '<size>' + '\n')
        fout.write('\t\t' + '<width>' + str(img.size[0]) + '</width>' + '\n')
        fout.write('\t\t' + '<height>' + str(img.size[1]) + '</height>' + '\n')
        fout.write('\t\t' + '<depth>' + '3' + '</depth>' + '\n')
        fout.write('\t' + '</size>' + '\n')
        fout.write('\t' + '<segmented>' + '0' + '</segmented>' + '\n')
        for line in fin.readlines():
            line = line.split(',')
            fout.write('\t' + '<object>' + '\n')
            fout.write('\t\t' + '<name>' + class_name[int(line[5])] + '</name>' + '\n')
            fout.write('\t\t' + '<pose>' + 'Unspecified' + '</pose>' + '\n')
            fout.write('\t\t' + '<truncated>' + line[6] + '</truncated>' + '\n')
            fout.write('\t\t' + '<difficult>' + str(int(line[7])) + '</difficult>' + '\n')
            fout.write('\t\t' + '<bndbox>' + '\n')
            fout.write('\t\t\t' + '<xmin>' + line[0] + '</xmin>' + '\n')
            fout.write('\t\t\t' + '<ymin>' + line[1] + '</ymin>' + '\n')
            # pay attention to this point! (0-based)
            fout.write('\t\t\t' + '<xmax>' + str(int(line[0]) + int(line[2]) - 1) + '</xmax>' + '\n')
            fout.write('\t\t\t' + '<ymax>' + str(int(line[1]) + int(line[3]) - 1) + '</ymax>' + '\n')
            fout.write('\t\t' + '</bndbox>' + '\n')
            fout.write('\t' + '</object>' + '\n')
        fin.close()
        fout.write('</annotation>')
'''
To make comparing experimental results easier, we first visualize the visdrone validation set.
The code visualizes from xml labels, so the visdrone txt labels need to be converted to xml first.
'''
import os
import numpy as np
import xml.etree.ElementTree as xmlET
from PIL import Image, ImageDraw

# Alternative merged class mapping (commented out):
# '1': 'people', '2': 'people', '3': 'bicycle', '4': 'car', '5': 'car',
# '6': 'others', '7': 'others', '8': 'others', '9': 'others', '10': 'motor', '11': 'others'
classes = ('__background__',  # always index 0
           'ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van', 'truck',
           'tricycle', 'awning-tricycle', 'bus', 'motor', 'others')

# Change the three paths below to your own
file_path_img = r'E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019\VisDrone2019-DET-val\images'
file_path_xml = r'E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019\VisDrone2019-DET-val\Annotations_XML'
save_file_path = r'E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019\VisDrone2019-DET-val\Annotations_XML_show'
os.makedirs(save_file_path, exist_ok=True)  # make sure the output folder exists

pathDir = os.listdir(file_path_xml)
for filename in pathDir:  # xml file name
    tree = xmlET.parse(os.path.join(file_path_xml, filename))  # parse the xml
    objs = tree.findall('object')
    num_objs = len(objs)
    boxes = np.zeros((num_objs, 5), dtype=np.uint16)
    for ix, obj in enumerate(objs):
        bbox = obj.find('bndbox')
        x1 = float(bbox.find('xmin').text)
        y1 = float(bbox.find('ymin').text)
        x2 = float(bbox.find('xmax').text)
        y2 = float(bbox.find('ymax').text)
        cla = obj.find('name').text
        label = classes.index(cla)
        boxes[ix, 0:4] = [x1, y1, x2, y2]
        boxes[ix, 4] = label

    image_name = os.path.splitext(filename)[0]
    img = Image.open(os.path.join(file_path_img, image_name + '.jpg'))
    draw = ImageDraw.Draw(img)
    for ix in range(len(boxes)):
        xmin = int(boxes[ix, 0])
        ymin = int(boxes[ix, 1])
        xmax = int(boxes[ix, 2])
        ymax = int(boxes[ix, 3])
        draw.rectangle([xmin, ymin, xmax, ymax], outline=(255, 0, 0))
        draw.text([xmin, ymin], classes[boxes[ix, 4]], (255, 0, 0))
    img.save(os.path.join(save_file_path, image_name + '.png'))
import os
import random

trainval_percent = 0.8  # trainval vs test split
train_percent = 0.8     # train vs val split inside trainval
xmlfilepath = 'Annotations'
txtsavepath = 'ImageSets/Main'
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open('ImageSets/Main/trainval.txt', 'w')
ftest = open('ImageSets/Main/test.txt', 'w')
ftrain = open('ImageSets/Main/train.txt', 'w')
fval = open('ImageSets/Main/val.txt', 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
The code above splits the dataset: 80% goes to trainval (of which 80% is train and 20% val) and the remaining 20% to test. After running it you will find the .txt files (trainval.txt, train.txt, val.txt, test.txt) under /VOCdevkit/VOC2007/ImageSets/Main, each listing the image-name indices of the corresponding split. Dataset preparation is now complete.
Reference blog:
Converting the visdrone dataset to coco format and training on mmdetection, with the converted json files attached.
For object detection, the json file format is essentially as follows. If you open such json files, although the content is long, it consists of these five top-level sections in order. Among them, info, licenses, and images have the same structure in every type of JSON file; their definitions are shared.
What is not shared are the annotation and category structures, which differ between the different types of JSON files.
Note that the images array has one element per image, while the annotations array has one element per annotated object instance, and the categories array one element per category.
Each object instance annotation contains a series of fields, including the object's category id and segmentation mask. The segmentation format depends on whether the instance represents a single object (iscrowd=0, in which case polygons are used) or a collection of objects (iscrowd=1, in which case RLE is used). Note that a single object (iscrowd=0) may require multiple polygons, for example when it is occluded. Crowd annotations (iscrowd=1) are used to label large groups of objects (e.g., a crowd of people). In addition, an enclosing bounding box is provided for each object (the box coordinates are measured from the top-left corner of the image and are 0-indexed).
Finally, the categories field of the annotation structure stores the mapping from category ids to category and supercategory names.
The three types share the basic types listed below: info, image, and license.
{
    "info" : info,
    "images" : [image],
    "annotations" : [annotation],
    "licenses" : [license],
}

info{
    "year" : int,
    "version" : str,
    "description" : str,
    "contributor" : str,
    "url" : str,
    "date_created" : datetime,
}

image{
    "id" : int,
    "width" : int,
    "height" : int,
    "file_name" : str,
    "license" : int,
    "flickr_url" : str,
    "coco_url" : str,
    "date_captured" : datetime,
}

license{
    "id" : int,
    "name" : str,
    "url" : str,
}
info: {
    "year": int,          # year
    "version": str,       # version
    "description": str,   # dataset description
    "contributor": str,   # contributor
    "url": str,           # download URL
    "date_created": datetime
}
For the info type, an example instance:
"info":{
    "description":"This is stable 1.0 version of the 2014 MS COCO dataset.",
    "url":"http:\/\/mscoco.org",
    "version":"1.0",
    "year":2014,
    "contributor":"Microsoft COCO group",
    "date_created":"2015-01-27 09:11:52.357475"
}
license{
    "id": int,
    "name": str,
    "url": str,
}
An example license instance:
{
    "url":"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/2.0\/",
    "id":1,
    "name":"Attribution-NonCommercial-ShareAlike License"
}
image{
    "id": int,                 # image id (unique for every image)
    "width": int,              # width
    "height": int,             # height
    "file_name": str,          # file name
    "license": int,
    "flickr_url": str,         # flickr URL
    "coco_url": str,           # coco URL
    "date_captured": datetime  # capture date
}
images is an array containing multiple image instances. An example image instance:
{
    "license":3,
    "file_name":"COCO_val2014_000000391895.jpg",
    "coco_url":"http:\/\/mscoco.org\/images\/391895",
    "height":360,
    "width":640,
    "date_captured":"2013-11-14 11:18:45",
    "flickr_url":"http:\/\/farm9.staticflickr.com\/8186\/8119368305_4e622c8349_z.jpg",
    "id":391895
}
Each image instance is a dict. It has an id field holding the image's id, unique for every image.
The annotations field is a list containing multiple annotation instances. The annotation type itself contains a series of fields, such as the target's category id and segmentation mask. The segmentation format depends on whether the instance is a single object (iscrowd=0, polygons format) or a group of objects (iscrowd=1, RLE format), as shown below:
annotation{
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}
annotation{
    "id": int,                        # annotation id: each image can contain several objects, so every object gets a unique id
    "image_id": int,                  # id of the image this annotation belongs to (matches an id in images)
    "category_id": int,               # category id (matches an id in categories)
    "segmentation": RLE or [polygon], # object boundary: polygons when iscrowd=0, RLE when iscrowd=1
    "area": float,                    # region area
    "bbox": [x,y,width,height],       # bounding box [x,y,w,h]
    "iscrowd": 0 or 1                 # see below
}
Note that a single object (iscrowd=0) may need several polygons to represent it, for instance when the object is partially occluded in the image. When iscrowd=1 (annotating a group of objects, e.g., a crowd of people), the segmentation uses the RLE format.
In addition, every object (whether iscrowd=0 or iscrowd=1) has a rectangular bbox: the coordinates of the top-left corner and the width and height of the box are given as an array, whose first element is the top-left x coordinate.
The area field is the area of the region (area of encoded masks).
An example annotation whose segmentation is in polygon format:
{
    "segmentation": [[510.66,423.01,511.72,420.03,510.45......]],
    "area": 702.1057499999998,
    "iscrowd": 0,
    "image_id": 289343,
    "bbox": [473.07,395.93,38.65,28.67],
    "category_id": 18,
    "id": 1768
}
categories is a list containing multiple category instances; a single category struct is described as follows:
{
    "supercategory": str,  # super category
    "id": int,             # category id (0 defaults to background)
    "name": str            # sub category
}
Example category instances:
{
    "supercategory": "person",
    "id": 1,
    "name": "person"
},
{
    "supercategory": "vehicle",
    "id": 2,
    "name": "bicycle"
}
# Basic COCO annotation format
'''
{
    "info" : info,
    "images" : [image],
    "annotations" : [annotation],
    "licenses" : [license],
}
info {
    "year" : int,
    "version" : str,
    "description" : str,
    "contributor" : str,
    "url" : str,
    "date_created" : datetime,
}
license{
    "id" : int,
    "name" : str,
    "url" : str,
}
image{
    "id" : int,                 # image id
    "width" : int,              # image width
    "height" : int,             # image height
    "file_name" : str,          # file name
    "license" : int,
    "flickr_url" : str,
    "coco_url" : str,           # image URL
    "date_captured" : datetime, # annotation date
}
annotation{
    "id" : int,
    "image_id" : int,
    "category_id" : int,
    "segmentation" : RLE or [polygon],
    "area" : float,
    "bbox" : [x,y,width,height],
    "iscrowd" : 0 or 1,
}
categories[{
    "id" : int,
    "name" : str,
    "supercategory" : str,
}]'''
import sys, os, json, glob
import xml.etree.ElementTree as ET
from pathlib import Path

# Starting value for the bounding-box ids
INITIAL_BBOXIds = 0
# The class list does not strictly have to be predefined; the script updates it
# with any new class names found in the xml files. Here it is predefined anyway:
PREDEF_CLASSE = {'ignored regions': 0, 'pedestrian': 1, 'people': 2, 'bicycle': 3,
                 'car': 4, 'van': 5, 'truck': 6, 'tricycle': 7, 'awning-tricycle': 8,
                 'bus': 9, 'motor': 10, 'others': 11}
'''
# To detect only the ten classes of interest, leave 0 and 11 out of the conversion:
PREDEF_CLASSE = {'pedestrian': 1, 'people': 2, 'bicycle': 3, 'car': 4, 'van': 5,
                 'truck': 6, 'tricycle': 7, 'awning-tricycle': 8, 'bus': 9, 'motor': 10}
# class_name = ['ignored regions','pedestrian','people','bicycle','car','van','truck','tricycle','awning-tricycle','bus','motor','others']
'''


def check_dir(path):
    if os.path.isdir(path):
        print("{} already exists!".format(path))
    else:
        os.makedirs(path)
        print("{} created successfully!".format(path))


def get(root, name):
    return root.findall(name)


def get_and_check(root, name, length):
    vars = root.findall(name)
    if len(vars) == 0:
        raise NotImplementedError('Can not find %s in %s.' % (name, root.tag))
    if length > 0 and len(vars) != length:
        raise NotImplementedError('The size of %s is supposed to be %d, but is %d.' % (name, length, len(vars)))
    if length == 1:
        vars = vars[0]
    return vars


def convert(xml_paths, out_json):
    # basic json annotation structure
    json_dict = {'images': [], 'type': "instances", 'annotations': [], 'categories': []}
    categories = PREDEF_CLASSE
    bbox_id = INITIAL_BBOXIds
    for image_id, xml_f in enumerate(xml_paths):
        # progress output
        sys.stdout.write('\r>> Converting image %d/%d' % (image_id + 1, len(xml_paths)))
        sys.stdout.flush()
        tree = ET.parse(xml_f)
        root = tree.getroot()
        path = get(root, 'path')
        if len(path) == 1:
            filename = os.path.basename(path[0].text)
        elif len(path) == 0:
            filename = get_and_check(root, 'filename', 1).text
        else:
            raise NotImplementedError('%d paths found in %s' % (len(path), xml_f))
        # image entry with the basic image information
        size = get_and_check(root, 'size', 1)
        width = int(get_and_check(size, 'width', 1).text)
        height = int(get_and_check(size, 'height', 1).text)
        image = {
            'id': image_id + 1,
            'height': height,
            'width': width,
            'file_name': filename
        }
        json_dict['images'].append(image)
        for obj in get(root, 'object'):
            category = get_and_check(obj, 'name', 1).text
            # update the category-id dict if a new class appears
            if category not in categories:
                new_id = len(categories)
                categories[category] = new_id
            category_id = categories[category]
            bbox = get_and_check(obj, 'bndbox', 1)
            xmin = int(get_and_check(bbox, 'xmin', 1).text) - 1
            ymin = int(get_and_check(bbox, 'ymin', 1).text) - 1
            xmax = int(get_and_check(bbox, 'xmax', 1).text)
            ymax = int(get_and_check(bbox, 'ymax', 1).text)
            if xmax <= xmin or ymax <= ymin:
                continue
            o_width = abs(xmax - xmin)
            o_height = abs(ymax - ymin)
            ann = {
                'id': bbox_id,
                'image_id': image_id + 1,  # must match the 'id' of the image entry above
                'category_id': category_id,
                # 'segmentation': [xmin, ymin, xmin, ymax, xmax, ymax, xmax, ymin],
                'area': o_width * o_height,
                'bbox': [xmin, ymin, o_width, o_height],
                'iscrowd': 0
            }
            json_dict['annotations'].append(ann)
            bbox_id = bbox_id + 1
    # write out the category-id dict
    for cate, cid in categories.items():
        cat = {'supercategory': 'none', 'id': cid, 'name': cate}
        json_dict['categories'].append(cat)
    with open(out_json, 'w') as json_file:
        json.dump(json_dict, json_file, indent=4)  # indent=4 is prettier to read, but slower
    print("json file write done...")


if __name__ == '__main__':
    dir = Path(r'E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019')  # path to the VisDrone2019 folder under dataset
    # for d in 'VisDrone2019-DET-train', 'VisDrone2019-DET-val', 'VisDrone2019-DET-test-dev':
    for d in 'VisDrone2019-DET-train', 'VisDrone2019-DET-val':
        xml_dir = dir / d / 'Annotations_XML'
        coco_dir = dir / d / 'Annotations_COCO'
        check_dir(coco_dir)
        xml_file = glob.glob(os.path.join(xml_dir, '*.xml'))
        json_file = coco_dir / 'instances_{}2017.json'.format(d.split('-')[2])
        # convert to Annotations_COCO; json_file is where the generated json is saved
        convert(xml_file, json_file)
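As an optional sanity check, the generated file can be loaded with pycocotools to print a few counts (the path below assumes the train split produced by the script above):

from pycocotools.coco import COCO

coco = COCO(r'E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019\VisDrone2019-DET-train\Annotations_COCO\instances_train2017.json')
print(len(coco.getImgIds()), 'images,', len(coco.getAnnIds()), 'annotations,', len(coco.getCatIds()), 'categories')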
import os
import json

import cv2  # only needed if you use the cv2.imread alternative below
from PIL import Image
from tqdm import tqdm


def convert_to_cocodetection(dir, output_dir):
    # data directories
    train_dir = os.path.join(dir, "VisDrone2019-DET-train")
    val_dir = os.path.join(dir, "VisDrone2019-DET-val")
    # annotation directories
    train_annotations = os.path.join(train_dir, "annotations")
    val_annotations = os.path.join(val_dir, "annotations")
    # image directories
    train_images = os.path.join(train_dir, "images")
    val_images = os.path.join(val_dir, "images")
    id_num = 0
    categories = [
        {"id": 0, "name": "ignored regions"},
        {"id": 1, "name": "pedestrian"},
        {"id": 2, "name": "people"},
        {"id": 3, "name": "bicycle"},
        {"id": 4, "name": "car"},
        {"id": 5, "name": "van"},
        {"id": 6, "name": "truck"},
        {"id": 7, "name": "tricycle"},
        {"id": 8, "name": "awning-tricycle"},
        {"id": 9, "name": "bus"},
        {"id": 10, "name": "motor"},
        {"id": 11, "name": "others"}
    ]
    for mode in ["train", "val"]:
        images = []
        annotations = []
        print(f"start loading {mode} data...")
        if mode == "train":
            file_list = os.listdir(train_annotations)
            annotations_path = train_annotations
            images_path = train_images
        else:
            file_list = os.listdir(val_annotations)
            annotations_path = val_annotations
            images_path = val_images
        for i in tqdm(file_list):
            f = open(annotations_path + "/" + i, "r")
            name = i.replace(".txt", "")
            # image entry
            image = {}
            img_size = Image.open(images_path + os.sep + name + ".jpg").size
            width, height = img_size
            # height, width = cv2.imread(images_path + os.sep + name + ".jpg").shape[:2]
            image["id"] = name
            image["height"] = height
            image["width"] = width
            image["file_name"] = name + ".jpg"
            images.append(image)
            for line in f.readlines():
                # annotation entry
                annotation = {}
                line = line.replace("\n", "")
                if line.endswith(","):  # filter trailing commas
                    line = line.rstrip(",")
                line_list = [int(i) for i in line.split(",")]
                bbox_xywh = [line_list[0], line_list[1], line_list[2], line_list[3]]
                annotation["id"] = id_num
                annotation["image_id"] = name
                annotation["category_id"] = int(line_list[5])
                # annotation["segmentation"] = []
                annotation["area"] = bbox_xywh[2] * bbox_xywh[3]
                # annotation["score"] = line_list[4]
                annotation["bbox"] = bbox_xywh
                annotation["iscrowd"] = 0
                id_num += 1
                annotations.append(annotation)
            f.close()
        dataset_dict = {}
        dataset_dict["images"] = images
        dataset_dict["annotations"] = annotations
        dataset_dict["categories"] = categories
        json_str = json.dumps(dataset_dict)
        with open(f'{output_dir}/VisDrone2019-DET_{mode}_coco.json', 'w') as json_file:
            json_file.write(json_str)
        print("json file write done...")


def get_test_namelist(dir, out_dir):
    full_path = out_dir + "/" + "test.txt"
    file = open(full_path, 'w')
    for name in tqdm(os.listdir(dir)):
        name = name.replace(".txt", "")
        file.write(name + "\n")
    file.close()
    return None


def centerxywh_to_xyxy(boxes):
    """
    args:
        boxes: list of center_x, center_y, width, height
    return:
        boxes: list of x, y, x, y corresponding to the top-left and bottom-right corners
    """
    x_top_left = boxes[0] - boxes[2] / 2
    y_top_left = boxes[1] - boxes[3] / 2
    x_bottom_right = boxes[0] + boxes[2] / 2
    y_bottom_right = boxes[1] + boxes[3] / 2
    return [x_top_left, y_top_left, x_bottom_right, y_bottom_right]


def centerxywh_to_topleftxywh(boxes):
    """
    args:
        boxes: list of center_x, center_y, width, height
    return:
        boxes: list of x, y, width, height with (x, y) the top-left corner
    """
    x_top_left = boxes[0] - boxes[2] / 2
    y_top_left = boxes[1] - boxes[3] / 2
    width = boxes[2]
    height = boxes[3]
    return [x_top_left, y_top_left, width, height]


def clamp(coord, width, height):
    # Clip an [x1, y1, x2, y2] box to the image boundaries
    if coord[0] < 0:
        coord[0] = 0
    if coord[1] < 0:
        coord[1] = 0
    if coord[2] > width:
        coord[2] = width
    if coord[3] > height:
        coord[3] = height
    return coord


if __name__ == '__main__':
    # First argument: the dataset root directory; second argument: the output directory.
    # Only the data needed for detection training is filled in; the extra COCO fields are left empty.
    convert_to_cocodetection(r"E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019",
                             r"E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019\VisDrone2019-DET-COCO\annotations")
The COCO-style directory layout of the VisDrone2019 detection dataset, including the contents of its annotations directory, is as follows.
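The json names below are exactly those written by convert_to_cocodetection, and the surrounding folder is the output_dir passed in __main__; the image folders themselves stay in the original VisDrone2019-DET-train/val directories:

VisDrone2019-DET-COCO
–annotations
----VisDrone2019-DET_train_coco.json
----VisDrone2019-DET_val_coco.json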
Given a new object detection project, after converting it to COCO format and setting up the cfg, don't you need to check that the labels and bboxes are correct? And whether the data augmentation strategy is appropriate? I would say no experienced engineer skips this step.
Hence browse_dataset, which visualizes the data emitted by the datasets so you can check it for errors. I copied this tool directly from mmdetection and fixed a bug in its handling of VOC-style data configs.
Usage is very simple; just pass in the cfg file. Taking COCO data as an example:
tools/misc/browse_dataset.py helps users browse a detection dataset visually (images together with their bounding-box annotations), or save the images to a specified directory:
python tools/misc/browse_dataset.py ${CONFIG} [-h] [--skip-type ${SKIP_TYPE[SKIP_TYPE...]}] [--output-dir ${OUTPUT_DIR}] [--not-show] [--show-interval ${SHOW_INTERVAL}]
Visualizing dataset labels – browse_dataset.py
In general, after laying out the dataset and writing the corresponding config file, you should check that the labels line up with the images before training. Run the following command, taking faster_rcnn as an example.
Example:
python tools/misc/browse_dataset.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
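To save the drawn images instead of displaying them interactively, the flags from the usage line above apply, for example:

python tools/misc/browse_dataset.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py --output-dir work_dirs/browse_out --not-show

Here work_dirs/browse_out is an arbitrary output directory of your choosing.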