Synioe

Using Mask R-CNN for Detection

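For the past few days I have been working through the principles of Mask R-CNN and its detection data, including building and annotating a dataset, and training a model on a single 1080 Ti. One problem is still unresolved. I searched the issues of the matterport/Mask_RCNN GitHub repository for a solution, but found no answer that fixes it.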

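My problem comes from modifying the balloon example so that it recognizes and instance-segments data labeled with multiple classes. The balloon example detects a single class. I also tried training on only one of my classes; that trains and produces detection results. Most real applications, however, need to detect and recognize several classes: pedestrians, cars, obstacles, and traffic lights are all detection targets for autonomous driving. If the vision solution we are experimenting with uses Mask R-CNN, it has to detect multiple classes, so multi-class instance segmentation has to be worked out. The relevant code is posted below for reference.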

Training code


#Mask R-CNN


import os
import sys
import json
import datetime
import numpy as np
import skimage.draw
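# Import the submodules used below explicitly; a wildcard import from
# skimage does not reliably make skimage.io or skimage.color available.
import skimage.color
import skimage.io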
# Root directory of the project
ROOT_DIR = os.path.abspath("../../")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn.config import Config
from mrcnn import model as modellib, utils

# Path to trained weights file
COCO_WEIGHTS_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")

# Directory to save logs and model checkpoints, if not provided
# through the command line argument --logs
DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs")

############################################################
#  Configurations
############################################################


class DefectConfig(Config):
    """Configuration for training on the defect dataset.
    Derives from the base Config class and overrides some values.
    """
    # Give the configuration a recognizable name
    NAME = "defect"

    # Train one image per GPU. Increase this if your GPU has enough
    # memory for a larger batch.
    IMAGES_PER_GPU = 1

    # Number of classes (including background)
    NUM_CLASSES = 1 + 6  # Background + 6 defect classes

    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100

    # Skip detections with < 90% confidence
    DETECTION_MIN_CONFIDENCE = 0.9


############################################################
#  Dataset
############################################################

class DefectDataset(utils.Dataset):

    def load_defect(self, dataset_dir, subset):
        """Load a subset of the defect dataset.
        dataset_dir: Root directory of the dataset.
        subset: Subset to load: train or val
        """
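        # Add classes. We have 6 classes to add.
        # The source name must match the one passed to add_image() below
        # ("defect"); otherwise the class map built by prepare() has no
        # entry for the images' source.
        self.add_class("defect", 1, "porosity")
        self.add_class("defect", 2, "crack")
        self.add_class("defect", 3, "porosity array")
        self.add_class("defect", 4, "lacking of sintering")
        self.add_class("defect", 5, "surface hollow")
        self.add_class("defect", 6, "surface scratch")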

        # Train or validation dataset
        assert subset in ["train", "val"]
        dataset_dir = os.path.join(dataset_dir, subset)

        # Load annotations
        # We mostly care about the x and y coordinates of each region
        # Note: In VIA 2.0, regions was changed from a dict to a list.
        annotations = json.load(open(os.path.join(dataset_dir, "via_region_data.json")))
        annotations = list(annotations.values())  # don't need the dict keys

        # The VIA tool saves images in the JSON even if they don't have any
        # annotations. Skip unannotated images.
        annotations = [a for a in annotations if a['regions']]
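        # print("ANNOTATIONS INFO:", annotations)
        # Add images
        for a in annotations:
            # Collect class IDs per image. A single list shared by all images
            # keeps growing, so it no longer lines up with each image's
            # polygons.
            idlist = []
            # Get the x, y coordinates of points of the polygons that make up
            # the outline of each object instance. These are stored in the
            # shape_attributes (see json format above)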
            # The if condition is needed to support VIA versions 1.x and 2.x.
            # print(type(a['regions'])) #list
            '''if type(a['regions']) is dict:
                polygons = [r['shape_attributes'] for r in a['regions'].values()]
            else:
                polygons = [r['shape_attributes'] for r in a['regions']]'''
            #print(a['regions'])
            polygons = [r['shape_attributes'] for r in a['regions']]
            #name = [r['region_attributes']['type'] for r in a['regions']]
            #print("[NAME INFO:]", name)
            #print(type(name)) #list
            '''
            [NAME INFO:] [{'porosity': True}, {'porosity': True}, {'surface hollow': True}, 
            {'porosity': True}, {'lacking of sintering': True}, {'porosity': True}, 
            {'porosity': True}, {'porosity': True}, {'surface hollow': True}, 
            {'surface hollow': True}, {'surface hollow': True}, {'surface hollow': True}, 
            {'surface hollow': True}, {'surface hollow': True}, {'surface hollow': True}, 
            {'surface hollow': True}, {'surface hollow': True}]
            
            '''
            '''name_dict = {"porosity": 1,
                     "crack": 2,
                         "porosity array": 3,
                         "lacking of sintering": 4,
                         "surface hollow": 5,
                         "surface scratch": 6}
            name_id = [name_dict[a] for a in name]'''
            class_names = [r['region_attributes'] for r in a['regions']]
            for i in range(len(class_names)):
                if class_names[i]["type"] == "porosity":
                    idlist.append(1)
                elif class_names[i]["type"] == "crack":
                    idlist.append(2)
                elif class_names[i]["type"] == "porosity array":
                    idlist.append(3)
                elif class_names[i]["type"] == "lacking of sintering":
                    idlist.append(4)
                elif class_names[i]["type"] == "surface hollow":
                    idlist.append(5)
                elif class_names[i]["type"] == "surface scratch":
                    idlist.append(6)


            #i = len(name)
            #for i in range(len(name)):
            #    name_id = [name_dict[a] for a in name[i].keys()]

            from skimage.io import imread
            # load_mask() needs the image size to convert polygons to masks.
            # Unfortunately, VIA doesn't include it in JSON, so we must read
            # the image. This is only manageable since the dataset is tiny.
            image_path = os.path.join(dataset_dir, a['filename'])
            image = skimage.io.imread(image_path)
            height, width = image.shape[:2]

            self.add_image(
                "defect",
                image_id=a['filename'],  # use file name as a unique image id
                path=image_path,
                class_id=np.array(idlist),
                width=width, height=height,
                polygons=polygons)

    def load_mask(self, image_id):
        """Generate instance masks for an image.
       Returns:
        masks: A bool array of shape [height, width, instance count] with
            one mask per instance.
        class_ids: a 1D array of class IDs of the instance masks.
        """
        # If not a defect dataset image, delegate to parent class.
        image_info = self.image_info[image_id]
        if image_info["source"] != "defect":
            return super(self.__class__, self).load_mask(image_id)

        name_id = image_info["class_id"]
        print("[name id]", name_id)

        # Convert polygons to a bitmap mask of shape
        # [height, width, instance_count]
        info = self.image_info[image_id]
        mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
                        dtype=np.uint8)
        class_ids = np.array(name_id, dtype=np.int32)

        for i, p in enumerate(info["polygons"]):
            # Get indexes of pixels inside the polygon and set them to 1
            rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
            mask[rr, cc, i] = 1

        # Return the mask and the array of class IDs, one ID per instance.
        # The builtin bool is used because the np.bool alias was removed
        # in newer NumPy releases.
        return (mask.astype(bool), class_ids)

    def image_reference(self, image_id):
        """Return the path of the image."""
        info = self.image_info[image_id]
        if info["source"] == "defect":
            return info["path"]
        else:
            return super(self.__class__, self).image_reference(image_id)


def train(model):
    """Train the model."""
    # Training dataset.
    dataset_train = DefectDataset()
    dataset_train.load_defect(args.dataset, "train")
    dataset_train.prepare()

    # Validation dataset
    dataset_val = DefectDataset()
    dataset_val.load_defect(args.dataset, "val")
    dataset_val.prepare()

    # *** This training schedule is an example. Update to your needs ***
    # Since we're using a very small dataset, and starting from
    # COCO trained weights, we don't need to train too long. Also,
    # no need to train all layers, just the heads should do it.
    print("Training network heads")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=30,
                layers='heads')


def color_splash(image, mask):
    """Apply color splash effect.
    image: RGB image [height, width, 3]
    mask: instance segmentation mask [height, width, instance count]

    Returns result image.
    """
    # Make a grayscale copy of the image. The grayscale copy still
    # has 3 RGB channels, though.
    gray = skimage.color.gray2rgb(skimage.color.rgb2gray(image)) * 255
    # Copy color pixels from the original color image where mask is set
    if mask.shape[-1] > 0:
        # We're treating all instances as one, so collapse the mask into one layer
        mask = (np.sum(mask, -1, keepdims=True) >= 1)
        splash = np.where(mask, image, gray).astype(np.uint8)
    else:
        splash = gray.astype(np.uint8)
    return splash


def detect_and_color_splash(model, image_path=None, video_path=None):
    assert image_path or video_path

    # Image or video?
    if image_path:
        # Run model detection and generate the color splash effect
        print("Running on {}".format(image_path))
        # Read image
        image = skimage.io.imread(image_path)
        # Detect objects
        r = model.detect([image], verbose=1)[0]
        # Color splash
        splash = color_splash(image, r['masks'])
        # Save output
        file_name = "splash_{:%Y%m%dT%H%M%S}.png".format(datetime.datetime.now())
        skimage.io.imsave(file_name, splash)
    elif video_path:
        import cv2
        # Video capture
        vcapture = cv2.VideoCapture(video_path)
        width = int(vcapture.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(vcapture.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = vcapture.get(cv2.CAP_PROP_FPS)

        # Define codec and create video writer
        file_name = "splash_{:%Y%m%dT%H%M%S}.avi".format(datetime.datetime.now())
        vwriter = cv2.VideoWriter(file_name,
                                  cv2.VideoWriter_fourcc(*'MJPG'),
                                  fps, (width, height))

        count = 0
        success = True
        while success:
            print("frame: ", count)
            # Read next image
            success, image = vcapture.read()
            if success:
                # OpenCV returns images as BGR, convert to RGB
                image = image[..., ::-1]
                # Detect objects
                r = model.detect([image], verbose=0)[0]
                # Color splash
                splash = color_splash(image, r['masks'])
                # RGB -> BGR to save image to video
                splash = splash[..., ::-1]
                # Add image to video writer
                vwriter.write(splash)
                count += 1
        vwriter.release()
    print("Saved to ", file_name)


############################################################
#  Training
############################################################

if __name__ == '__main__':
    import argparse

    # Parse command line arguments
    parser = argparse.ArgumentParser(
        description='Train Mask R-CNN to detect defects.')
    parser.add_argument("command",
                        metavar="<command>",
                        help="'train' or 'splash'")
    parser.add_argument('--dataset', required=False,
                        metavar="/path/to/defect/dataset/",
                        help='Directory of the defect dataset')
    parser.add_argument('--weights', required=True,
                        metavar="/path/to/weights.h5",
                        help="Path to weights .h5 file or 'coco'")
    parser.add_argument('--logs', required=False,
                        default=DEFAULT_LOGS_DIR,
                        metavar="/path/to/logs/",
                        help='Logs and checkpoints directory (default=logs/)')
    parser.add_argument('--image', required=False,
                        metavar="path or URL to image",
                        help='Image to apply the color splash effect on')
    parser.add_argument('--video', required=False,
                        metavar="path or URL to video",
                        help='Video to apply the color splash effect on')
    args = parser.parse_args()

    # Validate arguments
    if args.command == "train":
        assert args.dataset, "Argument --dataset is required for training"
    elif args.command == "splash":
        assert args.image or args.video,\
               "Provide --image or --video to apply color splash"

    print("Weights: ", args.weights)
    print("Dataset: ", args.dataset)
    print("Logs: ", args.logs)

    # Configurations
    if args.command == "train":
        config = DefectConfig()
    else:
        class InferenceConfig(DefectConfig):
            # Set batch size to 1 since we'll be running inference on
            # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
            GPU_COUNT = 1
            IMAGES_PER_GPU = 1
        config = InferenceConfig()
    config.display()

    # Create model
    if args.command == "train":
        model = modellib.MaskRCNN(mode="training", config=config,
                                  model_dir=args.logs)
    else:
        model = modellib.MaskRCNN(mode="inference", config=config,
                                  model_dir=args.logs)

    # Select weights file to load
    if args.weights.lower() == "coco":
        weights_path = COCO_WEIGHTS_PATH
        # Download weights file
        if not os.path.exists(weights_path):
            utils.download_trained_weights(weights_path)
    elif args.weights.lower() == "last":
        # Find last trained weights
        weights_path = model.find_last()
    elif args.weights.lower() == "imagenet":
        # Start from ImageNet trained weights
        weights_path = model.get_imagenet_weights()
    else:
        weights_path = args.weights

    # Load weights
    print("Loading weights ", weights_path)
    if args.weights.lower() == "coco":
        # Exclude the last layers because they require a matching
        # number of classes
        model.load_weights(weights_path, by_name=True, exclude=[
            "mrcnn_class_logits", "mrcnn_bbox_fc",
            "mrcnn_bbox", "mrcnn_mask"])
    else:
        model.load_weights(weights_path, by_name=True)

    # Train or evaluate
    if args.command == "train":
        train(model)
    elif args.command == "splash":
        detect_and_color_splash(model, image_path=args.image,
                                video_path=args.video)
    else:
        print("'{}' is not recognized. "
              "Use 'train' or 'splash'".format(args.command))

Reference solution 1

def load_student(self, dataset_dir, subset):
    self.add_class("student", 1, "student")
    self.add_class("student", 2, "bag")

    # Train or validation dataset?
    assert subset in ["train", "val"]
    dataset_dir = os.path.join(dataset_dir, subset)


    annotations = json.load(open(os.path.join(dataset_dir, "via_region_data.json")))
    annotations = list(annotations.values())  # don't need the dict keys


    annotations = [a for a in annotations if a['regions']]

    # Add images
    for a in annotations:
        polygons = [r['shape_attributes'] for r in a['regions'].values()]
        objects = [s['region_attributes'] for s in a['regions'].values()]
        
        print(objects)
        num_ids = []
        for n in objects:
            # print(n)
            # print(type(n))
            try:
                if n['object_name'] == 'student':
                    num_ids.append(1)
                elif n['object_name'] == 'bag':
                    num_ids.append(2)
            except KeyError:
                # Region has no 'object_name' attribute; skip it.
                pass
        #num_ids = [int(n['object_name']) for n in objects]
        image_path = os.path.join(dataset_dir, a['filename'])
        image = skimage.io.imread(image_path)
        height, width = image.shape[:2]

        self.add_image(
            "student",
            image_id=a['filename'],  # use file name as a unique image id
            path=image_path,
            width=width, height=height,
            polygons=polygons,
            num_ids=num_ids)

Reference solution 2

def load_multi_number(self, dataset_dir, subset):
    """Load a subset of the number dataset.
    dataset_dir: Root directory of the dataset.
    subset: Subset to load: train or val
    """
    # Add classes
    self.add_class("object", 1, "A")
    self.add_class("object", 2, "B")
    self.add_class("object", 3, "C")
    self.add_class("object", 4, "D")
    self.add_class("object", 5, "E")
    self.add_class("object", 6, "F")
    self.add_class("object", 7, "G")
    self.add_class("object", 8, "H")
    self.add_class("object", 9, "I")
    self.add_class("object", 10, "J")
    self.add_class("object", 11, "K")
    self.add_class("object", 12, "browl")

    # Train or validation dataset?
    assert subset in ["train", "val"]
    dataset_dir = os.path.join(dataset_dir, subset)

    annotations = json.load(open(os.path.join(dataset_dir, ".../train/via_region_data.json")))
    annotations = list(annotations.values())  # don't need the dict keys

    # The VIA tool saves images in the JSON even if they don't have any
    # annotations. Skip unannotated images.
    annotations = [a for a in annotations if a['regions']]

    # Add images
    for a in annotations:
        # Get the x, y coordinates of points of the polygons that make up
        # the outline of each object instance. These are stored in the
        # shape_attributes (see json format above)
        # for b in a['regions'].values():
        #    polygons = [{**b['shape_attributes'], **b['region_attributes']}]
        # print("string=", polygons)
        # for r in a['regions'].values():
        #    polygons = [r['shape_attributes']]
        #    # print("polygons=", polygons)
        #    multi_numbers = [r['region_attributes']]
            # print("multi_numbers=", multi_numbers)
        polygons = [r['shape_attributes'] for r in a['regions'].values()]
        objects = [s['region_attributes'] for s in a['regions'].values()]
        # print("multi_numbers=", multi_numbers)
        # num_ids = [n for n in multi_numbers['number'].values()]
        # for n in multi_numbers:
        num_ids = [int(n['object']) for n in objects]
        # print("num_ids=", num_ids)
        # print("num_ids_new=", num_ids_new)
        # categories = [s['region_attributes'] for s in a['regions'].values()]
        # load_mask() needs the image size to convert polygons to masks.
        # Unfortunately, VIA doesn't include it in JSON, so we must read
        # the image. This is only manageable since the dataset is tiny.
        image_path = os.path.join(dataset_dir, a['filename'])
        image = skimage.io.imread(image_path)
        height, width = image.shape[:2]

        self.add_image(
            "object",
            image_id=a['filename'],  # use file name as a unique image id
            path=image_path,
            width=width, height=height,
            polygons=polygons,
            num_ids=num_ids)


def load_mask(self, image_id):
    """Generate instance masks for an image.
   Returns:
    masks: A bool array of shape [height, width, instance count] with
        one mask per instance.
    class_ids: a 1D array of class IDs of the instance masks.
    """
    # If not a number dataset image, delegate to parent class.
    info = self.image_info[image_id]
    if info["source"] != "object":
        return super(self.__class__, self).load_mask(image_id)
    num_ids = info['num_ids']
    # Convert polygons to a bitmap mask of shape
    # [height, width, instance_count]
    mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
                    dtype=np.uint8)

    for i, p in enumerate(info["polygons"]):
        # Get indexes of pixels inside the polygon and set them to 1
        rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
        mask[rr, cc, i] = 1
    # print("info['num_ids']=", info['num_ids'])
    # Map class names to class IDs.
    num_ids = np.array(num_ids, dtype=np.int32)
    # Cast to bool so the result matches the docstring above.
    return mask.astype(bool), num_ids

def image_reference(self, image_id):
    """Return the path of the image."""
    info = self.image_info[image_id]
    if info["source"] == "object":
        return info["path"]
    else:
        return super(self.__class__, self).image_reference(image_id)
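If the number of class IDs stored for an image differs from its number of polygons, load_mask returns a mask array and an ID array that do not line up. The following is a small sanity check that can be run before training, assuming records shaped like the entries that add_image() stores (an "id", a "polygons" list, and the ID list under a key such as "num_ids"). check_image_info and the sample records are hypothetical and only illustrate the idea.

```python
def check_image_info(image_info, num_classes, ids_key="num_ids"):
    """Return a list of problems found in add_image()-style records.

    num_classes counts the background class, as NUM_CLASSES does, so valid
    foreground IDs run from 1 to num_classes - 1.
    """
    problems = []
    for info in image_info:
        polygons = info["polygons"]
        class_ids = list(info[ids_key])
        # One class ID is expected per polygon.
        if len(polygons) != len(class_ids):
            problems.append("{}: {} polygons but {} class IDs".format(
                info["id"], len(polygons), len(class_ids)))
        # Every ID must refer to a registered foreground class.
        bad = [c for c in class_ids if not 1 <= c < num_classes]
        if bad:
            problems.append("{}: class IDs out of range 1..{}: {}".format(
                info["id"], num_classes - 1, bad))
    return problems


# Made-up records: the second has too many IDs, the third an unknown ID.
records = [
    {"id": "a.png", "polygons": [{}, {}], "num_ids": [1, 5]},
    {"id": "b.png", "polygons": [{}], "num_ids": [1, 5, 2]},
    {"id": "c.png", "polygons": [{}], "num_ids": [9]},
]

for problem in check_image_info(records, num_classes=7):
    print(problem)
```

With a prepared dataset object, the same function could be called on its image_info list, passing ids_key="class_id" for the training script above.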
