羊掌门

MASK RCNN学习笔记-训练自己的数据

MASK RCNN学习笔记-训练自己的数据-如何重复训练以前的数据

1、开发环境
2、安装mask-rcnn
3、预训练模型下载
4、标记软件
5、预训练数据集合
6、利用labelme标记图像
7、利用labelme生成图像数据文件
8、训练数据

1、开发环境

mask-rcnn 在windows10 和 linux环境下均可以运行，笔者配置了两台机器，如下配置：
windows10：
显卡：GTX1070 TI 配置的cuda 10.0 和 cudnn8.0，tensorfolw-gpu 版本13.0rc1
ubuntu 18.04：
显卡：GTX1060配置的cuda 9.0 和 cudnn7.0，tensorflow-gpu 版本12.0
实际测试结果：ubuntu 训练速度比 windows10 的速度快很多，大概每运行一步，ubuntu 的时间在0.9s，windows 10 在2s
如果没有显卡，这个项目训练速度及其慢，训练使用内存大约16GB上下

2、安装mask-rcnn

从githhub上面clone项目到本地，仓库地址如下
https://github.com/matterport/Mask_RCNN.git
下载完成之后，安装requirements.txt 里面的依赖项目，如果本机已经安装，则跳过这些项目
建议手动一项一项安装，而不是
		pip3 install requirements.txt
前置条件安装好之后，安装mask-rcnn，执行
		python setup.py install
如果遇到错误，则编辑这个脚本，跳过install_reqs这个检查项目

3、预训练模型下载

预训练模型下载，从github上面下载coco.h5这预先训练好的模型，预先训练好的模型下载页面地址如下：
https://github.com/matterport/Mask_RCNN/releases
预训练的权重的下载地址如下：
https://github.com/matterport/Mask_RCNN/releases/download/v2.1/mask_rcnn_balloon.5
这个权重是气球项目训练的一个结果集合

4、标记软件

标记软件，我使用的是labelme，使用如下命令安装，如果是linux系统，那么使用pip3
pip install labelme
在控制台输入labelme即可打开这个软件，windows版本打开之后如下图1所示

5、预训练数据集合

本文采用生成随机的验证码进行数据测试，生成验证的代码如下pre_data.py，实际上图像在进行训练时在图像预处理的代码可能更复杂一些，这个代码生成100张图片验证码。

from PIL import Image, ImageDraw, ImageFont, ImageFilter
import random
import time

def rnd_char():
    '''
    随机一个字母或者数字
    :return:
    '''
    # 随机一个字母或者数字
    i = random.randint(1,3)
    if i == 1:
        # 随机个数字的十进制ASCII码
        an = random.randint(97, 122)
    elif i == 2:
        # 随机个小写字母的十进制ASCII码
        an = random.randint(65, 90)
    else:
        # 随机个大写字母的十进制ASCII码
        an = random.randint(48, 57)
    # 根据Ascii码转成字符，return回去
    return chr(an)

def rnd_color2():
    '''
      随机颜色，规定一定范围
      :return:
      '''
    return (random.randint(32, 127), random.randint(32, 127), random.randint(32, 127))

def rnd_color():
    '''
    随机颜色，规定一定范围
    :return: 
    '''
    return (random.randint(64, 255), random.randint(64, 255), random.randint(64, 255))

def create_code():
    # 240 x 60:
    width = 60 * 4
    height = 60
    image = Image.new('RGB', (width, height), (192, 192, 192))
    # 创建Font对象:
    font = ImageFont.truetype(r'E:\pycharm\code_of_auth\font\simfang.ttf',36)
    # 创建Draw对象:
    draw = ImageDraw.Draw(image)

    # 填充每个像素:
    for x in range(0, width, 20):
        for y in range(0, height, 10):
            draw.point((x, y), fill=rnd_color())

    # 填充字符
    _str = ""
    # 填入4个随机的数字或字母作为验证码
    for t in range(4):
        c = rnd_char()
        _str = "{}{}".format(_str, c)
        # 随机距离图片上边高度，但至少距离30像素
        h = random.randint(1, height - 30)
        # 宽度的化，每个字符占图片宽度1／4,在加上10个像素空隙
        w = width / 4 * t + 10
        draw.text((w, h), c, font=font, fill=rnd_color2())
        # 实际项目中，会将验证码 保存在数据库，并加上时间字段
        print("验证码生成完毕{}".format(_str))
    t = time.time()
    current_time = int(round(t * 1000))
    save_dir = 'train_data/auth_code_{}.jpg'.format(current_time)
    image.save(save_dir, 'jpeg')

for i in range(100):
    create_code()

生成的图片如图2所示，下图

图2 生成的预训练数据

6、利用labelme标记图像

在终端输入labelme，再打开文件夹，即可定位到标记文件夹，如下图所示，图3

图3 标记图像注意事项

1、在图像进入这一步之前，最好预先处理一次图像
2、图像标记成矩形是在图像中的物体比较容易区分的情况下，最好的情况是能用多边形去标记物体

7、利用labelme生成图像数据文件

标记完成图像之后，将所有json文件拷贝至一个文件夹中，我本地命名为:json_file，批量生成数据文件的代码如下ImagePath.py 和labelme_gen.py 。

ImagePath.py：

import glob
import os

#读取所有图片的路径
def read_image(paths, type):
    imgs = []
    for im in glob.glob(paths+'/*.' + type):
        imgs.append(im)
    return imgs

labelme_gen.py：

import ImagePath as ImagePath
import os

data_path = 'json_file'

flle_list = ImagePath.read_image(data_path, 'json')

for file in flle_list:
    cmd = 'labelme_json_to_dataset ' + file
    os.system(cmd)

8、训练数据

labelme生成的数据文件夹拷贝出来，我命名为labelme_json，准备一个pic文件夹，里面放标记的图片，图像格式png和jpg都可以，还有之前下载的气球模型预训练权重，就可以开始训练了，训练代码文件，train_model.py

import os
import sys
import random
import math
import re
import time
import numpy as np
import cv2
import matplotlib
import matplotlib.pyplot as plt
import tensorflow as tf
from mrcnn.config import Config
#import utils
from mrcnn import model as modellib,utils
from mrcnn import visualize
import yaml
from mrcnn.model import log
from PIL import Image

ROOT_DIR = os.getcwd()
#模型保存目录
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
iter_num=0

COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# 基础设置
dataset_root_path = "E:\\pycharm\\code_of_auth\\"
labelme_json_path = dataset_root_path + 'labelme_json'
img_floder = dataset_root_path + "pic"
# yaml_floder = dataset_root_path
imglist = os.listdir(img_floder)
count = len(imglist)


class ShapesConfig(Config):
    NAME = "shapes"

    GPU_COUNT = 1

    IMAGES_PER_GPU = 1

    NUM_CLASSES = 1 + 36  # background + 10 numbers + 26 char

    # Use smaller anchors because our image and objects are small , 目标的尺寸,自己去测量几个放到这里
    RPN_ANCHOR_SCALES = (40, 50, 60) # anchor side in pixels

    TRAIN_ROIS_PER_IMAGE = 100

    STEPS_PER_EPOCH = 100

    VALIDATION_STEPS = 50

config = ShapesConfig()
config.display()


class TrainDataset(utils.Dataset):

    # 得到该图中有多少个实例（物体）
    def get_obj_index(self, image):
        n = np.max(image)
        return n

    # 解析labelme中得到的yaml文件，从而得到mask每一层对应的实例标签
    def from_yaml_get_class(self, image_id):
        info = self.image_info[image_id]
        with open(info['yaml_path']) as f:
            temp = yaml.load(f.read(),Loader=yaml.FullLoader)
            labels = temp['label_names']
            del labels[0]
            return labels

    # 重新写draw_mask
    def draw_mask(self, num_obj, mask, image, image_id):
        # print("draw_mask-->",image_id)
        # print("self.image_info",self.image_info)
        info = self.image_info[image_id]
        for index in range(num_obj):
            for i in range(info['width']):
                for j in range(info['height']):
                    at_pixel = image.getpixel((i, j))
                    if at_pixel == index + 1:
                        mask[j, i, index] = 1
        return mask

    # 重新写load_shapes，里面包含自己的类别,可以任意添加
    # 并在self.image_info信息中添加了path、mask_path 、yaml_path
    # yaml_pathdataset_root_path = "/tongue_dateset/"
    def load_shapes(self, count, img_floder, imglist, dataset_root_path):
        """Generate the requested number of synthetic images.
        count: number of images to generate.
        height, width: the size of the generated images.
        """
        # Add classes
        self.add_class("shapes", 1, "0")
        self.add_class("shapes", 2, "1")
        self.add_class("shapes", 3, "2")
        self.add_class("shapes", 4, "3")
        self.add_class("shapes", 5, "4")
        self.add_class("shapes", 6, "5")
        self.add_class("shapes", 7, "6")
        self.add_class("shapes", 8, "7")
        self.add_class("shapes", 9, "8")
        self.add_class("shapes", 10, "9")

        self.add_class("shapes", 11, "a")
        self.add_class("shapes", 12, "b")
        self.add_class("shapes", 13, "c")
        self.add_class("shapes", 14, "d")
        self.add_class("shapes", 15, "e")
        self.add_class("shapes", 16, "f")
        self.add_class("shapes", 17, "g")

        self.add_class("shapes", 18, "h")
        self.add_class("shapes", 19, "i")
        self.add_class("shapes", 20, "j")
        self.add_class("shapes", 21, "k")
        self.add_class("shapes", 22, "l")
        self.add_class("shapes", 23, "m")
        self.add_class("shapes", 24, "n")

        self.add_class("shapes", 25, "o")
        self.add_class("shapes", 26, "p")
        self.add_class("shapes", 27, "q")
        self.add_class("shapes", 28, "r")
        self.add_class("shapes", 29, "s")
        self.add_class("shapes", 30, "t")

        self.add_class("shapes", 31, "u")
        self.add_class("shapes", 32, "v")
        self.add_class("shapes", 33, "w")
        self.add_class("shapes", 34, "x")
        self.add_class("shapes", 35, "y")
        self.add_class("shapes", 36, "z")

        for i in range(count):
            # 获取图片宽和高
            print(i)
            filestr = imglist[i].replace(".jpg", "")

            # print(imglist[i],"-->",cv_img.shape[1],"--->",cv_img.shape[0])
            # print("id-->", i, " imglist[", i, "]-->", imglist[i],"filestr-->",filestr)
            # filestr = filestr.split("_")[1]
            mask_path = dataset_root_path + "\\" + filestr + "_json\\label.png"
            yaml_path = dataset_root_path + "\\" + filestr + "_json\\info.yaml"
            img_path = dataset_root_path + "\\" + filestr + "_json\\img.png"
            print(img_path)
            cv_img = cv2.imread(img_path)
            self.add_image("shapes", image_id=i, path=img_floder + "\\" + imglist[i],
                           width=cv_img.shape[1], height=cv_img.shape[0], mask_path=mask_path, yaml_path=yaml_path)

    def load_mask(self, image_id):
        """Generate instance masks for shapes of the given image ID.
        """
        global iter_num
        print("image_id", image_id)
        info = self.image_info[image_id]
        count = 1  # number of object
        img = Image.open(info['mask_path'])
        num_obj = self.get_obj_index(img)
        mask = np.zeros([info['height'], info['width'], num_obj], dtype=np.uint8)
        mask = self.draw_mask(num_obj, mask, img, image_id)
        occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
        for i in range(count - 2, -1, -1):
            mask[:, :, i] = mask[:, :, i] * occlusion
            occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
        labels = []
        labels = self.from_yaml_get_class(image_id)
        labels_form = []
        for i in range(len(labels)):
            if labels[i].find("0") != -1:
                labels_form.append("0")
            elif labels[i].find("1") != -1:
                labels_form.append("1")
            elif labels[i].find("2") != -1:
                labels_form.append("2")
            elif labels[i].find("3") != -1:
                labels_form.append("3")
            elif labels[i].find("4") != -1:
                labels_form.append("4")
            elif labels[i].find("5") != -1:
                labels_form.append("5")
            elif labels[i].find("6") != -1:
                labels_form.append("6")
            elif labels[i].find("7") != -1:
                labels_form.append("7")
            elif labels[i].find("8") != -1:
                labels_form.append("8")
            elif labels[i].find("9") != -1:
                labels_form.append("9")
            elif labels[i].find("a") != -1:
                labels_form.append("a")
            elif labels[i].find("b") != -1:
                labels_form.append("b")
            elif labels[i].find("c") != -1:
                labels_form.append("c")
            elif labels[i].find("d") != -1:
                labels_form.append("d")
            elif labels[i].find("e") != -1:
                labels_form.append("e")
            elif labels[i].find("f") != -1:
                labels_form.append("f")
            elif labels[i].find("g") != -1:
                labels_form.append("g")
            elif labels[i].find("h") != -1:
                labels_form.append("h")
            elif labels[i].find("i") != -1:
                labels_form.append("i")
            elif labels[i].find("j") != -1:
                labels_form.append("j")
            elif labels[i].find("k") != -1:
                labels_form.append("k")
            elif labels[i].find("l") != -1:
                labels_form.append("l")
            elif labels[i].find("m") != -1:
                labels_form.append("m")
            elif labels[i].find("n") != -1:
                labels_form.append("n")
            elif labels[i].find("o") != -1:
                labels_form.append("o")
            elif labels[i].find("p") != -1:
                labels_form.append("p")
            elif labels[i].find("q") != -1:
                labels_form.append("q")
            elif labels[i].find("r") != -1:
                labels_form.append("r")
            elif labels[i].find("s") != -1:
                labels_form.append("s")
            elif labels[i].find("t") != -1:
                labels_form.append("t")
            elif labels[i].find("u") != -1:
                labels_form.append("u")
            elif labels[i].find("v") != -1:
                labels_form.append("v")
            elif labels[i].find("w") != -1:
                labels_form.append("w")
            elif labels[i].find("x") != -1:
                labels_form.append("x")
            elif labels[i].find("y") != -1:
                labels_form.append("y")
            elif labels[i].find("z") != -1:
                labels_form.append("z")

        class_ids = np.array([self.class_names.index(s) for s in labels_form])
        return mask, class_ids.astype(np.int32)

def get_ax(rows=1, cols=1, size=8):
    """Return a Matplotlib Axes array to be used in
    all visualizations in the notebook. Provide a
    central point to control graph sizes.
    Change the default size attribute to control the size
    of rendered images
    """
    _, ax = plt.subplots(rows, cols, figsize=(size * cols, size * rows))
    return ax


# train与val数据集准备
dataset_train = TrainDataset()
dataset_train.load_shapes(count, img_floder, imglist, labelme_json_path)
dataset_train.prepare()
# print("dataset_train-->",dataset_train._image_ids)

dataset_val = TrainDataset()
dataset_val.load_shapes(count, img_floder, imglist, labelme_json_path)
dataset_val.prepare()
# print("dataset_val-->",dataset_val._image_ids)
# Load and display random samples
# image_ids = np.random.choice(dataset_train.image_ids, 4)
# for image_id in image_ids:
#    image = dataset_train.load_image(image_id)
#    mask, class_ids = dataset_train.load_mask(image_id)
#    visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)

# Create model in training mode
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=MODEL_DIR)

# Which weights to start with?
init_with = "coco"  # imagenet, coco, or last

if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights trained on MS COCO, but skip layers that
    # are different due to the different number of classes
    # See README for instructions to download the COCO weights
    # print(COCO_MODEL_PATH)
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
    # Load the last model you trained and continue training
    model.load_weights(model.find_last()[1], by_name=True)

# Train the head branches
# Passing layers="heads" freezes all layers except the head
# layers. You can also pass a regular expression to select
# which layers to train by name pattern.
model.train(dataset_train, dataset_val,learning_rate=config.LEARNING_RATE, epochs=100, layers='heads')
# Fine tune all layers
# Passing layers="all" trains all layers. You can also
# pass a regular expression to select which layers to
# train by name pattern.
model.train(dataset_train, dataset_val,learning_rate=config.LEARNING_RATE / 10, epochs=100, layers="all")

一般情况下，需要修改的是，种类，目标物体的尺寸，load_shapes和load_mask函数里面的标签名称，load_shapes里面有个图像格式的隐藏代码，如果图片格式是png则

filestr = imglist[i].replace(".png", "")

如果是打算重新训练数据，那么 model.load_weights 修改成自己的模型文件位置，一般是训练好之后保存的。
9、预测模型，预测模型使用pre_data.py 和 get_result.py 两个文件，get_result.py 是基于原来的代码复制出来的，为的就是改变里面文字的显示颜色。
get_result.py：

"""
Mask R-CNN
Display and Visualization Functions.

Copyright (c) 2017 Matterport, Inc.
Licensed under the MIT License (see LICENSE for details)
Written by Waleed Abdulla
"""

import os
import sys
import random
import itertools
import colorsys

import numpy as np
from skimage.measure import find_contours
import matplotlib.pyplot as plt
from matplotlib import patches,  lines
from matplotlib.patches import Polygon
import IPython.display

# Root directory of the project
ROOT_DIR = os.path.abspath("../")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils


############################################################
#  Visualization
############################################################

def display_images(images, titles=None, cols=4, cmap=None, norm=None,
                   interpolation=None):
    """Display the given set of images, optionally with titles.
    images: list or array of image tensors in HWC format.
    titles: optional. A list of titles to display with each image.
    cols: number of images per row
    cmap: Optional. Color map to use. For example, "Blues".
    norm: Optional. A Normalize instance to map values to colors.
    interpolation: Optional. Image interpolation to use for display.
    """
    titles = titles if titles is not None else [""] * len(images)
    rows = len(images) // cols + 1
    plt.figure(figsize=(14, 14 * rows // cols))
    i = 1
    for image, title in zip(images, titles):
        plt.subplot(rows, cols, i)
        plt.title(title, fontsize=9)
        plt.axis('off')
        plt.imshow(image.astype(np.uint8), cmap=cmap,
                   norm=norm, interpolation=interpolation)
        i += 1
    plt.show()


def random_colors(N, bright=True):
    """
    Generate random colors.
    To get visually distinct colors, generate them in HSV space then
    convert to RGB.
    """
    brightness = 1.0 if bright else 0.7
    hsv = [(i / N, 1, brightness) for i in range(N)]
    colors = list(map(lambda c: colorsys.hsv_to_rgb(*c), hsv))
    random.shuffle(colors)
    return colors


def apply_mask(image, mask, color, alpha=0.5):
    """Apply the given mask to the image.
    """
    for c in range(3):
        image[:, :, c] = np.where(mask == 1,
                                  image[:, :, c] *
                                  (1 - alpha) + alpha * color[c] * 255,
                                  image[:, :, c])
    return image


def display_instances(image, boxes, masks, class_ids, class_names,
                      scores=None, title="",
                      figsize=(16, 16), ax=None,
                      show_mask=True, show_bbox=True,
                      colors=None, captions=None):
    """
    boxes: [num_instance, (y1, x1, y2, x2, class_id)] in image coordinates.
    masks: [height, width, num_instances]
    class_ids: [num_instances]
    class_names: list of class names of the dataset
    scores: (optional) confidence scores for each box
    title: (optional) Figure title
    show_mask, show_bbox: To show masks and bounding boxes or not
    figsize: (optional) the size of the image
    colors: (optional) An array or colors to use with each object
    captions: (optional) A list of strings to use as captions for each object
    """
    # Number of instances
    N = boxes.shape[0]
    if not N:
        print("\n*** No instances to display *** \n")
    else:
        assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]

    # If no axis is passed, create one and automatically call show()
    auto_show = False
    if not ax:
        _, ax = plt.subplots(1, figsize=figsize)
        auto_show = True

    # Generate random colors
    colors = colors or random_colors(N)

    # Show area outside image boundaries.
    height, width = image.shape[:2]
    ax.set_ylim(height + 10, -10)
    ax.set_xlim(-10, width + 10)
    ax.axis('off')
    ax.set_title(title)

    masked_image = image.astype(np.uint32).copy()
    for i in range(N):
        color = colors[i]

        # Bounding box
        if not np.any(boxes[i]):
            # Skip this instance. Has no bbox. Likely lost in image cropping.
            continue
        y1, x1, y2, x2 = boxes[i]
        if show_bbox:
            p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2,
                                alpha=0.7, linestyle="dashed",
                                edgecolor=color, facecolor='none')
            ax.add_patch(p)

        # Label
        if not captions:
            class_id = class_ids[i]
            score = scores[i] if scores is not None else None
            label = class_names[class_id]
            caption = "{} {:.3f}".format(label, score) if score else label
        else:
            caption = captions[i]
        ax.text(x1, y1 + 8, caption,
                color='w', size=11, backgroundcolor="none")

        # Mask
        mask = masks[:, :, i]
        if show_mask:
            masked_image = apply_mask(masked_image, mask, color)

        # Mask Polygon
        # Pad to ensure proper polygons for masks that touch image edges.
        padded_mask = np.zeros(
            (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8)
        padded_mask[1:-1, 1:-1] = mask
        contours = find_contours(padded_mask, 0.5)
        for verts in contours:
            # Subtract the padding and flip (y, x) to (x, y)
            verts = np.fliplr(verts) - 1
            p = Polygon(verts, facecolor="none", edgecolor=color)
            ax.add_patch(p)
    #return  masked_image
    ax.imshow(masked_image.astype(np.uint8))
    return  plt
    # if auto_show:
    #     plt.show()


def display_differences(image,
                        gt_box, gt_class_id, gt_mask,
                        pred_box, pred_class_id, pred_score, pred_mask,
                        class_names, title="", ax=None,
                        show_mask=True, show_box=True,
                        iou_threshold=0.5, score_threshold=0.5):
    """Display ground truth and prediction instances on the same image."""
    # Match predictions to ground truth
    gt_match, pred_match, overlaps = utils.compute_matches(
        gt_box, gt_class_id, gt_mask,
        pred_box, pred_class_id, pred_score, pred_mask,
        iou_threshold=iou_threshold, score_threshold=score_threshold)
    # Ground truth = green. Predictions = red
    colors = [(0, 1, 0, .8)] * len(gt_match)\
           + [(1, 0, 0, 1)] * len(pred_match)
    # Concatenate GT and predictions
    class_ids = np.concatenate([gt_class_id, pred_class_id])
    scores = np.concatenate([np.zeros([len(gt_match)]), pred_score])
    boxes = np.concatenate([gt_box, pred_box])
    masks = np.concatenate([gt_mask, pred_mask], axis=-1)
    # Captions per instance show score/IoU
    captions = ["" for m in gt_match] + ["{:.2f} / {:.2f}".format(
        pred_score[i],
        (overlaps[i, int(pred_match[i])]
            if pred_match[i] > -1 else overlaps[i].max()))
            for i in range(len(pred_match))]
    # Set title if not provided
    title = title or "Ground Truth and Detections\n GT=green, pred=red, captions: score/IoU"
    # Display
    display_instances(
        image,
        boxes, masks, class_ids,
        class_names, scores, ax=ax,
        show_bbox=show_box, show_mask=show_mask,
        colors=colors, captions=captions,
        title=title)


def draw_rois(image, rois, refined_rois, mask, class_ids, class_names, limit=10):
    """
    anchors: [n, (y1, x1, y2, x2)] list of anchors in image coordinates.
    proposals: [n, 4] the same anchors but refined to fit objects better.
    """
    masked_image = image.copy()

    # Pick random anchors in case there are too many.
    ids = np.arange(rois.shape[0], dtype=np.int32)
    ids = np.random.choice(
        ids, limit, replace=False) if ids.shape[0] > limit else ids

    fig, ax = plt.subplots(1, figsize=(12, 12))
    if rois.shape[0] > limit:
        plt.title("Showing {} random ROIs out of {}".format(
            len(ids), rois.shape[0]))
    else:
        plt.title("{} ROIs".format(len(ids)))

    # Show area outside image boundaries.
    ax.set_ylim(image.shape[0] + 20, -20)
    ax.set_xlim(-50, image.shape[1] + 20)
    ax.axis('off')

    for i, id in enumerate(ids):
        color = np.random.rand(3)
        class_id = class_ids[id]
        # ROI
        y1, x1, y2, x2 = rois[id]
        p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2,
                              edgecolor=color if class_id else "gray",
                              facecolor='none', linestyle="dashed")
        ax.add_patch(p)
        # Refined ROI
        if class_id:
            ry1, rx1, ry2, rx2 = refined_rois[id]
            p = patches.Rectangle((rx1, ry1), rx2 - rx1, ry2 - ry1, linewidth=2,
                                  edgecolor=color, facecolor='none')
            ax.add_patch(p)
            # Connect the top-left corners of the anchor and proposal for easy visualization
            ax.add_line(lines.Line2D([x1, rx1], [y1, ry1], color=color))

            # Label
            label = class_names[class_id]
            ax.text(rx1, ry1 + 8, "{}".format(label),
                    color='w', size=11, backgroundcolor="none")

            # Mask
            m = utils.unmold_mask(mask[id], rois[id]
                                  [:4].astype(np.int32), image.shape)
            masked_image = apply_mask(masked_image, m, color)

    ax.imshow(masked_image)

    # Print stats
    print("Positive ROIs: ", class_ids[class_ids > 0].shape[0])
    print("Negative ROIs: ", class_ids[class_ids == 0].shape[0])
    print("Positive Ratio: {:.2f}".format(
        class_ids[class_ids > 0].shape[0] / class_ids.shape[0]))


# TODO: Replace with matplotlib equivalent?
def draw_box(image, box, color):
    """Draw 3-pixel width bounding boxes on the given image array.
    color: list of 3 int values for RGB.
    """
    y1, x1, y2, x2 = box
    image[y1:y1 + 2, x1:x2] = color
    image[y2:y2 + 2, x1:x2] = color
    image[y1:y2, x1:x1 + 2] = color
    image[y1:y2, x2:x2 + 2] = color
    return image


def display_top_masks(image, mask, class_ids, class_names, limit=4):
    """Display the given image and the top few class masks."""
    to_display = []
    titles = []
    to_display.append(image)
    titles.append("H x W={}x{}".format(image.shape[0], image.shape[1]))
    # Pick top prominent classes in this image
    unique_class_ids = np.unique(class_ids)
    mask_area = [np.sum(mask[:, :, np.where(class_ids == i)[0]])
                 for i in unique_class_ids]
    top_ids = [v[0] for v in sorted(zip(unique_class_ids, mask_area),
                                    key=lambda r: r[1], reverse=True) if v[1] > 0]
    # Generate images and titles
    for i in range(limit):
        class_id = top_ids[i] if i < len(top_ids) else -1
        # Pull masks of instances belonging to the same class.
        m = mask[:, :, np.where(class_ids == class_id)[0]]
        m = np.sum(m * np.arange(1, m.shape[-1] + 1), -1)
        to_display.append(m)
        titles.append(class_names[class_id] if class_id != -1 else "-")
    display_images(to_display, titles=titles, cols=limit + 1, cmap="Blues_r")


def plot_precision_recall(AP, precisions, recalls):
    """Draw the precision-recall curve.

    AP: Average precision at IoU >= 0.5
    precisions: list of precision values
    recalls: list of recall values
    """
    # Plot the Precision-Recall curve
    _, ax = plt.subplots(1)
    ax.set_title("Precision-Recall Curve. AP@50 = {:.3f}".format(AP))
    ax.set_ylim(0, 1.1)
    ax.set_xlim(0, 1.1)
    _ = ax.plot(recalls, precisions)


def plot_overlaps(gt_class_ids, pred_class_ids, pred_scores,
                  overlaps, class_names, threshold=0.5):
    """Draw a grid showing how ground truth objects are classified.
    gt_class_ids: [N] int. Ground truth class IDs
    pred_class_id: [N] int. Predicted class IDs
    pred_scores: [N] float. The probability scores of predicted classes
    overlaps: [pred_boxes, gt_boxes] IoU overlaps of predictions and GT boxes.
    class_names: list of all class names in the dataset
    threshold: Float. The prediction probability required to predict a class
    """
    gt_class_ids = gt_class_ids[gt_class_ids != 0]
    pred_class_ids = pred_class_ids[pred_class_ids != 0]

    plt.figure(figsize=(12, 10))
    plt.imshow(overlaps, interpolation='nearest', cmap=plt.cm.Blues)
    plt.yticks(np.arange(len(pred_class_ids)),
               ["{} ({:.2f})".format(class_names[int(id)], pred_scores[i])
                for i, id in enumerate(pred_class_ids)])
    plt.xticks(np.arange(len(gt_class_ids)),
               [class_names[int(id)] for id in gt_class_ids], rotation=90)

    thresh = overlaps.max() / 2.
    for i, j in itertools.product(range(overlaps.shape[0]),
                                  range(overlaps.shape[1])):
        text = ""
        if overlaps[i, j] > threshold:
            text = "match" if gt_class_ids[j] == pred_class_ids[i] else "wrong"
        color = ("white" if overlaps[i, j] > thresh
                 else "black" if overlaps[i, j] > 0
                 else "grey")
        plt.text(j, i, "{:.3f}\n{}".format(overlaps[i, j], text),
                 horizontalalignment="center", verticalalignment="center",
                 fontsize=9, color=color)

    plt.tight_layout()
    plt.xlabel("Ground Truth")
    plt.ylabel("Predictions")


def draw_boxes(image, boxes=None, refined_boxes=None,
               masks=None, captions=None, visibilities=None,
               title="", ax=None):
    """Draw bounding boxes and segmentation masks with different
    customizations.

    boxes: [N, (y1, x1, y2, x2, class_id)] in image coordinates.
    refined_boxes: Like boxes, but draw with solid lines to show
        that they're the result of refining 'boxes'.
    masks: [N, height, width]
    captions: List of N titles to display on each box
    visibilities: (optional) List of values of 0, 1, or 2. Determine how
        prominent each bounding box should be.
    title: An optional title to show over the image
    ax: (optional) Matplotlib axis to draw on.
    """
    # Number of boxes
    assert boxes is not None or refined_boxes is not None
    N = boxes.shape[0] if boxes is not None else refined_boxes.shape[0]

    # Matplotlib Axis
    if not ax:
        _, ax = plt.subplots(1, figsize=(12, 12))

    # Generate random colors
    colors = random_colors(N)

    # Show area outside image boundaries.
    margin = image.shape[0] // 10
    ax.set_ylim(image.shape[0] + margin, -margin)
    ax.set_xlim(-margin, image.shape[1] + margin)
    ax.axis('off')

    ax.set_title(title)

    masked_image = image.astype(np.uint32).copy()
    for i in range(N):
        # Box visibility
        visibility = visibilities[i] if visibilities is not None else 1
        if visibility == 0:
            color = "gray"
            style = "dotted"
            alpha = 0.5
        elif visibility == 1:
            color = colors[i]
            style = "dotted"
            alpha = 1
        elif visibility == 2:
            color = colors[i]
            style = "solid"
            alpha = 1

        # Boxes
        if boxes is not None:
            if not np.any(boxes[i]):
                # Skip this instance. Has no bbox. Likely lost in cropping.
                continue
            y1, x1, y2, x2 = boxes[i]
            p = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2,
                                  alpha=alpha, linestyle=style,
                                  edgecolor=color, facecolor='none')
            ax.add_patch(p)

        # Refined boxes
        if refined_boxes is not None and visibility > 0:
            ry1, rx1, ry2, rx2 = refined_boxes[i].astype(np.int32)
            p = patches.Rectangle((rx1, ry1), rx2 - rx1, ry2 - ry1, linewidth=2,
                                  edgecolor=color, facecolor='none')
            ax.add_patch(p)
            # Connect the top-left corners of the anchor and proposal
            if boxes is not None:
                ax.add_line(lines.Line2D([x1, rx1], [y1, ry1], color=color))

        # Captions
        if captions is not None:
            caption = captions[i]
            # If there are refined boxes, display captions on them
            if refined_boxes is not None:
                y1, x1, y2, x2 = ry1, rx1, ry2, rx2
            ax.text(x1, y1, caption, size=11, verticalalignment='top',
                    color='w', backgroundcolor="none",
                    bbox={'facecolor': color, 'alpha': 0.5,
                          'pad': 2, 'edgecolor': 'none'})

        # Masks
        if masks is not None:
            mask = masks[:, :, i]
            masked_image = apply_mask(masked_image, mask, color)
            # Mask Polygon
            # Pad to ensure proper polygons for masks that touch image edges.
            padded_mask = np.zeros(
                (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8)
            padded_mask[1:-1, 1:-1] = mask
            contours = find_contours(padded_mask, 0.5)
            for verts in contours:
                # Subtract the padding and flip (y, x) to (x, y)
                verts = np.fliplr(verts) - 1
                p = Polygon(verts, facecolor="none", edgecolor=color)
                ax.add_patch(p)
    ax.imshow(masked_image.astype(np.uint8))


def display_table(table):
    """Display values in a table format.
    table: an iterable of rows, and each row is an iterable of values.
    """
    html = ""
    for row in table:
        row_html = ""
        for col in row:
            row_html += "{:40}".format(str(col))
        html += "" + row_html + ""
    html = "" + html + ""
    IPython.display.display(IPython.display.HTML(html))


def display_weight_stats(model):
    """Scans all the weights in the model and returns a list of tuples
    that contain stats about each weight.
    """
    layers = model.get_trainable_layers()
    table = [["WEIGHT NAME", "SHAPE", "MIN", "MAX", "STD"]]
    for l in layers:
        weight_values = l.get_weights()  # list of Numpy arrays
        weight_tensors = l.weights  # list of TF tensors
        for i, w in enumerate(weight_values):
            weight_name = weight_tensors[i].name
            # Detect problematic layers. Exclude biases of conv layers.
            alert = ""
            if w.min() == w.max() and not (l.__class__.__name__ == "Conv2D" and i == 1):
                alert += "*** dead?"
            if np.abs(w.min()) > 1000 or np.abs(w.max()) > 1000:
                alert += "*** Overflow?"
            # Add row
            table.append([
                weight_name + alert,
                str(w.shape),
                "{:+9.4f}".format(w.min()),
                "{:+10.4f}".format(w.max()),
                "{:+9.4f}".format(w.std()),
            ])
    display_table(table)

forcast.py：

import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt
import cv2
import time
from mrcnn.config import Config
from datetime import datetime
import get_result as get_result
import ImagePath as ImagePath

ROOT_DIR = os.getcwd()
# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library

from mrcnn import utils

import mrcnn.model as modellib

MODEL_DIR = os.path.join(ROOT_DIR, "logs")


class ShapesConfig(Config):
    """Configuration for training on the toy shapes dataset.
    Derives from the base Config class and overrides values specific
    to the toy shapes dataset.
    """
    # Give the configuration a recognizable name
    NAME = "shapes"

    # Train on 1 GPU and 8 images per GPU. We can put multiple images on each
    # GPU because the images are small. Batch size is 8 (GPUs * images/GPU).
    GPU_COUNT = 1

    IMAGES_PER_GPU = 1

    NUM_CLASSES = 1 + 36  # background + 10 numbers

    # IMAGE_MIN_DIM = 160
    # IMAGE_MAX_DIM = 256

    # Use smaller anchors because our image and objects are small , 目标的尺寸,自己去测量几个放到这里
    RPN_ANCHOR_SCALES = (40, 50, 60)  # anchor side in pixels

    TRAIN_ROIS_PER_IMAGE = 100

    STEPS_PER_EPOCH = 100

    VALIDATION_STEPS = 50

# import train_tongue
# class InferenceConfig(coco.CocoConfig):
class InferenceConfig(ShapesConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()

config.display()

model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights('logs/shapes20190521T0957/mask_rcnn_shapes_0100.h5', by_name=True)

# COCO Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
               'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
# Load a random image from the images folder

data_path = 'test_jpg'

flle_list = ImagePath.read_image(data_path, 'jpg')

for file in flle_list:
    image = skimage.io.imread(file)
    a = datetime.now()
    # Run detection
    results = model.detect([image], verbose=1)
    b = datetime.now()
    # Visualize results
    print("time cost", (b - a).seconds)
    r = results[0]
    masked_image = get_result.display_instances(image, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'])
    file = file.replace("test_jpg","test_result")
    masked_image.savefig(file)

print('end predict!')

准备好训练好的模型文件、数据、保存结果的文件夹
（1）、logs/shapes20190521T0957/mask_rcnn_shapes_0100.h5
（2）、test_jpg
（3）、test_result
运行forcast.py即可结果预测，预测的结果如下图所示：

最后，我把代码上传至我的github上面，有兴趣的可以自己下载下来玩：
https://github.com/xiaoyangmoa/code_of_auth.git

你可能感兴趣的:(ai-图像识别)

AI-智能体修炼十万年的狗尾巴草人工智能大数据
什么是AI智能体？「AI智能体」这个术语并没有真正被定义，对智能体究竟是什么也存在很多的争议。AI智能体可以定义为「一个被赋予行动能力的LLM（通常在RAG环境中进行函数调用），以便在环境中对如何执行任务做出高层次的决策。」当前，构建AI智能体主要有以下两种架构方法：**单一智能体：**一个大型模型处理整个任务，并基于其全面的上下文理解做出所有决策和行动。这种方法利用了大型模型的涌现能力，避免了将
《今日AI-人工智能-编程日报》-源自2025年3月21日小亦编辑部人工智能
一、AI编程领域最新动态AI编程工具崛起，程序员职业面临挑战Anthropic首席执行官DarioAmodei预言，未来一年内，90%的代码将由AI生成，传统程序员的工作可能被大幅替代。最新发布的AI编程模型（如Claude3.7、Sonnet3.7）在初级开发评估中表现优异，得分率超过60%，部分模型甚至在全球程序员排名中位列前0.1%。字节跳动的Trae海外版接入Claude3.7和GPT-4
《今日AI-人工智能-编程日报》-源自2025年3月19日小亦编辑部每日AI-人工智能-编程日报人工智能
1.豆包AI编程功能迎来三项重磅升级豆包平台今日宣布其AI编程功能迎来三项重要升级，包括：HTML实时预览：支持用户在编写HTML代码时实时查看网页效果，显著提升前端开发效率，尤其适用于小游戏和网页制作。Python代码直接运行与一键修复：用户可直接运行Python代码，并在出错时一键修复，极大降低了编程门槛，提升了开发效率。生成完整项目：新增生成完整项目的功能，帮助用户快速创建应用程序，缩短开发
图像识别技术与应用课后总结（20）一元钱面包人工智能
图像分割概念图像分割是把图像中不同像素划分到不同类别，预测目标轮廓，属于细粒度分类。比如将图像里不同物体、背景等区分开来，就像把一幅画里的各个元素精准归类。应用场景人像抠图：能精准分离人物和背景，用于图片编辑、影视制作等，比如去除照片背景换背景。医学组织提取：在医学影像（如CT、MRI图像）中分离出不同组织，辅助疾病诊断、手术规划等。遥感图像分析：分析卫星或航空遥感图像时，区分土地、植被、建筑等不
【机器视觉】少量样本图片情况下的图片识别技术方案 yuanpan 机器学习人工智能计算机视觉
在只有少量图片样本的情况下，进行图像识别是一个具有挑战性的任务。以下是一些应对小样本问题的有效方案：1.数据增强（DataAugmentation）通过对现有样本进行各种变换来生成更多的训练数据，例如：几何变换：旋转、缩放、平移、翻转等。颜色变换：调整亮度、对比度、饱和度等。噪声添加：高斯噪声、椒盐噪声等。裁剪和填充：随机裁剪图像的一部分或填充边缘。工具：Keras：ImageDataGenera
近期计算机领域的热点技术 0dayNu1L 云计算量子计算人工智能
随着科技的飞速发展，计算机领域的新技术、新趋势层出不穷。本文将探讨近期计算机领域的几个热点技术趋势，并对它们进行简要的分析和展望。一、人工智能与机器学习人工智能（AI）和机器学习（ML）是近年来计算机领域最为热门的话题之一。AI和ML技术已经广泛应用于图像识别、自然语言处理、智能推荐等领域，并取得了显著的成果。随着技术的不断进步，AI和ML将更深入地渗透到各个行业，为人类社会带来更多便利和效益。在
数据增强：扩充数据集提升模型泛化能力 AI天才研究院计算 AI大模型企业级应用开发实战 ChatGPT 计算科学神经计算深度学习神经网络大数据人工智能大型语言模型 AI AGI LLM Java Python 架构设计 Agent RPA
1.背景介绍1.1.数据增强的重要性在机器学习领域，模型的泛化能力至关重要。一个泛化能力强的模型能够在未见数据上表现良好，而过拟合的模型则会在训练数据上表现出色，但在新数据上表现糟糕。数据增强是一种有效提升模型泛化能力的技术，它通过对现有数据进行各种变换，人为地扩充数据集，从而增加训练数据的数量和多样性。1.2.数据增强的应用场景数据增强广泛应用于各种机器学习任务中，包括：图像识别:对图像进行旋转
统计机器学习 (Statistical Machine Learning) 原理与代码实例讲解 AGI大模型与大数据研究院 DeepSeek R1 &大数据AI人工智能计算科学神经计算深度学习神经网络大数据人工智能大型语言模型 AI AGI LLM Java Python 架构设计 Agent RPA
统计机器学习(StatisticalMachineLearning)原理与代码实例讲解1.背景介绍统计机器学习是现代人工智能和数据科学的核心领域之一。它结合了统计学和计算机科学的理论与方法，通过数据驱动的方式来构建预测模型和决策系统。统计机器学习不仅在学术研究中占据重要地位，还在工业界有广泛应用，如推荐系统、图像识别、自然语言处理等。2.核心概念与联系2.1统计学与机器学习的关系统计学关注数据的收
AI人工智能深度学习算法：搭建可拓展的深度学习模型架构 AI大模型应用之禅 DeepSeek R1 &AI大模型与大数据 java python javascript kotlin golang 架构人工智能
深度学习、模型架构、可拓展性、神经网络、机器学习1.背景介绍深度学习作为人工智能领域最前沿的技术之一，在图像识别、自然语言处理、语音识别等领域取得了突破性的进展。深度学习模型的成功离不开其强大的学习能力和可拓展性。本文将深入探讨深度学习算法的原理、模型架构设计以及可拓展性的关键要素，并通过代码实例和实际应用场景，帮助读者理解如何搭建可拓展的深度学习模型架构。2.核心概念与联系深度学习的核心概念是人
李开复：AI 2.0 时代的价值 AI大模型应用之禅 DeepSeek R1 &AI大模型与大数据 java python javascript kotlin golang 架构人工智能
人工智能，AI2.0，价值创造，伦理挑战，未来趋势1.背景介绍人工智能（AI）技术近年来发展迅速，从语音识别、图像识别到自然语言处理，AI已经渗透到我们生活的方方面面。李开复，作为一位享誉全球的人工智能专家，在《AI2.0时代的价值》一文中，深刻地探讨了AI2.0时代带来的机遇与挑战，以及AI如何为人类创造价值。AI1.0时代主要集中在规则驱动的系统，例如围棋、象棋等游戏的AI。而AI2.0时代则
李开复：AI 2.0 时代的机遇 AGI大模型与大数据研究院 DeepSeek R1 &大数据AI人工智能 java python javascript kotlin golang 架构人工智能
人工智能，深度学习，Transformer，大模型，通用人工智能，AI2.0，应用场景，未来趋势1.背景介绍人工智能（AI）技术近年来发展迅速，从语音识别、图像识别到自然语言处理等领域取得了突破性进展。其中，深度学习作为人工智能的核心技术之一，推动了AI技术的飞速发展。然而，深度学习模型的训练成本高、数据依赖性强、可解释性差等问题仍然制约着AI技术的进一步发展。李开复先生在《AI2.0时代的机遇》
答题卡图像识别需求分析、市场分析和技术实现 weixin_34037977 人工智能开发工具 json
答题卡图像识别需求分析、市场分析和技术实现一、需求分析一、以接口的方式开发此需求：1：接收图片以上传的方式把图片发送到接口。2：识别图片接口接收到图片后，进行图像识别。3：返回数据返回识别后的JSON格式数据。二、答题卡图片识别的具体要求：图片是通过手机、相机、扫描仪等设备拍照而来，其中手机、相机拍出的照片会出现像素低、图像不正、聚焦不清楚等问题；1：图片只要是人眼能看清楚的即可完成识别；2：80
AI大模型从入门到精通，2025终极指南！好卷啊，又不能躺平，只能悄悄卷你们了！大模型教程人工智能大模型训练 LLM 知识库大模型大模型入门大模型学习
什么是AI大模型？AI大模型是指使用大规模数据和强大的计算能力训练出来的人工智能模型。这些模型通常具有高度的准确性和泛化能力，可以应用于各种领域，如自然语言处理、图像识别、语音识别等。为什么要学AI大模型？2024人工智能大模型的技术岗位与能力培养随着人工智能技术的迅速发展和应用，大模型作为其中的重要组成部分，正逐渐成为推动人工智能发展的重要引擎。大模型以其强大的数据处理和模式识别能力，广泛应用于
热门AI创作助手推荐【第一期】量子星澜文心一言 AI写作 chatgpt
星游AI创作助手人工智能在现代科技中的应用非常广泛，涵盖了诸多领域，包括但不限于以下几个方面：1.语音识别和自然语言处理：人工智能技术被广泛应用于语音识别和自然语言处理领域，例如智能助手、翻译系统、语音交互系统等。2.机器学习和数据分析：人工智能的机器学习算法被用于数据分析、预测建模、用户个性化推荐等领域，帮助企业做出更准确的商业决策。3.计算机视觉：人工智能在计算机视觉领域的应用包括图像识别、视
【解锁机器学习：探寻数学基石】游戏乐趣机器学习人工智能
机器学习中的数学基础探秘在当今数字化时代，机器学习无疑是最具影响力和发展潜力的技术领域之一。从图像识别到自然语言处理，从智能推荐系统到自动驾驶，机器学习的应用无处不在，深刻地改变着我们的生活和工作方式。然而，在这看似神奇的机器学习背后，数学作为其坚实的理论基础，起着不可或缺的关键作用。毫不夸张地说，数学是打开机器学习大门的钥匙，是理解和掌握机器学习算法与模型的核心所在。想象一下，机器学习就像是一座
DeepSeek：全栈开发者视角下的AI革命者大富大贵7 程序员知识储备1 程序员知识储备2 程序员知识储备3 人工智能
DeepSeek：全栈开发者视角下的AI革命者写在前面随着人工智能（AI）技术的不断进步，AI已经成为各行各业创新的核心动力。从自动驾驶到智能制造，再到自然语言处理和图像识别，AI正在逐渐渗透并改变着我们的生活和工作方式。DeepSeek，作为AI领域的新兴技术，凭借其独特的技术架构和颠覆性的创新理念，成为了全栈开发者关注的焦点。本文将从全栈开发者的角度出发，详细解析DeepSeek的诞生、技术架
NPU的应用场景：从云端到边缘绿算技术 NPU架构介绍缓存人工智能科技深度学习
NPU的应用场景非常广泛，主要包括以下几个方面：1.云计算与数据中心AI推理服务：在云端提供高效的AI推理服务，例如图像识别、语音识别。模型训练加速：在大规模训练任务中，NPU可以作为加速单元，提升训练效率。2.边缘计算智能摄像头：在安防监控中，NPU可以实时处理视频流，实现目标检测和跟踪。智能音箱：在语音助手中，NPU可以加速语音识别和自然语言处理任务。3.自动驾驶实时感知：NPU可以加速自动驾
《今日AI-人工智能-编程日报》-源自2025年2月28日小亦编辑部每日AI-人工智能-编程日报人工智能
技术突破OpenAI发布GPT-4.5：OpenAI正式推出新一代大模型GPT-4.5，语言理解、情商及逻辑推理能力显著增强，幻觉问题大幅减少，计算效率较GPT-4提升超10倍，已逐步向用户开放试用。国产模型Kimi发布k1.6版本：国产模型Kimi发布的k1.6版本，在编程任务评测中超越GPTo3-mini，首次登顶榜首。其“快思考”架构使响应速度提升至秒级，部署成本下降40%。企业动态贵州广电
《今日AI-人工智能-编程日报》-源于2025年3月11日小亦编辑部每日AI-人工智能-编程日报人工智能
1.AI行业动态1.1Manus通用智能体初成型，开启AIAgent新时代中泰证券发布研报称，首款通用型AI智能体Manus已问世，能够将复杂任务拆解为可执行的步骤链，并在虚拟环境中灵活调用工具，标志着AI从“Reasoner”走向“Agent”阶段。Manus的成功引发了开源复现潮，DeepSeek模型已被整合到OWL项目中，并在GAIA基准测试中表现接近Manus。1.2DeepSeek-R2
谷歌Gemini 3大模型发布，AI领域再掀波澜！广拓科技人工智能
在人工智能的浩瀚宇宙中，每一次重大突破都如同一颗璀璨的新星，照亮我们对未来的想象。而近期，谷歌发布的Gemini3大模型，无疑是其中最为耀眼的存在，它在AI领域激起的波澜，迅速蔓延至全球科技圈，引发了广泛关注与热烈讨论。随着AI技术的迅猛发展，我们已经见证了众多令人惊叹的创新成果。从智能语音助手到图像识别技术，从自动驾驶汽车到医疗诊断辅助系统，AI正以前所未有的速度改变着我们的生活和工作方式。在这
目标检测中衡量模型速度和精度的指标：FPS和mAP asdfg1258963 目标检测_ai 目标检测人工智能
“FPS”和“mAP”分别衡量了模型的速度和精度。FPS（FramesPerSecond）定义：FPS是“每秒传输帧数”的缩写，用于衡量计算机视觉系统（如目标检测、图像识别等）的实时性能。它表示系统每秒钟能够处理的图像或视频帧的数量。重要性：在实时应用中，如自动驾驶、视频监控等，FPS是一个关键指标。高FPS意味着系统能够快速处理输入的图像数据，实现实时响应。计算方式：FPS可以通过以下公式计算：
python实现KNN算法的手写数字识别：深入解析与完整项目流程快撑死的鱼 Python算法精解算法
随着人工智能和机器学习的快速发展，图像识别技术在多个领域得到广泛应用。而手写数字识别作为图像识别的典型场景之一，已经成为研究者和开发者学习、应用机器学习算法的经典项目。本文将深入解析如何使用Python编程语言，结合KNN（K-最近邻）算法实现手写数字识别系统。文章不仅介绍了算法的核心原理，还从用户交互、图像处理、数据预处理等多个角度对整个项目进行了全方位的讲解。读者通过本文，可以全面掌握手写数字
《今日AI-人工智能-编程日报》小亦工作室人工智能
1.AI行业动态1.1Manus通用智能体初成型，开启AIAgent新时代中泰证券发布研报称，首款通用型AI智能体Manus已问世，能够将复杂任务拆解为可执行的步骤链，并在虚拟环境中灵活调用工具，标志着AI从“Reasoner”走向“Agent”阶段。Manus的成功引发了开源复现潮，DeepSeek模型已被整合到OWL项目中，并在GAIA基准测试中表现接近Manus。1.2DeepSeek-R2
深度学习项目--基于DenseNet网络的“乳腺癌图像识别”，准确率90%+，pytorch复现羊小猪~~ 深度学习网络 pytorch 人工智能 python 机器学习分类
本文为365天深度学习训练营中的学习记录博客原作者：K同学啊前言如果说最经典的神经网络，ResNet肯定是一个，从ResNet发布后，很多人做了修改，denseNet网络无疑是最成功的一个，它采用密集型连接，将通道数连接在一起；本文是基于上一篇复现DenseNet121模型，做一个乳腺癌图像识别，效果还行，准确率0.9+;CNN经典网络之“DenseNet”简介，源码研究与复现(pytorch)：
人工智能概念 zhangpeng455547940 计算机人工智能
机器学习、深度学习、大模型机器学习提供框架，使得系统可以从数据中学习算法：线性回归、逻辑回归、支持向量机、决策树、随机森林、K近邻算法深度学习是实现这一目标的工具，模仿人脑，使用多层神经网络进行学习算法：多层感知器、卷积神经网络、循环神经网络、长短期记忆网络大模型指参数量巨大的深度学习模型人工智能应用：自然语言处理、图像识别与生成、语音识别、政务与企业服务...
DeepSeek在供热行业中的应用杨航 AI 人工智能深度学习 python 机器学习算法
目录引言1.1DeepSeek技术概述1.2供暖行业业务挑战1.3DeepSeek在供暖行业的应用前景DeepSeek技术基础2.1深度学习与机器学习2.2自然语言处理（NLP）2.3图像识别与处理2.4数据挖掘与分析供暖行业应用场景3.1设备监控与维护3.1.1设备状态监控3.1.2故障预测与诊断3.1.3维护计划优化3.2能源管理与优化3.2.1能耗数据分析3.2.2热负荷预测3.2.3节能优
图像识别技术与应用超帅的好吧笔记
第一节课这节课了解了这门专业的就业职位：工资是怎么样的岗位职责和任职要求看到了人类工业文明的演变了解了人工智能的研究、开发、模拟、延伸、理论、方法和技术看到了生活方式的转变比如智能语音闹钟控制系统、自动驾驶和人脸识别考勤智能购物、医疗日常生活的智能比如指纹、淘宝、抖音还能用软件看到天气的好坏了解了典型训练和机器学习中的关键组件机器学习中的关键组件包含：数据模型目标函数优化算法这节课学习了第一节剩下
图像识别技术与应用课后总结（18）一元钱面包人工智能
·YOLO-V3RetinaNet系列，YOLO-V3在不同变体（如YOLOV3-320、YOLOV3-416等）下，在推理时间和精度上有不同的表现，展示了其在速度和准确性上的平衡。YOLO-V3的改进点网络结构：相比之前版本，YOLO-V3的网络结构进行了优化，使其更适合小目标检测。特征处理：对特征的处理更加细致，通过融入多持续特征图信息来预测不同规格的物体。先验框：先验框更加丰富，有3种sca
Chebykan wx 文章阅读やっはろ深度学习
文献筛选[1]神经网络：全面基础[2]通过sigmoid函数的超层叠近似[3]多层前馈网络是通用近似器[5]注意力是你所需要的[6]深度残差学习用于图像识别[7]视觉化神经网络的损失景观[8]牙齿模具点云补全通过数据增强和混合RL-GAN[9]强化学习：一项调查[10]使用PySR和SymbolicRegression.jl的科学可解释机器学习[11]Z.Liu,Y.Wang,S.Vaidya,F
机器学习模型-从线性回归到神经网络 Earth explosion 机器学习线性回归神经网络
在当今的数据驱动世界中，机器学习模型是许多应用程序的核心。无论是推荐系统、图像识别，还是自动驾驶汽车，机器学习技术都在背后发挥着重要作用。在这篇文章中，我们将探索几种基础的机器学习模型，并了解它们的基本原理和应用场景。1.线性回归基本原理线性回归是最简单的机器学习模型之一。它旨在找到一个最佳拟合线来预测目标变量（通常是连续值）。线性回归假设输入变量和输出变量之间存在线性关系，其数学表达式为：[y=
log4j对象改变日志级别 3213213333332132 java log4j level log4j对象名称日志级别
log4j对象改变日志级别可批量的改变所有级别，或是根据条件改变日志级别。 log4j配置文件： log4j.rootLogger=ERROR,FILE,CONSOLE,EXECPTION #log4j.appender.FILE=org.apache.log4j.RollingFileAppender log4j.appender.FILE=org.apache.l
elk+redis 搭建nginx日志分析平台 ronin47 elasticsearch kibana logstash
elk+redis 搭建nginx日志分析平台 logstash,elasticsearch,kibana 怎么进行nginx的日志分析呢？首先，架构方面，nginx是有日志文件的，它的每个请求的状态等都有日志文件进行记录。其次，需要有个队列，redis的l
Yii2设置时区 dcj3sjt126com PHP timezone yii2
时区这东西，在开发的时候，你说重要吧，也还好，毕竟没它也能正常运行，你说不重要吧，那就纠结了。特别是linux系统，都TMD差上几小时，你能不痛苦吗？win还好一点。有一些常规方法，是大家目前都在采用的1、php.ini中的设置，这个就不谈了，2、程序中公用文件里设置，date_default_timezone_set一下时区3、或者。。。自己写时间处理函数，在遇到时间的时候，用这个函数处理（比较
js实现前台动态添加文本框，后台获取文本框内容 171815164 文本框
<%@ page language="java" import="java.util.*" pageEncoding="UTF-8"%> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://w
持续集成工具 g21121 持续集成
持续集成是什么？我们为什么需要持续集成？持续集成带来的好处是什么？什么样的项目需要持续集成？... 持续集成(Continuous integration ,简称CI)，所谓集成可以理解为将互相依赖的工程或模块合并成一个能单独运行
数据结构哈希表(hash)总结永夜-极光数据结构
1.什么是hash 来源于百度百科: Hash，一般翻译做“散列”，也有直接音译为“哈希”的，就是把任意长度的输入，通过散列算法，变换成固定长度的输出，该输出就是散列值。这种转换是一种压缩映射，也就是，散列值的空间通常远小于输入的空间，不同的输入可能会散列成相同的输出，所以不可能从散列值来唯一的确定输入值。简单的说就是一种将任意长度的消息压缩到某一固定长度的消息摘要的函数。
乱七八糟程序员是怎么炼成的
eclipse中的jvm字节码查看插件地址： http://andrei.gmxhome.de/eclipse/ 安装该地址的outline 插件后重启，打开window下的view下的bytecode视图 http://andrei.gmxhome.de/eclipse/ jvm博客： http://yunshen0909.iteye.com/blog/2
职场人伤害了“上司” 怎样弥补 aijuans 职场
由于工作中的失误，或者平时不注意自己的言行“伤害”、“得罪”了自己的上司，怎么办呢？　　在职业生涯中这种问题尽量不要发生。下面提供了一些解决问题的建议：　　一、利用一些轻松的场合表示对他的尊重　　即使是开明的上司也很注重自己的权威，都希望得到下属的尊重，所以当你与上司冲突后，最好让不愉快成为过去，你不妨在一些轻松的场合，比如会餐、联谊活动等，向上司问个好，敬下酒，表示你对对方的尊重，
深入浅出url编码 antonyup_2006 应用服务器浏览器 servlet weblogic IE
出处：http://blog.csdn.net/yzhz 杨争 http://blog.csdn.net/yzhz/archive/2007/07/03/1676796.aspx 一、问题：编码问题是JAVA初学者在web开发过程中经常会遇到问题，网上也有大量相关的
建表后创建表的约束关系和增加表的字段百合不是茶标的约束关系增加表的字段
下面所有的操作都是在表建立后操作的,主要目的就是熟悉sql的约束,约束语句的万能公式 1,增加字段(student表中增加姓名字段) alter table 增加字段的表名 add 增加的字段名增加字段的数据类型 alter table student add name varchar2(10); &nb
Uploadify 3.2 参数属性、事件、方法函数详解 bijian1013 JavaScript uploadify
一.属性属性名称默认值说明 auto true 设置为true当选择文件后就直接上传了，为false需要点击上传按钮才上传。 buttonClass ” 按钮样式 buttonCursor ‘hand’ 鼠标指针悬停在按钮上的样子 buttonImage null 浏览按钮的图片的路
精通Oracle10编程SQL(16)使用LOB对象 bijian1013 oracle 数据库 plsql
/* *使用LOB对象 */ --LOB(Large Object)是专门用于处理大对象的一种数据类型，其所存放的数据长度可以达到4G字节 --CLOB/NCLOB用于存储大批量字符数据，BLOB用于存储大批量二进制数据，而BFILE则存储着指向OS文件的指针 /* *综合实例 */ --建立表空间 --#指定区尺寸为128k,如不指定，区尺寸默认为64k CR
【Resin一】Resin服务器部署web应用 bit1129 resin
工作中，在Resin服务器上部署web应用，通常有如下三种方式：配置多个web-app 配置多个http id 为每个应用配置一个propeties、xml以及sh脚本文件配置多个web-app 在resin.xml中,可以为一个host配置多个web-app <cluster id="app&q
red5简介及基础知识白糖_ 基础
简介 Red5的主要功能和Macromedia公司的FMS类似，提供基于Flash的流媒体服务的一款基于Java的开源流媒体服务器。它由Java语言编写，使用RTMP作为流媒体传输协议，这与FMS完全兼容。它具有流化FLV、MP3文件，实时录制客户端流为FLV文件，共享对象，实时视频播放、Remoting等功能。用Red5替换FMS后,客户端不用更改可正
angular.fromJson boyitech AngularJS AngularJS 官方API AngularJS API
angular.fromJson 描述: 把Json字符串转为对象使用方法: angular.fromJson(json); 参数详解: Param Type Details json string JSON 字符串返回值: 对象, 数组, 字符串或者是一个数字示例: <!DOCTYPE HTML> <h
java-颠倒一个句子中的词的顺序。比如： I am a student颠倒后变成：student a am I bylijinnan java
public class ReverseWords { /** * 题目：颠倒一个句子中的词的顺序。比如： I am a student颠倒后变成：student a am I.词以空格分隔。 * 要求： * 1.实现速度最快,移动最少 * 2.不能使用String的方法如split,indexOf等等。 * 解答：两次翻转。 */ publ
web实时通讯 Chen.H Web 浏览器 socket 脚本
关于web实时通讯，做一些监控软件。由web服务器组件从消息服务器订阅实时数据，并建立消息服务器到所述web服务器之间的连接，web浏览器利用从所述web服务器下载到web页面的客户端代理与web服务器组件之间的socket连接，建立web浏览器与web服务器之间的持久连接；利用所述客户端代理与web浏览器页面之间的信息交互实现页面本地更新，建立一条从消息服务器到web浏览器页面之间的消息通路
[基因与生物]远古生物的基因可以嫁接到现代生物基因组中吗? comsci 生物
大家仅仅把我说的事情当作一个IT行业的笑话来听吧..没有其它更多的意思如果我们把大自然看成是一位伟大的程序员,专门为地球上的生态系统编制基因代码,并创造出各种不同的生物来,那么6500万年前的程序员开发的代码,是否兼容现代派的程序员的代码和架构呢?
oracle 外部表 daizj oracle 外部表 external tables
oracle外部表是只允许只读访问，不能进行DML操作，不能创建索引，可以对外部表进行的查询，连接，排序，创建视图和创建同义词操作。 you can select, join, or sort external table data. You can also create views and synonyms for external tables. Ho
aop相关的概念及配置 daysinsun AOP
切面(Aspect): 通常在目标方法执行前后需要执行的方法（如事务、日志、权限），这些方法我们封装到一个类里面，这个类就叫切面。连接点（joinpoint） spring里面的连接点指需要切入的方法，通常这个joinpoint可以作为一个参数传入到切面的方法里面（非常有用的一个东西）。通知（Advice）通知就是切面里面方法的具体实现，分为前置、后置、最终、异常环
初一上学期难记忆单词背诵第二课 dcj3sjt126com english word
middle 中间的，中级的 well 喔，那么；好吧 phone 电话，电话机 policeman 警察 ask 问 take 拿到；带到 address 地址 glad 高兴的，乐意的 why 为什么 China 中国 family 家庭 grandmother (外)祖母 grandfather (外)祖父 wife 妻子 husband 丈夫 da
Linux日志分析常用命令 dcj3sjt126com linux log
1.查看文件内容 cat -n 显示行号 2.分页显示 more Enter 显示下一行空格显示下一页 F 显示下一屏 B 显示上一屏 less /get 查询"get"字符串并高亮显示 3.显示文件尾 tail -f 不退出持续显示 -n 显示文件最后n行 4.显示头文件 head -n 显示文件开始n行 5.内容排序 sort -n 按照
JSONP 原理分析 fantasy2005 JavaScript jsonp jsonp 跨域
转自 http://www.nowamagic.net/librarys/veda/detail/224 JavaScript是一种在Web开发中经常使用的前端动态脚本技术。在JavaScript中，有一个很重要的安全性限制，被称为“Same-Origin Policy”（同源策略）。这一策略对于JavaScript代码能够访问的页面内容做了很重要的限制，即JavaScript只能访问与包含它的
使用connect by进行级联查询 234390216 oracle 查询父子 Connect by 级联
使用connect by进行级联查询 connect by可以用于级联查询，常用于对具有树状结构的记录查询某一节点的所有子孙节点或所有祖辈节点。来看一个示例，现假设我们拥有一个菜单表t_menu，其中只有三个字段：
一个不错的能将HTML表格导出为excel,pdf等的jquery插件 jackyrong jquery插件
发现一个老外写的不错的jquery插件，可以实现将HTML 表格导出为excel,pdf等格式，地址在： https://github.com/kayalshri/ 下面看个例子，实现导出表格到excel,pdf <html> <head> <title>Export html table to excel an
UI设计中我们为什么需要设计动效 lampcy UI UI设计
关于Unity3D中的Shader的知识首先先解释下Unity3D的Shader，Unity里面的Shaders是使用一种叫ShaderLab的语言编写的，它同微软的FX文件或者NVIDIA的CgFX有些类似。传统意义上的vertex shader和pixel shader还是使用标准的Cg/HLSL 编程语言编写的。因此Unity文档里面的Shader，都是指用ShaderLab编写的代码，
如何禁止页面缓存 nannan408 html jsp cache
禁止页面使用缓存~ ------------------------------------------------ jsp:页面no cache： response.setHeader("Pragma","No-cache"); response.setHeader("Cache-Control","no-cach
以代码的方式管理quartz定时任务的暂停、重启、删除、添加等 Everyday都不同定时任务管理 spring-quartz
【前言】在项目的管理功能中，对定时任务的管理有时会很常见。因为我们不能指望只在配置文件中配置好定时任务就行了，因为如果要控制定时任务的 “暂停” 呢？暂停之后又要在某个时间点 “重启” 该定时任务呢？或者说直接 “删除” 该定时任务呢？要改变某定时任务的触发时间呢？ “添加” 一个定时任务对于系统的使用者而言，是不太现实的，因为一个定时任务的处理逻辑他是不
EXT实例 tntxia ext
（1）增加一个按钮 JSP: <%@ page language="java" import="java.util.*" pageEncoding="UTF-8"%> <% String path = request.getContextPath(); Stri
数学学习在计算机研究领域的作用和重要性 xjnine Math
最近一直有师弟师妹和朋友问我数学和研究的关系，研一要去学什么数学课。毕竟在清华，衡量一个研究生最重要的指标之一就是paper,而没有数学，是肯定上不了世界顶级的期刊和会议的，这在计算机学界尤其重要！你会发现，不论哪个领域有价值的东西，都一定离不开数学！在这样一个信息时代，当google已经让世界没有秘密的时候，一种卓越的数学思维，绝对可以成为你的核心竞争力. 无奈本人实在见地

MASK RCNN学习笔记-训练自己的数据