My root directory $caffe_root is /home/gpu/ljy/caffe.
Part I: Running the SSD example code
1. Download the caffe-ssd code from https://github.com/weiliu89/caffe.git; it is a caffe folder.
2. Using the Makefile.config from an already-configured caffe directory as a reference, edit the Makefile.config under $caffe_root.
3. Open a terminal in $caffe_root and run:

make -j8
make py
make test -j8
make runtest -j8

4. Once the build finishes, download the pretrained model VGG_ILSVRC_16_layers_fc_reduced.caffemodel and put it under $caffe_root/models/VGGNet (create the VGGNet folder if it does not exist). Download the three dataset archives (VOCtest_06-Nov-2007.tar and the other two VOC archives) into $caffe_root/data and extract them.
5. Edit the paths inside ./data/VOC0712/create_list.sh to your own paths, and do the same for ./data/VOC0712/create_data.sh. (If the VOC0712 folder is missing, download one from the repository.)
6. cd to $caffe_root and run the two scripts above directly from the command line.
7. Train by entering:
python examples/ssd/ssd_pascal.py
or download an already-trained model instead.
8. Test:
A. python examples/ssd/score_ssd_pascal.py — first set the GPU count inside the script; the output is the score.
B. python examples/ssd/plot_detections.py — the output is annotated video frames...
C. ./.build_release/examples/ssd/ssd_detect.bin models/VGGNet/VOC0712/SSD_300x300/deploy.prototxt models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300_iter_120000.caffemodel /home/gpu/ljy/caffe/data/ljy_test/TestData/pictures.txt
This is single-image testing, C++ version. (Press Ctrl+H in the file manager to show hidden folders such as .build_release.) The final argument, pictures.txt, is a list of paths to the images to test, for example:
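For reference, a pictures.txt might look like the following (hypothetical paths; one absolute image path per line):

/home/gpu/ljy/caffe/data/ljy_test/TestData/001.jpg
/home/gpu/ljy/caffe/data/ljy_test/TestData/002.jpg
/home/gpu/ljy/caffe/data/ljy_test/TestData/003.jpg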
The test output looks like the following. (I am not yet sure what every field means; presumably the class, confidence, and location.)
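For orientation, ssd_detect.bin prints one line per detection; as far as I can tell the fields are image path, class label id, confidence score, and the box corners xmin ymin xmax ymax. A made-up sample line (values are illustrative only):

/home/gpu/ljy/caffe/data/ljy_test/TestData/001.jpg 7 0.92 85 143 260 318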
D. Single-image testing, Python version.
Open ssd_detect.ipynb, copy and save it as ssd_detect.py, edit the paths inside (including $caffe_root and the test image path), and append plt.show() at the end.
Then run that script from the command line.
Part II: Training and testing on your own data
1. Generate the training and test data
Our own data is usually JPEG or another image format, while Caffe generally takes LMDB input, so a conversion is needed. The procedure:
A. Annotate the images with a labeling tool (the tool is introduced later), producing txt label files.
B. Convert the txt files and images into VOC format (with scripts).
C. Convert the VOC format into LMDB, using the conversion scripts shipped with the SSD example code.
(1) Create an ljy_test directory under $caffe_root/data/VOCdevkit; it holds your converted VOC dataset.
(2) Create an ljy_test directory under $CAFFE_ROOT/examples.
(3) Create an ljy_test directory under $CAFFE_ROOT/data, copy create_list.sh, create_data.sh, and labelmap_voc.prototxt from data/VOC0712 into it, and rename them create_list_ljy_test.sh, create_data_ljy_test.sh, and labelmap_ljy_test.prototxt.
(4) Edit the two new create scripts, replacing the VOC0712-specific information with ljy_test. Then edit labelmap_ljy_test.prototxt so the classes match your own dataset; note that label 0, the background class, must be kept.
After these modifications you can build the LMDB data. From $CAFFE_ROOT run:
./data/ljy_test/create_list_ljy_test.sh
./data/ljy_test/create_data_ljy_test.sh
When the commands finish, the converted LMDB data can be inspected under $CAFFE_ROOT/examples/ljy_test.
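To sanity-check the conversion, a minimal sketch (assuming the python lmdb package is installed, and that the lmdb directory name below matches what create_data generated for your setup):

import lmdb

# open the generated training LMDB read-only and count stored entries
env = lmdb.open('examples/ljy_test/ljy_test_trainval_lmdb', readonly=True)
with env.begin() as txn:
    print(txn.stat()['entries'])  # should equal the number of training images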
2. Training
A. Put the pretrained model under $CAFFE_ROOT/models/VGGNet (we already did this in step 4 of Part I, so it can be skipped).
B. Copy $caffe_root/examples/ssd/ssd_pascal.py into your own folder $caffe_root/examples/ljy_test/ and rename it ssd_pascal_ljy.py.
C. Edit ssd_pascal_ljy.py to use your own paths; this includes creating an ljy_test folder under $caffe_root/models/VGGNet/. The changes are sketched below:
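A minimal sketch of the kind of edits involved (the ljy_test paths are examples for this setup, not definitive; the variable names come from ssd_pascal.py):

# in examples/ljy_test/ssd_pascal_ljy.py
train_data = "examples/ljy_test/ljy_test_trainval_lmdb"
test_data = "examples/ljy_test/ljy_test_test_lmdb"
name_size_file = "data/ljy_test/test_name_size.txt"
label_map_file = "data/ljy_test/labelmap_ljy_test.prototxt"
save_dir = "models/VGGNet/ljy_test/SSD_300x300"
snapshot_dir = "models/VGGNet/ljy_test/SSD_300x300"
job_dir = "jobs/VGGNet/ljy_test/SSD_300x300"
num_classes = 2        # your object classes + 1 for background (example value)
num_test_image = 800   # size of your test set (example value)
gpus = "0"             # GPU ids to use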
D. Run the training. Open a terminal in $caffe_root and enter:
python examples/ljy_test/ssd_pascal_ljy.py
Then just wait for training to finish...
You may run into loss = nan; that remains an open issue here. In a normal run the loss decreases steadily.
3. Testing
A. C++ version
Much the same as the SSD example test above; just change the paths.
B. Python version
Same as described at the top.
4. References: http://blog.csdn.net/u014696921/article/details/53353896
https://github.com/weiliu89/caffe.git
__________________________________
The SSD demo describes in detail how to train and validate SSD for object detection on the VOC dataset. This section describes how to train and validate SSD on your own dataset, covering:

1 Dataset annotation
2 Dataset conversion
3 How to train SSD
4 How to test SSD

1 Dataset annotation

Annotation is done with the BBox-Label-Tool, which is implemented in Python and simple to use. The modified version below supports labeling multiple classes. The label format the tool produces is:

object_number
className x1min y1min x1max y1max
className x2min y2min x2max y2max
...

1.1 Using the label tool

BBox-Label-Tool is a fairly simple implementation; the original git version has a few small problems, so it was lightly modified. The modified version:
#-------------------------------------------------------------------------------
# Name:     Object bounding box label tool
# Purpose:  Label object bboxes for ImageNet Detection data
# Author:   Qiushi
# Created:  06/06/2014
# (Python 2; the stripped Tk event strings such as <Button-1> are restored,
#  and the per-class select methods are folded into one parameterized handler.)
#-------------------------------------------------------------------------------
from __future__ import division
from Tkinter import *
import tkMessageBox
from PIL import Image, ImageTk
import os
import glob
import random

# colors for the bboxes
COLORS = ['red', 'blue', 'yellow', 'pink', 'cyan', 'green', 'black']
# image sizes for the examples
SIZE = 256, 256

classLabels = ['mat', 'door', 'sofa', 'chair', 'table', 'bed', 'ashcan', 'shoe']

class LabelTool():
    def __init__(self, master):
        # set up the main frame
        self.parent = master
        self.parent.title("LabelTool")
        self.frame = Frame(self.parent)
        self.frame.pack(fill=BOTH, expand=1)
        self.parent.resizable(width=False, height=False)

        # initialize global state
        self.imageDir = ''
        self.imageList = []
        self.egDir = ''
        self.egList = []
        self.outDir = ''
        self.cur = 0
        self.total = 0
        self.category = 0
        self.imagename = ''
        self.labelfilename = ''
        self.tkimg = None

        # initialize mouse state
        self.STATE = {}
        self.STATE['click'] = 0
        self.STATE['x'], self.STATE['y'] = 0, 0

        # reference to bbox
        self.bboxIdList = []
        self.bboxId = None
        self.bboxList = []
        self.hl = None
        self.vl = None
        self.currentClass = ''

        # ----------------- GUI stuff ---------------------
        # dir entry & load
        self.label = Label(self.frame, text="Image Dir:")
        self.label.grid(row=0, column=0, sticky=E)
        self.entry = Entry(self.frame)
        self.entry.grid(row=0, column=1, sticky=W+E)
        self.ldBtn = Button(self.frame, text="Load", command=self.loadDir)
        self.ldBtn.grid(row=0, column=2, sticky=W+E)

        # main panel for labeling
        self.mainPanel = Canvas(self.frame, cursor='tcross')
        self.mainPanel.bind("<Button-1>", self.mouseClick)
        self.mainPanel.bind("<Motion>", self.mouseMove)
        self.parent.bind("<Escape>", self.cancelBBox)  # press <Escape> to cancel current bbox
        self.parent.bind("s", self.cancelBBox)
        self.parent.bind("a", self.prevImage)  # press 'a' to go backward
        self.parent.bind("d", self.nextImage)  # press 'd' to go forward
        self.mainPanel.grid(row=1, column=1, rowspan=4, sticky=W+N)

        # showing bbox info & delete bbox
        self.lb1 = Label(self.frame, text='Bounding boxes:')
        self.lb1.grid(row=1, column=2, sticky=W+N)
        self.listbox = Listbox(self.frame, width=22, height=12)
        self.listbox.grid(row=2, column=2, sticky=N)
        self.btnDel = Button(self.frame, text='Delete', command=self.delBBox)
        self.btnDel.grid(row=3, column=2, sticky=W+E+N)
        self.btnClear = Button(self.frame, text='ClearAll', command=self.clearBBox)
        self.btnClear.grid(row=4, column=2, sticky=W+E+N)

        # select class type
        self.classPanel = Frame(self.frame)
        self.classPanel.grid(row=5, column=1, columnspan=10, sticky=W+E)
        label = Label(self.classPanel, text='class:')
        label.grid(row=5, column=1, sticky=W+N)
        self.classbox = Listbox(self.classPanel, width=4, height=2)
        self.classbox.grid(row=5, column=2)
        # one button per class; all buttons share the selectClass handler
        for each in range(len(classLabels)):
            btn = Button(self.classPanel, text=classLabels[each],
                         command=lambda c=classLabels[each]: self.selectClass(c))
            btn.grid(row=5, column=each + 3)

        # control panel for image navigation
        self.ctrPanel = Frame(self.frame)
        self.ctrPanel.grid(row=6, column=1, columnspan=2, sticky=W+E)
        self.prevBtn = Button(self.ctrPanel, text='<< Prev', width=10, command=self.prevImage)
        self.prevBtn.pack(side=LEFT, padx=5, pady=3)
        self.nextBtn = Button(self.ctrPanel, text='Next >>', width=10, command=self.nextImage)
        self.nextBtn.pack(side=LEFT, padx=5, pady=3)
        self.progLabel = Label(self.ctrPanel, text="Progress:     /    ")
        self.progLabel.pack(side=LEFT, padx=5)
        self.tmpLabel = Label(self.ctrPanel, text="Go to Image No.")
        self.tmpLabel.pack(side=LEFT, padx=5)
        self.idxEntry = Entry(self.ctrPanel, width=5)
        self.idxEntry.pack(side=LEFT)
        self.goBtn = Button(self.ctrPanel, text='Go', command=self.gotoImage)
        self.goBtn.pack(side=LEFT)

        # example panel for illustration
        self.egPanel = Frame(self.frame, border=10)
        self.egPanel.grid(row=1, column=0, rowspan=5, sticky=N)
        self.tmpLabel2 = Label(self.egPanel, text="Examples:")
        self.tmpLabel2.pack(side=TOP, pady=5)
        self.egLabels = []
        for i in range(3):
            self.egLabels.append(Label(self.egPanel))
            self.egLabels[-1].pack(side=TOP)

        # display mouse position
        self.disp = Label(self.ctrPanel, text='')
        self.disp.pack(side=RIGHT)

        self.frame.columnconfigure(1, weight=1)
        self.frame.rowconfigure(10, weight=1)

    def loadDir(self, dbg=False):
        if not dbg:
            s = self.entry.get()
            self.parent.focus()
            self.category = int(s)
        else:
            s = r'D:\workspace\python\labelGUI'
        # get image list
        self.imageDir = os.path.join(r'./Images', '%d' % (self.category))
        self.imageList = glob.glob(os.path.join(self.imageDir, '*.jpg'))
        if len(self.imageList) == 0:
            print 'No .JPEG images found in the specified dir!'
            return
        # set up output dir
        self.outDir = os.path.join(r'./Labels', '%d' % (self.category))
        if not os.path.exists(self.outDir):
            os.mkdir(self.outDir)
        # skip images that already have a non-empty label file
        labeledPicList = glob.glob(os.path.join(self.outDir, '*.txt'))
        for label in labeledPicList:
            data = open(label, 'r')
            if '0\n' == data.read():
                data.close()
                continue
            data.close()
            picture = label.replace('Labels', 'Images').replace('.txt', '.jpg')
            if picture in self.imageList:
                self.imageList.remove(picture)
        # default to the 1st image in the collection
        self.cur = 1
        self.total = len(self.imageList)
        self.loadImage()
        print '%d images loaded from %s' % (self.total, s)

    def loadImage(self):
        # load image
        imagepath = self.imageList[self.cur - 1]
        self.img = Image.open(imagepath)
        self.imgSize = self.img.size
        self.tkimg = ImageTk.PhotoImage(self.img)
        self.mainPanel.config(width=max(self.tkimg.width(), 400),
                              height=max(self.tkimg.height(), 400))
        self.mainPanel.create_image(0, 0, image=self.tkimg, anchor=NW)
        self.progLabel.config(text="%04d/%04d" % (self.cur, self.total))
        # load labels
        self.clearBBox()
        self.imagename = os.path.split(imagepath)[-1].split('.')[0]
        labelname = self.imagename + '.txt'
        self.labelfilename = os.path.join(self.outDir, labelname)
        bbox_cnt = 0
        if os.path.exists(self.labelfilename):
            with open(self.labelfilename) as f:
                for (i, line) in enumerate(f):
                    if i == 0:
                        bbox_cnt = int(line.strip())
                        continue
                    tmp = [int(t.strip()) for t in line.split()]
                    self.bboxList.append(tuple(tmp))
                    tmpId = self.mainPanel.create_rectangle(
                        tmp[0], tmp[1], tmp[2], tmp[3], width=2,
                        outline=COLORS[(len(self.bboxList) - 1) % len(COLORS)])
                    self.bboxIdList.append(tmpId)
                    self.listbox.insert(END, '(%d, %d) -> (%d, %d)' % (tmp[0], tmp[1], tmp[2], tmp[3]))
                    self.listbox.itemconfig(len(self.bboxIdList) - 1,
                                            fg=COLORS[(len(self.bboxIdList) - 1) % len(COLORS)])

    def saveImage(self):
        with open(self.labelfilename, 'w') as f:
            f.write('%d\n' % len(self.bboxList))
            for bbox in self.bboxList:
                f.write(' '.join(map(str, bbox)) + '\n')
        print 'Image No. %d saved' % (self.cur)

    def mouseClick(self, event):
        if self.STATE['click'] == 0:
            self.STATE['x'], self.STATE['y'] = event.x, event.y
        else:
            x1, x2 = min(self.STATE['x'], event.x), max(self.STATE['x'], event.x)
            y1, y2 = min(self.STATE['y'], event.y), max(self.STATE['y'], event.y)
            # clamp the box to the image bounds
            if x2 > self.imgSize[0]:
                x2 = self.imgSize[0]
            if y2 > self.imgSize[1]:
                y2 = self.imgSize[1]
            self.bboxList.append((self.currentClass, x1, y1, x2, y2))
            self.bboxIdList.append(self.bboxId)
            self.bboxId = None
            self.listbox.insert(END, '(%d, %d) -> (%d, %d)' % (x1, y1, x2, y2))
            self.listbox.itemconfig(len(self.bboxIdList) - 1,
                                    fg=COLORS[(len(self.bboxIdList) - 1) % len(COLORS)])
        self.STATE['click'] = 1 - self.STATE['click']

    def mouseMove(self, event):
        self.disp.config(text='x: %d, y: %d' % (event.x, event.y))
        if self.tkimg:
            # crosshair lines following the cursor
            if self.hl:
                self.mainPanel.delete(self.hl)
            self.hl = self.mainPanel.create_line(0, event.y, self.tkimg.width(), event.y, width=2)
            if self.vl:
                self.mainPanel.delete(self.vl)
            self.vl = self.mainPanel.create_line(event.x, 0, event.x, self.tkimg.height(), width=2)
        if 1 == self.STATE['click']:
            if self.bboxId:
                self.mainPanel.delete(self.bboxId)
            self.bboxId = self.mainPanel.create_rectangle(
                self.STATE['x'], self.STATE['y'], event.x, event.y, width=2,
                outline=COLORS[len(self.bboxList) % len(COLORS)])

    def cancelBBox(self, event):
        if 1 == self.STATE['click']:
            if self.bboxId:
                self.mainPanel.delete(self.bboxId)
                self.bboxId = None
                self.STATE['click'] = 0

    def delBBox(self):
        sel = self.listbox.curselection()
        if len(sel) != 1:
            return
        idx = int(sel[0])
        self.mainPanel.delete(self.bboxIdList[idx])
        self.bboxIdList.pop(idx)
        self.bboxList.pop(idx)
        self.listbox.delete(idx)

    def clearBBox(self):
        for idx in range(len(self.bboxIdList)):
            self.mainPanel.delete(self.bboxIdList[idx])
        self.listbox.delete(0, len(self.bboxList))
        self.bboxIdList = []
        self.bboxList = []

    def selectClass(self, name):
        # remember the class for the next box and show it in the class box
        self.currentClass = name
        self.classbox.delete(0, END)
        self.classbox.insert(0, name)
        self.classbox.itemconfig(0, fg=COLORS[0])

    def prevImage(self, event=None):
        self.saveImage()
        if self.cur > 1:
            self.cur -= 1
            self.loadImage()

    def nextImage(self, event=None):
        self.saveImage()
        if self.cur < self.total:
            self.cur += 1
            self.loadImage()

    def gotoImage(self):
        idx = int(self.idxEntry.get())
        if 1 <= idx and idx <= self.total:
            self.saveImage()
            self.cur = idx
            self.loadImage()

if __name__ == '__main__':
    root = Tk()
    tool = LabelTool(root)
    root.mainloop()
Usage:
(1) Under BBox-Label-Tool/Images, create a directory named with a number (e.g. BBox-Label-Tool/Images/1) and copy the images to annotate into it.
(2) In the BBox-Label-Tool directory, run: python main.py
(3) In the tool window, enter the directory name to label (e.g. 1) in the Image Dir box and click Load; the tool automatically loads the images under Images/1.
Note: images in the directory that have already been labeled are not reloaded when you click Load.
(4) The tool supports multi-class annotation: select the class first, then draw the bounding box.
(5) When an image is done, click Next >> to move to the next one. For each successfully labeled image, a label file with the same name is written to the matching directory under BBox-Label-Tool/Labels.
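For example, a generated label file such as Labels/1/000001.txt for an image with two objects would look like this (class names come from classLabels; the coordinates here are hypothetical):

2
door 12 34 120 200
chair 150 80 240 210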
2 Dataset conversion

Caffe trains on LMDB-format data, and the ssd framework provides scripts that convert VOC-format data to LMDB. So in practice we first convert the BBox-Label-Tool annotations into VOC format, then convert that into LMDB.

2.1 VOC data format
(1) Annotations holds the label information as xml files, e.g.:
<annotation>
  <folder>VOC2007</folder>
  <filename>1.jpg</filename>
  <source>
    <database>My Database</database>
    <annotation>VOC2007</annotation>
    <image>flickr</image>
    <flickrid>NULL</flickrid>
  </source>
  <owner>
    <flickrid>NULL</flickrid>
    <name>idaneel</name>
  </owner>
  <size>
    <width>320</width>
    <height>240</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <!-- one <object> block per bounding box follows -->
</annotation>
(2) ImageSets/Main holds the lists of images used for training and for testing.
(3) JPEGImages holds all the images.
(4) label holds the bounding-box coordinate files produced by BBox-Label-Tool; the files in this directory are the label files to be converted.
2.2 Converting labels to VOC data format

The bounding-box files produced by BBox-Label-Tool are converted into VOC data format in two steps: (1) convert the txt-format bounding-box information into the xml representation used by VOC; (2) generate the training-set and test-set lists. Both steps are implemented in Python. createXml.py does the txt-to-xml conversion; run it with ./createXml.py
#!/usr/bin/env python
# Convert BBox-Label-Tool txt labels into VOC-style xml annotations.
import os
import sys
import cv2
from itertools import islice
from xml.dom.minidom import Document

labels = 'label'             # directory holding the txt label files
imgpath = 'JPEGImages/'
xmlpath_new = 'Annotations/'
foldername = 'VOC2007'


def textElement(doc, tag, text):
    # helper: create <tag>text</tag>
    elem = doc.createElement(tag)
    elem.appendChild(doc.createTextNode(text))
    return elem


def insertObject(doc, datas):
    # datas = [className, xmin, ymin, xmax, ymax]
    obj = doc.createElement('object')
    obj.appendChild(textElement(doc, 'name', datas[0]))
    obj.appendChild(textElement(doc, 'pose', 'Unspecified'))
    obj.appendChild(textElement(doc, 'truncated', str(0)))
    obj.appendChild(textElement(doc, 'difficult', str(0)))
    bndbox = doc.createElement('bndbox')
    bndbox.appendChild(textElement(doc, 'xmin', str(datas[1])))
    bndbox.appendChild(textElement(doc, 'ymin', str(datas[2])))
    bndbox.appendChild(textElement(doc, 'xmax', str(datas[3])))
    # strip a trailing newline that the label file may leave on the last field
    data = str(datas[4])
    if data[-1] in ('\r', '\n'):
        data = data[0:-1]
    bndbox.appendChild(textElement(doc, 'ymax', data))
    obj.appendChild(bndbox)
    return obj


def create():
    for walk in os.walk(labels):
        for each in walk[2]:
            fidin = open(walk[0] + '/' + each, 'r')
            objIndex = 0
            # skip the first line (the object count) of each label file
            for data in islice(fidin, 1, None):
                objIndex += 1
                data = data.strip('\n')
                datas = data.split(' ')
                if 5 != len(datas):
                    print 'bounding box information error'
                    continue
                pictureName = each.replace('.txt', '.jpg')
                imageFile = imgpath + pictureName
                img = cv2.imread(imageFile)
                imgSize = img.shape
                if 1 == objIndex:
                    # first object: create the xml skeleton for this image
                    xmlName = each.replace('.txt', '.xml')
                    f = open(xmlpath_new + xmlName, "w")
                    doc = Document()
                    annotation = doc.createElement('annotation')
                    doc.appendChild(annotation)
                    annotation.appendChild(textElement(doc, 'folder', foldername))
                    annotation.appendChild(textElement(doc, 'filename', pictureName))
                    source = doc.createElement('source')
                    source.appendChild(textElement(doc, 'database', 'My Database'))
                    source.appendChild(textElement(doc, 'annotation', foldername))
                    source.appendChild(textElement(doc, 'image', 'flickr'))
                    source.appendChild(textElement(doc, 'flickrid', 'NULL'))
                    annotation.appendChild(source)
                    owner = doc.createElement('owner')
                    owner.appendChild(textElement(doc, 'flickrid', 'NULL'))
                    owner.appendChild(textElement(doc, 'name', 'idaneel'))
                    annotation.appendChild(owner)
                    size = doc.createElement('size')
                    size.appendChild(textElement(doc, 'width', str(imgSize[1])))
                    size.appendChild(textElement(doc, 'height', str(imgSize[0])))
                    size.appendChild(textElement(doc, 'depth', str(imgSize[2])))
                    annotation.appendChild(size)
                    annotation.appendChild(textElement(doc, 'segmented', str(0)))
                annotation.appendChild(insertObject(doc, datas))
            try:
                f.write(doc.toprettyxml(indent='    '))
                f.close()
                fidin.close()
            except:
                pass


if __name__ == '__main__':
    create()
createTest.py generates the training-set and test-set list files; run it as:
./createTest.py %startID% %endID% %testNumber%
#!/usr/bin/env python
# Randomly split the picture id range into trainval and test lists.
import os
import sys
import random

try:
    start = int(sys.argv[1])
    end = int(sys.argv[2])
    test = int(sys.argv[3])
    allNum = end - start + 1
except:
    print 'Please input picture range'
    print './createTest.py 1 1500 500'
    os._exit(0)

b_list = range(start, end)
blist_webId = random.sample(b_list, test)  # ids going into the test set
blist_webId = sorted(blist_webId)
allFile = []
testFile = open('ImageSets/Main/test.txt', 'w')
trainFile = open('ImageSets/Main/trainval.txt', 'w')
for i in range(allNum):
    allFile.append(i + 1)
for test in blist_webId:
    allFile.remove(test)
    testFile.write(str(test) + '\n')
for train in allFile:
    trainFile.write(str(train) + '\n')
testFile.close()
trainFile.close()
Note: because BBox-Label-Tool is a fairly simple implementation, it can only label one class at a time, so the conversion script likewise converts one class per run; this should be optimized later.
The improved BBox-Label-Tool supports multi-class labeling, and the generated label files gain a class-name field.
To use it, rewrite classLabels with your own classes; the modified tool code is the main.py listed in section 1.1.
2.3 Converting VOC data to LMDB

SSD provides scripts that convert VOC data to LMDB: data/VOC0712/create_list.sh and ./data/VOC0712/create_data.sh, both written entirely for the data under the VOC0712 directory. To avoid touching the VOC0712 data, the two scripts were modified for our own dataset, replacing the VOC0712-specific information with our own directory information; when processing our dataset, VOC0712 becomes indoor. The steps:
(1) Create an indoor directory under $HOME/data/VOCdevkit; it holds your converted VOC dataset.
(2) Create an indoor directory under $CAFFE_ROOT/examples.
(3) Create an indoor directory under $CAFFE_ROOT/data, copy create_list.sh, create_data.sh, and labelmap_voc.prototxt from data/VOC0712 into it, and rename them create_list_indoor.sh, create_data_indoor.sh, and labelmap_indoor.prototxt.
(4) Edit the two new create scripts, replacing the VOC0712-specific information with indoor. The modified files are:
create_list_indoor.sh:

#!/bin/bash
root_dir=$HOME/data/VOCdevkit/
sub_dir=ImageSets/Main
bash_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
for dataset in trainval test
do
  dst_file=$bash_dir/$dataset.txt
  if [ -f $dst_file ]
  then
    rm -f $dst_file
  fi
  for name in indoor
  do
    if [[ $dataset == "test" && $name == "VOC2012" ]]
    then
      continue
    fi
    echo "Create list for $name $dataset..."
    dataset_file=$root_dir/$name/$sub_dir/$dataset.txt

    img_file=$bash_dir/$dataset"_img.txt"
    cp $dataset_file $img_file
    sed -i "s/^/$name\/JPEGImages\//g" $img_file
    sed -i "s/$/.jpg/g" $img_file

    label_file=$bash_dir/$dataset"_label.txt"
    cp $dataset_file $label_file
    sed -i "s/^/$name\/Annotations\//g" $label_file
    sed -i "s/$/.xml/g" $label_file

    paste -d' ' $img_file $label_file >> $dst_file

    rm -f $label_file
    rm -f $img_file
  done

  # Generate image name and size information.
  if [ $dataset == "test" ]
  then
    $bash_dir/../../build/tools/get_image_size $root_dir $dst_file $bash_dir/$dataset"_name_size.txt"
  fi

  # Shuffle trainval file.
  if [ $dataset == "trainval" ]
  then
    rand_file=$dst_file.random
    cat $dst_file | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' > $rand_file
    mv $rand_file $dst_file
  fi
done
create_data_indoor.sh:

#!/bin/bash
cur_dir=$(cd $(dirname ${BASH_SOURCE[0]}) && pwd)
root_dir=$cur_dir/../..

cd $root_dir

redo=1
data_root_dir="$HOME/data/VOCdevkit"
dataset_name="indoor"
mapfile="$root_dir/data/$dataset_name/labelmap_indoor.prototxt"
anno_type="detection"
db="lmdb"
min_dim=0
max_dim=0
width=0
height=0

extra_cmd="--encode-type=jpg --encoded"
if [ $redo ]
then
  extra_cmd="$extra_cmd --redo"
fi
for subset in test trainval
do
  python $root_dir/scripts/create_annoset.py --anno-type=$anno_type \
    --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim \
    --resize-width=$width --resize-height=$height --check-label $extra_cmd \
    $data_root_dir $root_dir/data/$dataset_name/$subset.txt \
    $data_root_dir/$dataset_name/$db/$dataset_name"_"$subset"_"$db \
    examples/$dataset_name
done
(5) Edit labelmap_indoor.prototxt so the classes match your own dataset; note that label 0, the background class, must be kept:

item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "door"
  label: 1
  display_name: "door"
}
After these modifications you can build the LMDB data. From $CAFFE_ROOT run:
./data/indoor/create_list_indoor.sh
./data/indoor/create_data_indoor.sh
When the commands finish, the converted LMDB data can be inspected under $CAFFE_ROOT/examples/indoor.
3 Training SSD on your own dataset

Training starts from the pretrained VGGNet model provided with the ssd demo: VGG_ILSVRC_16_layers_fc_reduced.caffemodel.
Save that model under $CAFFE_ROOT/models/VGGNet.
Copy ssd_pascal.py to a new file ssd_pascal_indoor.py and modify it for your own dataset.
The main changes:
(1) Point train_data and test_data at your dataset's LMDB:
train_data = "examples/indoor/indoor_trainval_lmdb"
test_data = "examples/indoor/indoor_test_lmdb"
(2) Set num_test_image to the number of test images in your dataset.
(3) Set num_classes to the number of label classes in your dataset + 1 (for the background class).
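For instance, with a single object class (door) the indoor configuration below uses (values consistent with the labelmap shown earlier and the listing that follows):

num_classes = 2        # 1 object class (door) + 1 background class
num_test_image = 800   # number of images in the indoor test set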
For my dataset, ssd_pascal_indoor.py reads:
from __future__ import print_function
import caffe
from caffe.model_libs import *
from google.protobuf import text_format

import math
import os
import shutil
import stat
import subprocess
import sys

# Add extra layers on top of a "base" network (e.g. VGGNet or Inception).
def AddExtraLayers(net, use_batchnorm=True):
    use_relu = True

    # Add additional convolutional layers.
    from_layer = net.keys()[-1]
    # TODO(weiliu89): Construct the name using the last layer to avoid duplication.
    out_layer = "conv6_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 1, 0, 1)

    from_layer = out_layer
    out_layer = "conv6_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 512, 3, 1, 2)

    for i in xrange(7, 9):
        from_layer = out_layer
        out_layer = "conv{}_1".format(i)
        ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1)

        from_layer = out_layer
        out_layer = "conv{}_2".format(i)
        ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 1, 2)

    # Add global pooling layer.
    name = net.keys()[-1]
    net.pool6 = L.Pooling(net[name], pool=P.Pooling.AVE, global_pooling=True)

    return net


### Modify the following parameters accordingly ###
# The directory which contains the caffe code.
# We assume you are running the script at the CAFFE_ROOT.
caffe_root = os.getcwd()

# Set true if you want to start training right after generating all files.
run_soon = True
# Set true if you want to load from most recently saved snapshot.
# Otherwise, we will load from the pretrain_model defined below.
resume_training = True
# If true, Remove old model files.
remove_old_models = False

# The database file for training data. Created by data/VOC0712/create_data.sh
train_data = "examples/indoor/indoor_trainval_lmdb"
# The database file for testing data. Created by data/VOC0712/create_data.sh
test_data = "examples/indoor/indoor_test_lmdb"
# Specify the batch sampler.
resize_width = 300
resize_height = 300
resize = "{}x{}".format(resize_width, resize_height)
batch_sampler = [
    {'sampler': {}, 'max_trials': 1, 'max_sample': 1},
    {'sampler': {'min_scale': 0.3, 'max_scale': 1.0, 'min_aspect_ratio': 0.5, 'max_aspect_ratio': 2.0},
     'sample_constraint': {'min_jaccard_overlap': 0.1}, 'max_trials': 50, 'max_sample': 1},
    {'sampler': {'min_scale': 0.3, 'max_scale': 1.0, 'min_aspect_ratio': 0.5, 'max_aspect_ratio': 2.0},
     'sample_constraint': {'min_jaccard_overlap': 0.3}, 'max_trials': 50, 'max_sample': 1},
    {'sampler': {'min_scale': 0.3, 'max_scale': 1.0, 'min_aspect_ratio': 0.5, 'max_aspect_ratio': 2.0},
     'sample_constraint': {'min_jaccard_overlap': 0.5}, 'max_trials': 50, 'max_sample': 1},
    {'sampler': {'min_scale': 0.3, 'max_scale': 1.0, 'min_aspect_ratio': 0.5, 'max_aspect_ratio': 2.0},
     'sample_constraint': {'min_jaccard_overlap': 0.7}, 'max_trials': 50, 'max_sample': 1},
    {'sampler': {'min_scale': 0.3, 'max_scale': 1.0, 'min_aspect_ratio': 0.5, 'max_aspect_ratio': 2.0},
     'sample_constraint': {'min_jaccard_overlap': 0.9}, 'max_trials': 50, 'max_sample': 1},
    {'sampler': {'min_scale': 0.3, 'max_scale': 1.0, 'min_aspect_ratio': 0.5, 'max_aspect_ratio': 2.0},
     'sample_constraint': {'max_jaccard_overlap': 1.0}, 'max_trials': 50, 'max_sample': 1},
]
train_transform_param = {
    'mirror': True,
    'mean_value': [104, 117, 123],
    'resize_param': {
        'prob': 1,
        'resize_mode': P.Resize.WARP,
        'height': resize_height,
        'width': resize_width,
        'interp_mode': [P.Resize.LINEAR, P.Resize.AREA, P.Resize.NEAREST,
                        P.Resize.CUBIC, P.Resize.LANCZOS4],
    },
    'emit_constraint': {'emit_type': caffe_pb2.EmitConstraint.CENTER},
}
test_transform_param = {
    'mean_value': [104, 117, 123],
    'resize_param': {
        'prob': 1,
        'resize_mode': P.Resize.WARP,
        'height': resize_height,
        'width': resize_width,
        'interp_mode': [P.Resize.LINEAR],
    },
}

# If true, use batch norm for all newly added layers.
# Currently only the non batch norm version has been tested.
use_batchnorm = False
# Use different initial learning rate.
if use_batchnorm:
    base_lr = 0.0004
else:
    # A learning rate for batch_size = 1, num_gpus = 1.
    base_lr = 0.00004

# Modify the job name if you want.
job_name = "SSD_{}".format(resize)
# The name of the model. Modify it if you want.
model_name = "VGG_VOC0712_{}".format(job_name)

# Directory which stores the model .prototxt file.
save_dir = "models/VGGNet/VOC0712/{}".format(job_name)
# Directory which stores the snapshot of models.
snapshot_dir = "models/VGGNet/VOC0712/{}".format(job_name)
# Directory which stores the job script and log file.
job_dir = "jobs/VGGNet/VOC0712/{}".format(job_name)
# Directory which stores the detection results.
output_result_dir = "{}/data/VOCdevkit/results/VOC2007/{}/Main".format(os.environ['HOME'], job_name)

# model definition files.
train_net_file = "{}/train.prototxt".format(save_dir)
test_net_file = "{}/test.prototxt".format(save_dir)
deploy_net_file = "{}/deploy.prototxt".format(save_dir)
solver_file = "{}/solver.prototxt".format(save_dir)
# snapshot prefix.
snapshot_prefix = "{}/{}".format(snapshot_dir, model_name)
# job script path.
job_file = "{}/{}.sh".format(job_dir, model_name)

# Stores the test image names and sizes. Created by data/VOC0712/create_list.sh
name_size_file = "data/indoor/test_name_size.txt"
# The pretrained model. We use the Fully convolutional reduced (atrous) VGGNet.
pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"
# Stores LabelMapItem.
label_map_file = "data/indoor/labelmap_indoor.prototxt"

# MultiBoxLoss parameters.
num_classes = 2
share_location = True
background_label_id = 0
train_on_diff_gt = True
normalization_mode = P.Loss.VALID
code_type = P.PriorBox.CENTER_SIZE
neg_pos_ratio = 3.
loc_weight = (neg_pos_ratio + 1.) / 4.
multibox_loss_param = {
    'loc_loss_type': P.MultiBoxLoss.SMOOTH_L1,
    'conf_loss_type': P.MultiBoxLoss.SOFTMAX,
    'loc_weight': loc_weight,
    'num_classes': num_classes,
    'share_location': share_location,
    'match_type': P.MultiBoxLoss.PER_PREDICTION,
    'overlap_threshold': 0.5,
    'use_prior_for_matching': True,
    'background_label_id': background_label_id,
    'use_difficult_gt': train_on_diff_gt,
    'do_neg_mining': True,
    'neg_pos_ratio': neg_pos_ratio,
    'neg_overlap': 0.5,
    'code_type': code_type,
}
loss_param = {'normalization': normalization_mode}

# parameters for generating priors.
# minimum dimension of input image
min_dim = 300
# conv4_3 ==> 38 x 38
# fc7 ==> 19 x 19
# conv6_2 ==> 10 x 10
# conv7_2 ==> 5 x 5
# conv8_2 ==> 3 x 3
# pool6 ==> 1 x 1
mbox_source_layers = ['conv4_3', 'fc7', 'conv6_2', 'conv7_2', 'conv8_2', 'pool6']
# in percent %
min_ratio = 20
max_ratio = 95
step = int(math.floor((max_ratio - min_ratio) / (len(mbox_source_layers) - 2)))
min_sizes = []
max_sizes = []
for ratio in xrange(min_ratio, max_ratio + 1, step):
    min_sizes.append(min_dim * ratio / 100.)
    max_sizes.append(min_dim * (ratio + step) / 100.)
min_sizes = [min_dim * 10 / 100.] + min_sizes
max_sizes = [[]] + max_sizes
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2, 3], [2, 3]]
# L2 normalize conv4_3.
normalizations = [20, -1, -1, -1, -1, -1]
# variance used to encode/decode prior bboxes.
if code_type == P.PriorBox.CENTER_SIZE:
    prior_variance = [0.1, 0.1, 0.2, 0.2]
else:
    prior_variance = [0.1]
flip = True
clip = True

# Solver parameters.
# Defining which GPUs to use.
gpus = "0"
gpulist = gpus.split(",")
num_gpus = len(gpulist)

# Divide the mini-batch to different GPUs.
batch_size = 4
accum_batch_size = 32
iter_size = accum_batch_size / batch_size
solver_mode = P.Solver.CPU
device_id = 0
batch_size_per_device = batch_size
if num_gpus > 0:
    batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))
    iter_size = int(math.ceil(float(accum_batch_size) / (batch_size_per_device * num_gpus)))
    solver_mode = P.Solver.GPU
    device_id = int(gpulist[0])

if normalization_mode == P.Loss.NONE:
    base_lr /= batch_size_per_device
elif normalization_mode == P.Loss.VALID:
    base_lr *= 25. / loc_weight
elif normalization_mode == P.Loss.FULL:
    # Roughly there are 2000 prior bboxes per image.
    # TODO(weiliu89): Estimate the exact # of priors.
    base_lr *= 2000.

# Which layers to freeze (no backward) during training.
freeze_layers = ['conv1_1', 'conv1_2', 'conv2_1', 'conv2_2']

# Evaluate on whole test set.
num_test_image = 800
test_batch_size = 1
test_iter = num_test_image / test_batch_size

solver_param = {
    # Train parameters
    'base_lr': base_lr,
    'weight_decay': 0.0005,
    'lr_policy': "step",
    'stepsize': 40000,
    'gamma': 0.1,
    'momentum': 0.9,
    'iter_size': iter_size,
    'max_iter': 60000,
    'snapshot': 40000,
    'display': 10,
    'average_loss': 10,
    'type': "SGD",
    'solver_mode': solver_mode,
    'device_id': device_id,
    'debug_info': False,
    'snapshot_after_train': True,
    # Test parameters
    'test_iter': [test_iter],
    'test_interval': 10000,
    'eval_type': "detection",
    'ap_version': "11point",
    'test_initialization': False,
}

# parameters for generating detection output.
det_out_param = {
    'num_classes': num_classes,
    'share_location': share_location,
    'background_label_id': background_label_id,
    'nms_param': {'nms_threshold': 0.45, 'top_k': 400},
    'save_output_param': {
        'output_directory': output_result_dir,
        'output_name_prefix': "comp4_det_test_",
        'output_format': "VOC",
        'label_map_file': label_map_file,
        'name_size_file': name_size_file,
        'num_test_image': num_test_image,
    },
    'keep_top_k': 200,
    'confidence_threshold': 0.01,
    'code_type': code_type,
}

# parameters for evaluating detection results.
det_eval_param = {
    'num_classes': num_classes,
    'background_label_id': background_label_id,
    'overlap_threshold': 0.5,
    'evaluate_difficult_gt': False,
    'name_size_file': name_size_file,
}

### Hopefully you don't need to change the following ###
# Check file.
check_if_exist(train_data)
check_if_exist(test_data)
check_if_exist(label_map_file)
check_if_exist(pretrain_model)
make_if_not_exist(save_dir)
make_if_not_exist(job_dir)
make_if_not_exist(snapshot_dir)

# Create train net.
net = caffe.NetSpec()
net.data, net.label = CreateAnnotatedDataLayer(train_data, batch_size=batch_size_per_device,
        train=True, output_label=True, label_map_file=label_map_file,
        transform_param=train_transform_param, batch_sampler=batch_sampler)
VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,
           dropout=False, freeze_layers=freeze_layers)
AddExtraLayers(net, use_batchnorm)
mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,
        use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,
        aspect_ratios=aspect_ratios, normalizations=normalizations,
        num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,
        prior_variance=prior_variance, kernel_size=3, pad=1)

# Create the MultiBoxLossLayer.
name = "mbox_loss"
mbox_layers.append(net.label)
net[name] = L.MultiBoxLoss(*mbox_layers, multibox_loss_param=multibox_loss_param,
        loss_param=loss_param, include=dict(phase=caffe_pb2.Phase.Value('TRAIN')),
        propagate_down=[True, True, False, False])

with open(train_net_file, 'w') as f:
    print('name: "{}_train"'.format(model_name), file=f)
    print(net.to_proto(), file=f)
shutil.copy(train_net_file, job_dir)

# Create test net.
net = caffe.NetSpec()
net.data, net.label = CreateAnnotatedDataLayer(test_data, batch_size=test_batch_size,
        train=False, output_label=True, label_map_file=label_map_file,
        transform_param=test_transform_param)
VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,
           dropout=False, freeze_layers=freeze_layers)
AddExtraLayers(net, use_batchnorm)
mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,
        use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,
        aspect_ratios=aspect_ratios, normalizations=normalizations,
        num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,
        prior_variance=prior_variance, kernel_size=3, pad=1)

conf_name = "mbox_conf"
if multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.SOFTMAX:
    reshape_name = "{}_reshape".format(conf_name)
    net[reshape_name] = L.Reshape(net[conf_name], shape=dict(dim=[0, -1, num_classes]))
    softmax_name = "{}_softmax".format(conf_name)
    net[softmax_name] = L.Softmax(net[reshape_name], axis=2)
    flatten_name = "{}_flatten".format(conf_name)
    net[flatten_name] = L.Flatten(net[softmax_name], axis=1)
    mbox_layers[1] = net[flatten_name]
elif multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.LOGISTIC:
    sigmoid_name = "{}_sigmoid".format(conf_name)
    net[sigmoid_name] = L.Sigmoid(net[conf_name])
    mbox_layers[1] = net[sigmoid_name]

net.detection_out = L.DetectionOutput(*mbox_layers,
        detection_output_param=det_out_param,
        include=dict(phase=caffe_pb2.Phase.Value('TEST')))
net.detection_eval = L.DetectionEvaluate(net.detection_out, net.label,
        detection_evaluate_param=det_eval_param,
        include=dict(phase=caffe_pb2.Phase.Value('TEST')))

with open(test_net_file, 'w') as f:
    print('name: "{}_test"'.format(model_name), file=f)
    print(net.to_proto(), file=f)
shutil.copy(test_net_file, job_dir)

# Create deploy net.
# Remove the first and last layer from test net.
deploy_net = net
with open(deploy_net_file, 'w') as f:
    net_param = deploy_net.to_proto()
    # Remove the first (AnnotatedData) and last (DetectionEvaluate) layer from test net.
    del net_param.layer[0]
    del net_param.layer[-1]
    net_param.name = '{}_deploy'.format(model_name)
    net_param.input.extend(['data'])
    net_param.input_shape.extend([
        caffe_pb2.BlobShape(dim=[1, 3, resize_height, resize_width])])
    print(net_param, file=f)
shutil.copy(deploy_net_file, job_dir)

# Create solver.
solver = caffe_pb2.SolverParameter(
        train_net=train_net_file,
        test_net=[test_net_file],
        snapshot_prefix=snapshot_prefix,
        **solver_param)
with open(solver_file, 'w') as f:
    print(solver, file=f)
shutil.copy(solver_file, job_dir)

# Find most recent snapshot.
max_iter = 0
for file in os.listdir(snapshot_dir):
    if file.endswith(".solverstate"):
        basename = os.path.splitext(file)[0]
        iter = int(basename.split("{}_iter_".format(model_name))[1])
        if iter > max_iter:
            max_iter = iter

train_src_param = '--weights="{}" \\\n'.format(pretrain_model)
if resume_training:
    if max_iter > 0:
        train_src_param = '--snapshot="{}_iter_{}.solverstate" \\\n'.format(snapshot_prefix, max_iter)

if remove_old_models:
    # Remove any snapshots smaller than max_iter.
    for file in os.listdir(snapshot_dir):
        if file.endswith(".solverstate"):
            basename = os.path.splitext(file)[0]
            iter = int(basename.split("{}_iter_".format(model_name))[1])
            if max_iter > iter:
                os.remove("{}/{}".format(snapshot_dir, file))
        if file.endswith(".caffemodel"):
            basename = os.path.splitext(file)[0]
            iter = int(basename.split("{}_iter_".format(model_name))[1])
            if max_iter > iter:
                os.remove("{}/{}".format(snapshot_dir, file))

# Create job file.
with open(job_file, 'w') as f:
    f.write('cd {}\n'.format(caffe_root))
    f.write('./build/tools/caffe train \\\n')
    f.write('--solver="{}" \\\n'.format(solver_file))
    f.write(train_src_param)
    if solver_param['solver_mode'] == P.Solver.GPU:
        f.write('--gpu {} 2>&1 | tee {}/{}.log\n'.format(gpus, job_dir, model_name))
    else:
        f.write('2>&1 | tee {}/{}.log\n'.format(job_dir, model_name))

# Copy the python script to job_dir.
py_file = os.path.abspath(__file__)
shutil.copy(py_file, job_dir)

# Run the job.
os.chmod(job_file, stat.S_IRWXU)
if run_soon:
    subprocess.call(job_file, shell=True)
Training command:
python examples/ssd/ssd_pascal_indoor.py
4 Testing

The SSD framework provides test code in both C++ and Python versions.
4.1 C++ version

After SSD is built, the C++ executable is at .build_release/examples/ssd/ssd_detect.bin
Test command:
./.build_release/examples/ssd/ssd_detect.bin models/VGGNet/indoor/deploy.prototxt models/VGGNet/indoor/VGG_VOC0712_SSD_300x300_iter_60000.caffemodel pictures.txt
where pictures.txt holds the list of images to test.
4.2 Python version

For the Python test procedure, see examples/detection.ipynb.
References:
1 Faster RCNN: training your own dataset (MATLAB and Python versions) and building a VOC2007-format dataset
2 Setting up and running SSD
https://blog.csdn.net/u014696921/article/details/53353896
See: /home/echo/vision/caffe/examples/airplaneDete
Create the trainval.txt, test.txt, and test_name_size.txt in /home/echo/vision/caffe/examples/airplaneDete/. I wrote this part in MATLAB (serving the role of the earlier create_list script); each output line is an image path followed by the corresponding xml file path.
See the main_jpgxml.m file.
Run:
~/vision/caffe$ /home/echo/vision/caffe/examples/airplaneDete/create_data.sh
Then create test_name_size.txt; for the code see /home/echo/vision/caffe/examples/airplaneDete/create_list.sh.
Training: I originally wanted to fine-tune my own dataset from the author's trained detection network, but something went wrong:
Cannot copy param 0 weights from layer 'conv4_3_norm_mbox_conf'; shape mismatch. Source param shape is 84 512 3 3 (387072); target param shape is 8 512 3 3 (36864). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
Explanation: my guess is that fine-tuning cannot reuse all of someone else's detection network, only the classification part at the front, because the detection layers differ whenever the number of classes differs; fine-tuning YOLOv2 behaves the same way.
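If you do want to fine-tune from a detection checkpoint with a different class count, the error message itself suggests a workaround: rename the class-dependent prediction layers in your train.prototxt so Caffe learns them from scratch instead of copying the mismatched weights. A hedged sketch (the layer names follow the SSD convention; the _new suffix is my own, and the omitted fields stay as generated):

layer {
  name: "conv4_3_norm_mbox_conf_new"   # was: conv4_3_norm_mbox_conf
  type: "Convolution"
  bottom: "conv4_3_norm"
  top: "conv4_3_norm_mbox_conf_new"
  # ... convolution_param etc. unchanged ...
}

The same renaming would apply to every *_mbox_conf layer (fc7, conv6_2, ...), since their output channel counts depend on num_classes.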
The post blog.csdn.net/u010167269/article/details/52851667 trains from the VGG network itself; every example I looked at trains this way, and for now I also train at the 300*300 size.
Modify ssd_pascal.py, then run:
python /home/echo/vision/caffe/examples/airplaneDete/ssd_pascal.py
The changes:
(1) the image paths;
(2) the initial learning rate (note that a formula later in the script rescales it: the 0.0004/0.00004 defaults end up as an effective 0.001; after my change it came out as 0.0005 — see the sketch after this list);
(3) the storage paths: save_dir stores the generated .prototxt files, snapshot_dir stores the intermediate model snapshots,
and job_dir stores everything the run needs.
Note: job_dir and save_dir must not be the same path, or the generated files will clash.
(4) the model and the mapfile;
(5) batch_size;
(6) the test settings;
(7) the solver_param values: stepvalue, max_iter, snapshot, display, test_interval, and so on; you can also change test-time parameters such as confidence_threshold and overlap_threshold.
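For point (2), the rescaling in question is this passage of ssd_pascal.py (copied from the listing above); with the default neg_pos_ratio = 3, loc_weight works out to 1, so the multiplier is 25:

# loc_weight = (neg_pos_ratio + 1.) / 4. = 1.0
if normalization_mode == P.Loss.NONE:
    base_lr /= batch_size_per_device
elif normalization_mode == P.Loss.VALID:
    base_lr *= 25. / loc_weight   # 0.00004 -> 0.001 effective
elif normalization_mode == P.Loss.FULL:
    base_lr *= 2000.

Ending up at 0.0005 would be consistent with having halved the initial base_lr to 0.00002 before this multiplication.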
Train and wait... 50000 iterations took me close to a full day! Training results:
The test accuracy is detection_eval = 0.892142, better than YOLOv2 (probably because several multi-scale feature maps are used), but the network still fails on very small objects. YOLOv2 at least lets you enlarge the input image, which helps a little more, though I don't know whether its final numbers can beat SSD's; so far they don't.
Testing a single image with SSD:
The SSD author also provides prediction code; we only need to change the parameters. Open ~/caffe/examples/ssd_detect.ipynb with jupyter notebook and adjust the various paths and files.
Note: a size written as 109*131 means width 109, height 131.
Most large objects are detected essentially every time, but small objects are almost never found, which is a direct consequence of warping the input to 300. I also tried resizing an image to, say, 3000, which raises an error because my deploy.prototxt specifies 300; changing deploy.prototxt to 3000 errors even harder. Once a model is trained, its input size is essentially fixed; the 500 variant is a separate model. Where exactly does this input size matter?
Reason: I was being silly here! Inputs of any size get resized to the size the network requires. The network needs a fixed size because, for example, a particular feature map layer is designated for regressing boxes and confidences: if it is m*m*p, a convolution produces m*m*(box coordinates + number of confidences), so the feature map size determines how many convolutions follow. That explanation is not quite right either; compare VGG16 against deploy.prototxt:
All of VGG16's convolution and pooling layers are written out verbatim in deploy.prototxt, with FC6 and FC7 converted to convolution layers and VGG16's dropout removed; a conv4_3_norm layer is added to Normalize the output of conv4_3, layers like conv4_3_norm_mbox_loc handle box localization with no confidence prediction (confidence is predicted separately), and so on... in any case, a trained model definitely constrains the input size.
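Concretely, the fixed size lives in the deploy net's input declaration, which the generation script writes from resize_height/resize_width (cf. the BlobShape line in the ssd_pascal_indoor.py listing above):

input: "data"
input_shape {
  dim: 1    # batch size
  dim: 3    # channels
  dim: 300  # height
  dim: 300  # width
}

Every test image is warped to this 300*300 shape before inference, which is why simply editing the number after training breaks the assumptions the trained weights were built on.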
YOLOv2's inputs are also all resized, initially to 416, with the same size for classification and detection. But because its final prediction uses only convolution and pooling operations, which do not depend on the input size, images of any size can be fed in at prediction time. During training the network input size is varied so that multiple scales can be trained: inputs from 288 to 600 can all share one network with the same weights. How this becomes input-size-independent I do not know. Trying it out: sure enough, I changed height and width in yolo-voc.cfg:
height=1000 #800 #600 #416 #288 #416
width=1000 #800 #600 # #288 #416
and the network still runs; in general, for 700*800 source images, the larger the configured size, the better the detection.
I still don't understand why multi-scale training works, or why any scale is acceptable, i.e., independent of the input. Looking at yolo-voc.cfg, the final prediction uses 1*1 convolutions, which are independent of image size. So in the end there is no explicit grid partition; predictions simply follow the feature map size, whereas v1 partitioned the image into a 7*7 grid.
Batch testing with SSD: this remains an open item!
To train SSD_500, I assume you edit the same ssd_pascal.py: change resize_width = 300 and resize_height = 300 to 500, and the value below presumably also becomes 500 (not sure):
# minimum dimension of input image
min_dim = 300
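A sketch of the edits that would change (not verified, per the uncertainty above):

resize_width = 500
resize_height = 500
# minimum dimension of input image
min_dim = 500

job_name, save_dir, and the generated prototxts would then pick up the new size automatically, since they are all derived from resize = "{}x{}".format(resize_width, resize_height).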
Additional notes (from reading the code):
See: github.com/EchoIR/airplaneDetec/tree/SSD
1. create_list.sh: nothing special; it generates trainval.txt, test.txt, and test_name_size.txt.
2. create_data.sh: generates the lmdb files and their symlinks. min-dim, max-dim, resize-height, and resize-width are all set to 0, which should mean no resizing or size constraints are applied to the images.
3. ssd_pascal.py: I only looked at the basic settings; the main thing I don't understand is batch_sampler — what is it? Selection of positive samples, I suspect? In any case it should be how training samples are drawn, a form of data augmentation.
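My current reading, for what it's worth: each entry in batch_sampler (see the ssd_pascal_indoor.py listing above) describes one strategy for cropping a patch from the training image, and SSD's data layer samples among them, which is the data augmentation from the paper. An annotated sketch of one entry:

{
    'sampler': {                       # geometry of the random crop
        'min_scale': 0.3, 'max_scale': 1.0,
        'min_aspect_ratio': 0.5, 'max_aspect_ratio': 2.0,
    },
    'sample_constraint': {             # keep the crop only if it overlaps some
        'min_jaccard_overlap': 0.1,    # ground-truth box with IoU >= 0.1
    },
    'max_trials': 50,                  # try up to 50 random crops
    'max_sample': 1,                   # keep at most 1 crop that satisfies the constraint
}

The first, empty entry keeps the whole image, so training sees both full images and object-centered crops.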