Installing TensorFlow 2.1 with CUDA 10.1 on Windows 10

First, install the GPU version of PyTorch 1.2

My CUDA driver is 10.1, so I first tried installing PyTorch the usual way:

activate pytorch1.2
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

That was painfully slow, so I gave up and switched to the Tsinghua mirror:
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --set show_channel_urls yes

conda create -n torch1.2 python=3.6
activate torch1.2

conda install pytorch=1.2.0 torchvision=0.4.0 cudatoolkit=10.0

Success!
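A quick sanity check, not part of the original notes, is to confirm that PyTorch actually sees the GPU:

import torch
print(torch.__version__)          # expect 1.2.0
print(torch.version.cuda)         # CUDA toolkit version this build of PyTorch targets
print(torch.cuda.is_available())  # True on a working GPU setup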
——————————————————————————————————————————————————

Next, install the GPU version of TensorFlow 2.1

Create a virtual environment:

conda create -n tf2.1 python=3.7
conda install tensorflow-gpu=2.1.0

TensorFlow 2.1 installed fine, but running import tensorflow as tf in the Python shell raised an h5py mismatch between header version 1.10.4 and library version 1.10.5. The fix is to reinstall h5py:

pip uninstall h5py
pip install h5py

Start Python and check:

import tensorflow as tf
tf.__version__
'2.1.0'

tf.test.is_gpu_available()

Final result: tf.test.is_gpu_available() confirms the GPU is detected. (Screenshot from the original post omitted.)

YOLOv3: Basic Principles

This post, the first of the series, covers the basic principles of YOLOv3.

The convolutional output

Repeated convolutions through the basenet (Darknet-53) produce a feature map, and predictions are made on that feature map. For example, an original input of 416*416*3 is convolved down to a 13*13*depth feature map.
Each cell of this feature map has a corresponding receptive field (simply put: the set of pixels in the original image that influence the current cell's value). So we assume each cell can predict one bounding box, for an object whose center falls inside that cell.
You expect each cell of the feature map to predict an object through one of its bounding boxes if the center of the object falls in the receptive field of that cell. (The receptive field is the region of the input image visible to the cell; refer to the link on convolutional neural networks for further clarification.)

For example, the red cell in the original post's figure (not reproduced here) is responsible for predicting the dog.
The feature map has size N*N*Depth, where Depth = B x (5 + C).

B is the number of bounding boxes each cell predicts. 5 = 4 + 1: four values describing the bounding box, plus one object score, the probability that the box contains an object. C is the number of classes to predict.
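As a worked check (standard COCO settings; the filters value is from the stock yolov3.cfg, which the excerpts below don't show):

B, C = 3, 80          # 3 boxes per cell, 80 COCO classes
depth = B * (5 + C)   # 4 box coords + 1 object score + C class scores per box
print(depth)          # 255 -- the filters= value before each [yolo] layer in yolov3.cfg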
How the predicted box coordinates are computed
Anchor Boxes
Anchor boxes are a set of values obtained beforehand by clustering; think of them as the widths and heights closest to real-world objects.
In YOLOv3 each cell of the feature map predicts 3 bounding boxes, but only the one with the highest IoU against the ground-truth box is used for the prediction.
Prediction

The decoding formulas (shown as an image in the original post) are:

bx = σ(tx) + cx
by = σ(ty) + cy
bw = pw * e^tw
bh = ph * e^th

bx, by, bw, bh are the x,y center coordinates, width and height of our prediction. tx, ty, tw, th is what the network outputs. cx and cy are the top-left coordinates of the grid cell. pw and ph are the anchor dimensions for the box.
In other words:
bx, by, bw, bh are the predicted values: the center coordinates, width, and height of the predicted bounding box;
tx, ty, tw, th are the raw values the convolutions produce along the depth dimension of the feature map;
cx, cy are the coordinates of the current cell's top-left corner;
pw, ph are the anchor values obtained beforehand by clustering.
σ(tx) is the sigmoid function, which keeps the value in (0,1) so the predicted center stays inside the current cell. For example, if the cell's top-left corner is (6,6) and the center offset comes out as (0.4,0.7), the predicted box center is (6.4,6.7). If the offset came out as (1.2,0.7), the center would land at (7.2,6.7), outside the current cell, which contradicts our assumption that the cell responsible for an object contains the object's center.

Objectness score
Also squashed into (0,1) by a sigmoid; it represents the probability that the box contains an object.
Class Confidences
The probability that the object belongs to each class. YOLOv3 no longer uses a softmax here, because softmax is exclusive by default: if an object is class1 it cannot also be class2. In reality an object can be both "woman" and "person".
Multi-scale detection
YOLOv3 borrows the idea of a feature pyramid and introduces multi-scale detection, which improves performance on small objects.
Taking a 416*416 input as an example: a series of convolutions yields a 13*13 feature map, which is semantically rich but low-resolution. Upsampling produces 26*26 and 52*52 feature maps that lose little semantic information while regaining resolution, so small objects are detected better.

For a 416 x 416 input, ((52 x 52) + (26 x 26) + (13 x 13)) x 3 = 10647 bounding boxes are predicted. They are sorted by object score, boxes with too low a score are filtered out, and NMS then narrows the rest down to the final boxes.
For an explanation of NMS see https://blog.csdn.net/zchang81/article/details/70211851.
In short: each round, keep the highest-scoring box, remove all boxes similar to it, and repeat until only the final boxes remain.
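As an illustrative sketch of that loop (this is not the write_results implementation developed later in the series):

def nms(boxes, scores, iou_thresh=0.4):
    # boxes: list of (x1, y1, x2, y2); scores: parallel list of objectness scores
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)                  # highest-scoring box survives
        keep.append(i)
        # drop every remaining box that overlaps it too much
        order = [j for j in order if iou(boxes[j], boxes[i]) < iou_thresh]
    return keep

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0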

The configuration file
The configuration file yolov3.cfg defines the network structure, for example:

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear


The configuration file describes the model's structure.
YOLOv3 layer types
YOLOv3 uses the following block types:
Convolutional
Shortcut
Upsample
Route
YOLO
Convolutional
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
Shortcut
[shortcut]
from=-3
activation=linear
Similar to ResNet, shortcuts are used to deepen the network. The configuration above means the shortcut layer's output is the sum of the previous layer's output and the output of the layer three back.
For an explanation of ResNet skip connections see https://zhuanlan.zhihu.com/p/28124810
Upsample
[upsample]
stride=2
Bilinear interpolation turns an N*N feature map into a (stride*N) * (stride*N) feature map, imitating a feature pyramid to generate multi-scale feature maps and improve small-object detection.
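A quick illustrative check (shapes picked arbitrarily) that the upsample block doubles the spatial size:

import torch
import torch.nn as nn

up = nn.Upsample(scale_factor=2, mode="bilinear")
x = torch.randn(1, 256, 13, 13)
print(up(x).shape)   # torch.Size([1, 256, 26, 26])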
Route
[route]
layers = -4

[route]
layers = -1, 61
Taking the configurations above as examples:
When layers has a single value, the route layer outputs the feature map of the layer 4 back from the route layer.
When layers has two values, the route layer's output is the feature map of the previous layer and that of layer 61, concatenated along the depth dimension (e.g. a 3*3*100 map and a 3*3*200 map become 3*3*300).
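The depth-wise concat is just torch.cat on the channel dimension (dim=1 in NCHW layout), e.g.:

import torch
m1 = torch.randn(1, 100, 3, 3)
m2 = torch.randn(1, 200, 3, 3)
print(torch.cat((m1, m2), 1).shape)   # torch.Size([1, 300, 3, 3])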
yolo
[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1
The yolo layer does the prediction. anchors lists 9 anchors, obtained beforehand by clustering, representing the most likely anchor shapes.
mask selects which anchor groups are used; mask=0,1,2 selects the first three pairs, i.e. 10,13 16,30 33,23. As explained in the principles section, each cell predicts 3 bounding boxes, at three scales, for 9 in total.
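A small sketch of how mask picks anchor pairs (the same logic appears in create_modules later):

anchors = [10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326]
pairs = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors), 2)]
mask = [0, 1, 2]
print([pairs[i] for i in mask])   # [(10, 13), (16, 30), (33, 23)]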
Net

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width=320
height=320
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation=1.5
exposure=1.5
hue=.1

This block defines the model's input size, batch size, and other parameters.
Now let's start writing code.
Parsing the configuration file
In this step we parse the config file and store each block's settings in a dict.
def parse_cfg(cfgfile):
    """
    Takes a configuration file

    Returns a list of blocks. Each block describes a block in the neural
    network to be built. A block is represented as a dictionary in the list.
    """
    file = open(cfgfile, 'r')
    # store the lines in a list
    lines = file.read().split('\n')
    # get rid of the empty lines
    lines = [x for x in lines if len(x) > 0]
    lines = [x for x in lines if x[0] != '#']              # get rid of comments
    # get rid of fringe whitespaces
    lines = [x.rstrip().lstrip() for x in lines]

    block = {}
    blocks = []

    for line in lines:
        if line[0] == "[":               # This marks the start of a new block
            # If block is not empty, it stores values of the previous block.
            if len(block) != 0:
                blocks.append(block)     # add it to the blocks list
                block = {}               # re-init the block
            block["type"] = line[1:-1].rstrip()
        else:
            key, value = line.split("=")
            block[key.rstrip()] = value.lstrip()
    blocks.append(block)

    return blocks
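To get a feel for the result (illustrative; exact values depend on your cfg, and the path is assumed):

blocks = parse_cfg("cfg/yolov3.cfg")
print(blocks[0]["type"])  # 'net'
print(blocks[1])          # e.g. {'type': 'convolutional', 'batch_normalize': '1', 'filters': '32', ...}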

Creating the layers with PyTorch
We create the layers one by one.

def create_modules(blocks):
    # Captures the information about the input and pre-processing
    net_info = blocks[0]
    module_list = nn.ModuleList()
    # A convolution needs to know its kernels' depth. The kernel size is in the
    # config file; the depth is the depth of the previous layer's output.
    prev_filters = 3
    # records the output depth (number of filters) of every layer
    output_filters = []

    # index is the layer's position in the network
    for index, x in enumerate(blocks[1:]):
        # build each layer here (filled in below)

        module_list.append(module)
        prev_filters = filters
        output_filters.append(filters)

    return (net_info, module_list)

Convolutional layer
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
Besides the convolution itself this block includes batch norm and a leaky activation. Batch normalization has basically become standard; it mitigates vanishing gradients (gradients shrinking as they are multiplied during backprop). "leaky" is the leaky ReLU activation.
So we use nn.Sequential():

module = nn.Sequential()
module.add_module("conv_{0}".format(index), conv)
module.add_module("batch_norm_{0}".format(index), bn)
module.add_module("leaky_{0}".format(index), activn)
Full code for creating the convolutional layer
This uses the Python built-in enumerate, which pairs each element of a list with an index:

seasons = ['Spring', 'Summer', 'Fall', 'Winter']
list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]

list(enumerate(seasons, start=1))  # index starts at 1
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]

Creating the convolutional layer:
# index is the layer's position in the network
for index, x in enumerate(blocks[1:]):
    module = nn.Sequential()

    #check the type of block
    #create a new module for the block
    #append to module_list

    if (x["type"] == "convolutional"):
        #Get the info about the layer
        activation = x["activation"]
        try:
            batch_normalize = int(x["batch_normalize"])
            bias = False
        except:
            batch_normalize = 0
            bias = True

        filters = int(x["filters"])
        padding = int(x["pad"])
        kernel_size = int(x["size"])
        stride = int(x["stride"])

        if padding:
            pad = (kernel_size - 1) // 2
        else:
            pad = 0

        #Add the convolutional layer
        #prev_filters is the depth of the previous layer's output feature map,
        #e.g. if the previous layer has 64 kernels its output is m*n*64
        conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias = bias)
        module.add_module("conv_{0}".format(index), conv)

        #Add the Batch Norm Layer
        if batch_normalize:
            bn = nn.BatchNorm2d(filters)
            module.add_module("batch_norm_{0}".format(index), bn)

        #Check the activation.
        #It is either Linear or a Leaky ReLU for YOLO
        if activation == "leaky":
            activn = nn.LeakyReLU(0.1, inplace = True)
            module.add_module("leaky_{0}".format(index), activn)

Upsample layer

#If it's an upsampling layer
#We use Bilinear2dUpsampling
elif (x["type"] == "upsample"):
    stride = int(x["stride"])
    upsample = nn.Upsample(scale_factor = 2, mode = "bilinear")
    module.add_module("upsample_{}".format(index), upsample)
Route layer
[route]
layers = -4

[route]
layers = -1, 61

First parse the config, then connect the referenced layers' feature maps as the output:

#If it is a route layer
elif (x["type"] == "route"):
    x["layers"] = x["layers"].split(',')
    #Start of a route
    start = int(x["layers"][0])
    #end, if there exists one.
    try:
        end = int(x["layers"][1])
    except:
        end = 0
    #Positive annotation
    if start > 0:
        start = start - index   # convert start to an offset relative to the current layer
    if end > 0:
        end = end - index       # convert end to an offset relative to the current layer
    route = EmptyLayer()
    module.add_module("route_{0}".format(index), route)
    # the route layer concats two layers before the current one,
    # so after the conversion above a positive offset would be meaningless
    if end < 0:
        filters = output_filters[index + start] + output_filters[index + end]
    else:
        filters = output_filters[index + start]
Here we define a custom EmptyLayer:

class EmptyLayer(nn.Module):
    def __init__(self):
        super(EmptyLayer, self).__init__()

EmptyLayer exists purely to keep the code simple. To define a custom layer in PyTorch you write a class inheriting from nn.Module and implement its forward method. On defining custom layers, see:
https://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_module.html
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Since all our route layer does is concat the feature maps of two layers, a single torch.cat call, there is no need to define a RouteLayer class; we simply do the concat inside the forward method of the nn.Module representing Darknet.
Shortcut layer

#shortcut corresponds to skip connection
elif x["type"] == "shortcut":
    shortcut = EmptyLayer()
    module.add_module("shortcut_{}".format(index), shortcut)

As with the route layer, an EmptyLayer stands in here; the shortcut operation itself is just the addition of two feature maps.
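At forward time that addition is element-wise on two same-shaped feature maps, e.g. (arbitrary shapes):

import torch
a = torch.randn(1, 64, 52, 52)   # previous layer's output
b = torch.randn(1, 64, 52, 52)   # output of the layer named by from=
print((a + b).shape)             # torch.Size([1, 64, 52, 52]) -- shape unchanged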
YOLO layer
The yolo layer makes the predictions from the feature map.
First parse out the active anchors, store them in a layer we define ourselves, and build a module from it.
This uses the Python built-in super; see http://www.runoob.com/python/python-func-super.html for details. In short, it is the safe way to initialize the parent class; knowing how to use it is enough.

#Yolo is the detection layer
elif x["type"] == "yolo":
    mask = x["mask"].split(",")
    mask = [int(x) for x in mask]

    anchors = x["anchors"].split(",")
    anchors = [int(a) for a in anchors]
    anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors),2)]
    anchors = [anchors[i] for i in mask]

    detection = DetectionLayer(anchors)
    module.add_module("Detection_{}".format(index), detection)

#We define our own yolo layer
class DetectionLayer(nn.Module):
    def __init__(self, anchors):
        super(DetectionLayer, self).__init__()
        self.anchors = anchors

Test code:

blocks = parse_cfg("cfg/yolov3.cfg")
print(create_modules(blocks))
The output, a printout of net_info and the ModuleList, was shown as a screenshot in the original post and is omitted here.
The complete code:

#coding=utf-8

from __future__ import division

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import numpy as np

def parse_cfg(cfgfile):
    """
    Takes a configuration file

    Returns a list of blocks. Each block describes a block in the neural
    network to be built. A block is represented as a dictionary in the list.
    """
    file = open(cfgfile, 'r')
    # store the lines in a list
    lines = file.read().split('\n')
    # get rid of the empty lines
    lines = [x for x in lines if len(x) > 0]
    lines = [x for x in lines if x[0] != '#']              # get rid of comments
    # get rid of fringe whitespaces
    lines = [x.rstrip().lstrip() for x in lines]

    block = {}
    blocks = []

    for line in lines:
        if line[0] == "[":               # This marks the start of a new block
            # If block is not empty, it stores values of the previous block.
            if len(block) != 0:
                blocks.append(block)     # add it to the blocks list
                block = {}               # re-init the block
            block["type"] = line[1:-1].rstrip()
        else:
            key, value = line.split("=")
            block[key.rstrip()] = value.lstrip()
    blocks.append(block)

    return blocks

class EmptyLayer(nn.Module):
    def __init__(self):
        super(EmptyLayer, self).__init__()

class DetectionLayer(nn.Module):
    def __init__(self, anchors):
        super(DetectionLayer, self).__init__()
        self.anchors = anchors

def create_modules(blocks):
    # Captures the information about the input and pre-processing
    net_info = blocks[0]
    module_list = nn.ModuleList()
    prev_filters = 3
    output_filters = []

    # index is the layer's position in the network
    for index, x in enumerate(blocks[1:]):
        module = nn.Sequential()

        #check the type of block
        #create a new module for the block
        #append to module_list

        if (x["type"] == "convolutional"):
            #Get the info about the layer
            activation = x["activation"]
            try:
                batch_normalize = int(x["batch_normalize"])
                bias = False
            except:
                batch_normalize = 0
                bias = True

            filters = int(x["filters"])
            padding = int(x["pad"])
            kernel_size = int(x["size"])
            stride = int(x["stride"])

            if padding:
                pad = (kernel_size - 1) // 2
            else:
                pad = 0

            #Add the convolutional layer
            #prev_filters is the depth of the previous layer's output feature map,
            #e.g. if the previous layer has 64 kernels its output is m*n*64
            conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias = bias)
            module.add_module("conv_{0}".format(index), conv)

            #Add the Batch Norm Layer
            if batch_normalize:
                bn = nn.BatchNorm2d(filters)
                module.add_module("batch_norm_{0}".format(index), bn)

            #Check the activation.
            #It is either Linear or a Leaky ReLU for YOLO
            if activation == "leaky":
                activn = nn.LeakyReLU(0.1, inplace = True)
                module.add_module("leaky_{0}".format(index), activn)

        #If it's an upsampling layer
        #We use Bilinear2dUpsampling
        elif (x["type"] == "upsample"):
            stride = int(x["stride"])
            upsample = nn.Upsample(scale_factor = 2, mode = "bilinear")
            module.add_module("upsample_{}".format(index), upsample)

        #If it is a route layer
        elif (x["type"] == "route"):
            x["layers"] = x["layers"].split(',')
            #Start of a route
            start = int(x["layers"][0])
            #end, if there exists one.
            try:
                end = int(x["layers"][1])
            except:
                end = 0
            #Positive annotation
            if start > 0:
                start = start - index
            if end > 0:
                end = end - index
            route = EmptyLayer()
            module.add_module("route_{0}".format(index), route)
            if end < 0:
                filters = output_filters[index + start] + output_filters[index + end]
            else:
                filters = output_filters[index + start]

        #shortcut corresponds to skip connection
        elif x["type"] == "shortcut":
            shortcut = EmptyLayer()
            module.add_module("shortcut_{}".format(index), shortcut)

        #Yolo is the detection layer
        elif x["type"] == "yolo":
            mask = x["mask"].split(",")
            mask = [int(x) for x in mask]

            anchors = x["anchors"].split(",")
            anchors = [int(a) for a in anchors]
            anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors),2)]
            anchors = [anchors[i] for i in mask]

            detection = DetectionLayer(anchors)
            module.add_module("Detection_{}".format(index), detection)

        module_list.append(module)
        prev_filters = filters
        output_filters.append(filters)

    return (net_info, module_list)

blocks = parse_cfg("/home/suchang/work_codes/keepgoing/yolov3-torch/cfg/yolov3.cfg")
print(create_modules(blocks))

The previous post (https://www.cnblogs.com/sdu20112013/p/11099244.html) implemented the network's layers.
This post implements the network's forward pass.

Defining the network

class Darknet(nn.Module):
    def __init__(self, cfgfile):
        super(Darknet, self).__init__()
        self.blocks = parse_cfg(cfgfile)
        self.net_info, self.module_list = create_modules(self.blocks)

Implementing the network's forward pass
forward overrides the method declared by nn.Module.

Convolutional and Upsample Layers

if module_type == "convolutional" or module_type == "upsample":
    x = self.module_list[i](x)
Route Layer / Shortcut Layer
As covered in the previous post, the route layer's output is either one earlier layer's feature map, or two earlier layers' feature maps concatenated along the depth dimension. That is, either

output[current_layer] = output[previous_layer]

or

map1 = outputs[i + layers[0]]
map2 = outputs[i + layers[1]]
output[current_layer] = torch.cat((map1, map2), 1)

So the route layer code is:

    elif module_type == "route":
        layers = module["layers"]
        layers = [int(a) for a in layers]

        if (layers[0]) > 0:
            layers[0] = layers[0] - i

        if len(layers) == 1:
            x = outputs[i + (layers[0])]

        else:
            if (layers[1]) > 0:
                layers[1] = layers[1] - i

            map1 = outputs[i + layers[0]]
            map2 = outputs[i + layers[1]]

            x = torch.cat((map1, map2), 1)

The shortcut layer's output is the sum of the previous layer's output and the output of the layer given by from in the config:

    elif  module_type == "shortcut":
        from_ = int(module["from"])
        x = outputs[i-1] + outputs[i+from_]

YOLO layer
The yolo layer's output is an N*N*depth feature map. To access, say, the 2nd bounding box of cell (5,6) you would have to index map[5,6, (5+C):2*(5+C)], which is clumsy, so we introduce a predict_transform function to reshape the output.

In short, we turn the batch_size x (B*(5+C)) x grid_size x grid_size 4-D tensor into a batch_size x (grid_size*grid_size*B) x (5+C) tensor, so that each row of the per-image 2-D matrix holds one box's attributes (4 coordinates, objectness, C class scores):
batch_size = prediction.size(0)
stride =  inp_dim // prediction.size(2)
grid_size = inp_dim // stride
bbox_attrs = 5 + num_classes
num_anchors = len(anchors)

prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
prediction = prediction.transpose(1,2).contiguous()
prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)

The code above uses PyTorch's view, which works like reshape in numpy. contiguous is generally paired with transpose, permute, or view: after a dimension permutation the tensor is no longer stored contiguously in memory, and view requires contiguous storage, hence the contiguous() call. We end up with a batch_size x (grid_size*grid_size*num_anchors) x bbox_attrs tensor.
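A tiny standalone illustration of the transpose/contiguous/view interplay:

import torch

t = torch.arange(24).view(2, 3, 4)   # contiguous
u = t.transpose(1, 2)                # a 2 x 4 x 3 view of the same storage, no longer contiguous
# u.view(2, 12) would raise a RuntimeError here
v = u.contiguous().view(2, 12)       # copy into contiguous memory, then reshape
print(v.shape)                       # torch.Size([2, 12])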

Next we decode the predicted bounding box coordinates.

Note that at this point prediction[:,:,0], prediction[:,:,1], prediction[:,:,2], prediction[:,:,3], prediction[:,:,4] are tx, ty, tw, th and the objectness score respectively.
First the center offsets relative to each cell's top-left corner:

#sigmoid squashes the values into the (0,1) range
#Sigmoid the  centre_X, centre_Y. and object confidencce
prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])
prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])
prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])

#Add the center offsets
grid = np.arange(grid_size)
a,b = np.meshgrid(grid, grid)

x_offset = torch.FloatTensor(a).view(-1,1)
y_offset = torch.FloatTensor(b).view(-1,1)

if CUDA:
    x_offset = x_offset.cuda()
    y_offset = y_offset.cuda()

x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1,num_anchors).view(-1,2).unsqueeze(0)

#prediction[:,:,0], prediction[:,:,1] become offsets relative to the current cell
prediction[:,:,:2] += x_y_offset

The behavior of meshgrid:

import numpy as np
import torch
grid_size = 13
grid = np.arange(grid_size)
a,b = np.meshgrid(grid, grid)
print(a)
print(b)

x_offset = torch.FloatTensor(a).view(-1,1)
#print(x_offset)
y_offset = torch.FloatTensor(b).view(-1,1)
The output of this code (shown as a screenshot in the original post) is a 13x13 matrix a whose every row is 0..12, and a matrix b whose every column is 0..12.
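For intuition, here is the same thing on a 3x3 grid:

import numpy as np
a, b = np.meshgrid(np.arange(3), np.arange(3))
# a: every row is 0..2        b: every column is 0..2
# [[0 1 2]                    [[0 0 0]
#  [0 1 2]                     [1 1 1]
#  [0 1 2]]                    [2 2 2]]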

Next, predict the bounding box width and height. Note that the anchor sizes must be rescaled to the feature map: the values in the config file are relative to the model input size.

anchors = [(a[0]/stride, a[1]/stride) for a in anchors]  # rescale anchors to the feature map

#log space transform height and the width
anchors = torch.FloatTensor(anchors)

if CUDA:
    anchors = anchors.cuda()

anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors

##map back to the corresponding coordinates on the original image
prediction[:,:,:4] *= stride

Predict the class probabilities:

prediction[:,:,5: 5 + num_classes] = torch.sigmoid((prediction[:,:, 5 : 5 + num_classes]))
The complete predict_transform code:

# The feature map produced by the yolo conv stack has size
# batch_size * (B*(5+C)) * grid_size * grid_size
def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True):
    if CUDA:
        prediction = prediction.to(torch.device("cuda"))  # move to GPU; not needed on torch 0.4, needed on torch 1.0

    batch_size = prediction.size(0)
    stride = inp_dim // prediction.size(2)
    grid_size = inp_dim // stride
    bbox_attrs = 5 + num_classes
    num_anchors = len(anchors)

    print("prediction.shape=", prediction.shape)
    print("batch_size=", batch_size)
    print("inp_dim=", inp_dim)
    #print("anchors=", anchors)
    #print("num_classes=", num_classes)

    print("grid_size=", grid_size)
    print("bbox_attrs=", bbox_attrs)

    prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
    prediction = prediction.transpose(1,2).contiguous()
    prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)

    #Sigmoid the centre_X, centre_Y and object confidence
    prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])
    prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])
    prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])

    #Add the center offsets
    grid = np.arange(grid_size).astype(np.float32)
    a, b = np.meshgrid(grid, grid)

    x_offset = torch.FloatTensor(a).view(-1,1)
    y_offset = torch.FloatTensor(b).view(-1,1)

    if CUDA:
        x_offset = x_offset.cuda()
        y_offset = y_offset.cuda()

    x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1,num_anchors).view(-1,2).unsqueeze(0)

    print(type(x_y_offset), type(prediction[:,:,:2]))
    prediction[:,:,:2] += x_y_offset

    anchors = [(a[0]/stride, a[1]/stride) for a in anchors]  # rescale anchors to the feature map
    #log space transform of height and width
    anchors = torch.FloatTensor(anchors)

    if CUDA:
        anchors = anchors.cuda()

    anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
    prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors

    prediction[:,:,5: 5 + num_classes] = torch.sigmoid((prediction[:,:, 5 : 5 + num_classes]))

    prediction[:,:,:4] *= stride  # map back to coordinates on the original input image

    return prediction

With the helper function written, let's continue with the Darknet class's forward method:

        elif module_type == "yolo":
            anchors = self.module_list[i][0].anchors
            inp_dim = int(self.net_info["height"])
            num_classes = int (module["classes"])
            x = x.data
            x = predict_transform(x, inp_dim, anchors, num_classes, CUDA)
            if not write:              #if no collector has been initialised
                detections = x
                write = 1
            else:       
                detections = torch.cat((detections, x), 1)

Before predict_transform, feature maps of different scales (say 13*13*N1, 26*26*N2, 52*52*N3) could not be concatenated into a single tensor; now that each is reshaped to rows x (5+C), they can be.
The write flag marks whether detections is still empty: if it is, this is the first yolo layer's prediction and its output is assigned to detections; otherwise the current yolo layer's output is concatenated onto detections.

Testing
Download a test image: wget https://github.com/ayooshkathuria/pytorch-yolo-v3/raw/master/dog-cycle-car.png

def get_test_input():
    img = cv2.imread("dog-cycle-car.png")
    img = cv2.resize(img, (608,608))           #Resize to the input dimension
    img_ = img[:,:,::-1].transpose((2,0,1))    # BGR -> RGB | H x W x C -> C x H x W
    img_ = img_[np.newaxis,:,:,:]/255.0        #Add a batch dimension at 0 | Normalise
    img_ = torch.from_numpy(img_).float()      #Convert to float
    img_ = Variable(img_)                      # Convert to Variable
    return img_

model = Darknet("cfg/yolov3.cfg")
inp = get_test_input()
pred = model(inp, torch.cuda.is_available())
print(pred)
cv2.imread() loads images in BGR channel order and in h x w x c layout (e.g. 416*416*3); we must convert to 3*416*416. If you hit

RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor

add prediction = prediction.to(torch.device("cuda")) at the start of predict_transform to move the tensor to the GPU. If you hit

RuntimeError: shape '[1, 255, 3025]' is invalid for input of size 689520

check that your input image size matches the model's input size (here the model expects 608*608).
The final test result (screenshot omitted):

22743 bounding boxes are predicted, across 3 feature map scales, 19*19, 38*38 and 76*76, with 3 boxes per cell at each scale: 3*(19*19 + 38*38 + 76*76) = 22743 boxes in total.

The previous post implemented the forward function and obtained the prediction. It contains a very large number of boxes and class probabilities; now we filter out the final predicted boxes.
Once you understand the format of YOLOv3's output and what each position means, the source code is not hard to follow. My main difficulty reading it was unfamiliarity with PyTorch, so this post calls out the relevant PyTorch functions with links and small test snippets.

Objectness score threshold
We set an objectness score threshold; only boxes above it are considered valid:

conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
prediction = prediction*conf_mask

prediction is 1 x boxnum x boxattr.
prediction[:,:,4] is 1 x boxnum, whose elements are the value at index 4 of each box's attributes.

Tensor indexing in torch works like numpy; see the output of the following code:

import torch
x = torch.Tensor(1,3,10) # Create an uninitialized Tensor of size 1x3x10
print(x)
print(x.shape)

y = x[:,:,4]
print(y)
print(y.shape)

z = x[:,:,4:6]
print(z)
print(z.shape)

print((y>0.5).float().unsqueeze(2))

Output:

tensor([[[2.5226e-18, 1.6898e-04, 1.0413e-11, 7.7198e-10, 1.0549e-08,
          4.0516e-11, 1.0681e-05, 2.9575e-18, 6.7333e+22, 1.7591e+22],
         [1.7184e+25, 4.3222e+27, 6.1972e-04, 7.2443e+22, 1.7728e+28,
          7.0367e+22, 5.9018e-10, 2.6540e-09, 1.2972e-11, 5.3370e-08],
         [2.7001e-06, 2.6801e-09, 4.1292e-05, 2.1511e+23, 3.2770e-09,
          2.5125e-18, 7.7052e+31, 1.9447e+31, 5.0207e+28, 1.1492e-38]]])
torch.Size([1, 3, 10])
tensor([[1.0549e-08, 1.7728e+28, 3.2770e-09]])
torch.Size([1, 3])
tensor([[[1.0549e-08, 4.0516e-11],
         [1.7728e+28, 7.0367e+22],
         [3.2770e-09, 2.5125e-18]]])
torch.Size([1, 3, 2])

tensor([[[0.],
         [0.],
         [0.]]])
squeeze and unsqueeze remove and add dimensions:

t = torch.ones(2,1,2,1)      # Size 2x1x2x1
r = torch.squeeze(t)         # Size 2x2
r = torch.squeeze(t, 1)      # Squeeze dimension 1: Size 2x2x1

# Un-squeeze a dimension
x = torch.Tensor([1, 2, 3])
r = torch.unsqueeze(x, 0)    # Size: 1x3 -- insert a new dimension at position 0
r = torch.unsqueeze(x, 1)    # Size: 3x1 -- insert a new dimension at position 1

This zeroes out every row of prediction whose objectness score is below the threshold.

NMS
tensor.new() creates a new tensor with the same dtype as the source tensor; see https://stackoverflow.com/questions/49263588/pytorch-beginner-tensor-new-method
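A tiny check of what new() gives (values are uninitialized; the point is the matching dtype and shape):

import torch
p = torch.randn(2, 3).double()
q = p.new(p.shape)        # same dtype as p, contents arbitrary
print(q.dtype, q.shape)   # torch.float64 torch.Size([2, 3])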

#box coordinates as (top-left corner x, top-left corner y, bottom-right corner x, bottom-right corner y)
box_corner = prediction.new(prediction.shape)
box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2) 
box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
prediction[:,:,:4] = box_corner[:,:,:4]

The original prediction stores each box as x, y, w, h, ..., which is inconvenient to work with, so we convert it to (top-left corner x, top-left corner y, bottom-right corner x, bottom-right corner y).
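A hand-worked conversion, matching what box_corner computes above:

# center form (x, y, w, h) -> corner form
x, y, w, h = 100., 100., 40., 20.
print(x - w/2, y - h/2, x + w/2, y + h/2)   # 80.0 90.0 120.0 110.0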

Next we process each image's feature map in turn:

batch_size = prediction.size(0)
write = False

for ind in range(batch_size):
    #image_pred.shape = boxnum*boxattr
    image_pred = prediction[ind]          #image Tensor  box_num*box_attr
    #confidence threshholding 
    #NMS
    #returns each row's max value and the column index where it occurs
    max_conf, max_conf_score = torch.max(image_pred[:,5:5+ num_classes], 1)
    #unsqueeze back to the same number of dimensions as image_pred
    max_conf = max_conf.float().unsqueeze(1)
    max_conf_score = max_conf_score.float().unsqueeze(1)
    seq = (image_pred[:,:5], max_conf, max_conf_score)
    
    #concatenate along the column direction; image_pred is now boxnum*7
    image_pred = torch.cat(seq, 1)

This uses torch.max; see https://blog.csdn.net/Z_lbj/article/details/79766690
torch.max(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor)
Returns the maximum along dimension dim. A way to remember it: the comparison runs along dim, so torch.max(a, 0) compares along the row direction and yields each column's maximum. For a 2-D matrix (rows x columns), rows are dim 0 and columns are dim 1.

torch.max(a, 0) returns the largest element of each column, along with its row index.
torch.max(a, 1) returns the largest element of each row, along with its column index.
c = torch.Tensor([[1,2,3],[6,5,4]])
print(c)
a, b = torch.max(c, 1)
print(a)
print(b)

##output:
tensor([[1., 2., 3.],
[6., 5., 4.]])
tensor([3., 6.])
tensor([2, 0])
torch.cat usage (see https://pytorch.org/docs/stable/torch.html):

torch.cat(tensors, dim=0, out=None) → Tensor

x = torch.randn(2, 3)
x
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])

torch.cat((x, x, x), 0)
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])

torch.cat((x, x, x), 1)
tensor([[ 0.6580, -1.0969, -0.4614,  0.6580, -1.0969, -0.4614,  0.6580,
         -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497, -0.1034, -0.5790,  0.1497, -0.1034,
         -0.5790,  0.1497]])
Next we keep only the rows whose obj_score is non-zero (the rows zeroed by the thresholding above are dropped):

    non_zero_ind =  (torch.nonzero(image_pred[:,4]))
    try:
        image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
    except:
        continue

    #For PyTorch 0.4 compatibility
    #Since the above code will not raise an exception for no detection
    #as scalars are supported in PyTorch 0.4
    if image_pred_.shape[0] == 0:
        continue 

OK, next we perform NMS for each class.
First, get the set of classes present in the image:

    #Get the various classes detected in the image
    img_classes = unique(image_pred_[:,-1])  # -1 index holds the class index

Then handle each class in turn:

for cls in img_classes:
    #perform NMS

    #get the detections with one particular class:
    #keep rows whose class is cls and whose class prob != 0
    cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
    class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
    image_pred_class = image_pred_[class_mask_ind].view(-1,7)

    #sort the detections such that the entry with the maximum objectness
    #confidence is at the top
    conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
    image_pred_class = image_pred_class[conf_sort_index]
    idx = image_pred_class.size(0)   #Number of detections

    for i in range(idx):
        #Get the IOUs of all boxes that come after the one we are looking at
        #in the loop
        try:
            #IoU of box i against every box after it
            ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
        except ValueError:
            break

        except IndexError:
            break

        #Zero out all the detections that have IoU > threshold:
        #boxes whose IoU with box i exceeds nms_conf are treated as the same object
        iou_mask = (ious < nms_conf).float().unsqueeze(1)
        image_pred_class[i+1:] *= iou_mask

        #remove the zeroed-out (IoU > nms_conf) boxes
        non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
        image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

    batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)      #Repeat the batch_id for as many detections of the class cls in the image
    seq = batch_ind, image_pred_class

The IoU computation is as follows (IoU = intersection area / union area):

def bbox_iou(box1, box2):
    """
    Returns the IoU of two bounding boxes
    """
    #Get the coordinates of bounding boxes
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]

    #get the coordinates of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)

    #Intersection area
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)

    #Union Area
    b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)

    iou = inter_area / (b1_area + b2_area - inter_area)

    return iou
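A quick hand-checked example using the bbox_iou above (note its +1 pixel convention):

import torch
b1 = torch.Tensor([[0., 0., 10., 10.]])
b2 = torch.Tensor([[5., 5., 15., 15.]])
print(bbox_iou(b1, b2))   # intersection 6*6=36, areas 11*11 each: 36/(121+121-36) ≈ 0.175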

On NMS, also see https://blog.csdn.net/shuzfan/article/details/52711706

Tensor index operations behave as follows:

image_pred_ = torch.Tensor([[1,2,3,4,9],[5,6,7,8,9]])
#print(image_pred_[:,-1] == 9)
has_9 = (image_pred_[:,-1] == 9)
print(has_9)

###evaluation order: (image_pred_[:,-1] == 9).float().unsqueeze(1) first, then the tensor multiplication
cls_mask = image_pred_*(image_pred_[:,-1] == 9).float().unsqueeze(1)
print(cls_mask)
class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
image_pred_class = image_pred_[class_mask_ind]

Output:
tensor([1, 1], dtype=torch.uint8)
tensor([[1., 2., 3., 4., 9.],
[5., 6., 7., 8., 9.]])
torch.sort usage:

d=torch.Tensor([[1,2,3],[6,5,4]])
e=d[:,2]
print(e)
print(torch.sort(e))

Output:
tensor([3., 4.])

torch.return_types.sort(
values=tensor([3., 4.]),
indices=tensor([0, 1]))
To summarize the NMS flow: for each image, the network predicts N detections, each with 4+1+C values (4 coordinates, 1 obj score, and C class probabilities).

First, drop the rows with obj_score < confidence.
For each row, keep only the highest class probability as the predicted class.
Sort all predictions by obj_score, descending.
For each class in turn, run NMS:
compare the first box with every box after it and delete those with IoU > threshold, i.e. all boxes similar to it;
compare the next remaining box with every box after it and delete all boxes similar to that one;
repeat until no similar boxes remain.
At that point, every remaining box of the class being processed is unique.
write_results finally returns an n*8 tensor, whose 8 columns are (batch index, 4 coordinates, obj score, class probability, class index).

def write_results(prediction, confidence, num_classes, nms_conf = 0.4):
    print("prediction.shape=", prediction.shape)

    #zero out the rows with obj_score < confidence
    conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
    prediction = prediction*conf_mask

    #box coordinates as (top-left x, top-left y, bottom-right x, bottom-right y)
    box_corner = prediction.new(prediction.shape)
    box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
    box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
    box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2)
    box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
    #overwrite the first four columns of prediction's last dimension
    prediction[:,:,:4] = box_corner[:,:,:4]

    batch_size = prediction.size(0)
    write = False

    for ind in range(batch_size):
        #image_pred.shape = boxnum*boxattr
        image_pred = prediction[ind]          #image Tensor
        #confidence threshholding
        #NMS

        ##take each row's largest class score and its index
        max_conf_score, max_conf = torch.max(image_pred[:,5:5+ num_classes], 1)
        max_conf = max_conf.float().unsqueeze(1)
        max_conf_score = max_conf_score.float().unsqueeze(1)
        seq = (image_pred[:,:5], max_conf_score, max_conf)
        #now 7 columns: top-left x, top-left y, bottom-right x, bottom-right y,
        #obj score, max class probability, corresponding class index
        image_pred = torch.cat(seq, 1)
        print(image_pred.shape)

        non_zero_ind = (torch.nonzero(image_pred[:,4]))
        try:
            image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
        except:
            continue

        #For PyTorch 0.4 compatibility
        #Since the above code will not raise an exception for no detection
        #as scalars are supported in PyTorch 0.4
        if image_pred_.shape[0] == 0:
            continue

        #Get the various classes detected in the image
        img_classes = unique(image_pred_[:,-1])  # -1 index holds the class index

        for cls in img_classes:
            #perform NMS

            #get the detections with one particular class:
            #keep rows whose class is cls and whose class prob != 0
            cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
            class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
            image_pred_class = image_pred_[class_mask_ind].view(-1,7)

            #sort the detections such that the entry with the maximum objectness
            #confidence is at the top
            conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
            image_pred_class = image_pred_class[conf_sort_index]
            idx = image_pred_class.size(0)   #Number of detections

            for i in range(idx):
                #Get the IOUs of all boxes that come after the one we are looking at
                #in the loop
                try:
                    #IoU of box i against every box after it
                    ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
                except ValueError:
                    break

                except IndexError:
                    break

                #Zero out all the detections that have IoU > threshold:
                #boxes whose IoU with box i exceeds nms_conf are treated as the same object
                iou_mask = (ious < nms_conf).float().unsqueeze(1)
                image_pred_class[i+1:] *= iou_mask

                #remove the zeroed-out (IoU > nms_conf) boxes
                non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
                image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

            batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)      #Repeat the batch_id for as many detections of the class cls in the image
            seq = batch_ind, image_pred_class

            if not write:
                output = torch.cat(seq,1)  #concat along columns: shape 1*8
                write = True
            else:
                out = torch.cat(seq,1)
                output = torch.cat((output,out)) #concat along rows: shape n*8

    try:
        return output
    except:
        return 0

The previous four posts implemented the network's forward pass and converted the network output into an easy-to-handle detection format.
This post organizes those pieces into an end-to-end inference pipeline.

The overall flow:

Read an image and preprocess it to the model's input size and channel order (RGB, c x h x w).
Load the model and read its weights file.
Feed the tensor from step 1 to the model and run forward to get the prediction.
Postprocess: the predicted box coordinates are relative to the resized input and must be mapped back to the original image.
detector.py implements the complete end-to-end detection. Usage: python detect.py --images dog-cycle-car.png --det det

from __future__ import division
import time
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
import cv2
from util import *
import argparse
import os
import os.path as osp
from darknet import Darknet
import pickle as pkl
import pandas as pd
import random

def arg_parse():
    """
    Parse arguments to the detect module
    """
    parser = argparse.ArgumentParser(description='YOLO v3 Detection Module')

    parser.add_argument("--images", dest = 'images', help =
                        "Image / Directory containing images to perform detection upon",
                        default = "imgs", type = str)
    parser.add_argument("--det", dest = 'det', help =
                        "Image / Directory to store detections to",
                        default = "det", type = str)
    parser.add_argument("--bs", dest = "bs", help = "Batch size", default = 1)
    parser.add_argument("--confidence", dest = "confidence", help = "Object Confidence to filter predictions", default = 0.5)
    parser.add_argument("--nms_thresh", dest = "nms_thresh", help = "NMS Threshhold", default = 0.4)
    parser.add_argument("--cfg", dest = 'cfgfile', help =
                        "Config file",
                        default = "cfg/yolov3.cfg", type = str)
    parser.add_argument("--weights", dest = 'weightsfile', help =
                        "weightsfile",
                        default = "yolov3.weights", type = str)
    parser.add_argument("--reso", dest = 'reso', help =
                        "Input resolution of the network. Increase to increase accuracy. Decrease to increase speed",
                        default = "416", type = str)

    return parser.parse_args()

args = arg_parse()
images = args.images
batch_size = int(args.bs)
confidence = float(args.confidence)
nms_thesh = float(args.nms_thresh)
start = 0
CUDA = torch.cuda.is_available()

num_classes = 80
classes = load_classes("data/coco.names")

#Set up the neural network
print("Loading network.....")
model = Darknet(args.cfgfile)
model.load_weights(args.weightsfile)
print("Network successfully loaded")

model.net_info["height"] = args.reso
inp_dim = int(model.net_info["height"])
assert inp_dim % 32 == 0
assert inp_dim > 32

#If there's a GPU available, put the model on GPU
if CUDA:
    model.cuda()

#Set the model in evaluation mode
model.eval()

read_dir = time.time()
#Detection phase
try:
    imlist = [osp.join(osp.realpath('.'), images, img) for img in os.listdir(images)]
except NotADirectoryError:
    imlist = []
    imlist.append(osp.join(osp.realpath('.'), images))
except FileNotFoundError:
    print ("No file or directory with the name {}".format(images))
    exit()

if not os.path.exists(args.det):
    os.makedirs(args.det)

load_batch = time.time()
loaded_ims = [cv2.imread(x) for x in imlist]

im_batches = list(map(prep_image, loaded_ims, [inp_dim for x in range(len(imlist))]))
im_dim_list = [(x.shape[1], x.shape[0]) for x in loaded_ims]
im_dim_list = torch.FloatTensor(im_dim_list).repeat(1,2)

leftover = 0
if (len(im_dim_list) % batch_size):
    leftover = 1

if batch_size != 1:
    num_batches = len(imlist) // batch_size + leftover
    im_batches = [torch.cat((im_batches[i*batch_size : min((i + 1)*batch_size,
                             len(im_batches))])) for i in range(num_batches)]

write = 0

if CUDA:
    im_dim_list = im_dim_list.cuda()

start_det_loop = time.time()
for i, batch in enumerate(im_batches):
    #load the image
    start = time.time()
    if CUDA:
        batch = batch.cuda()
    with torch.no_grad():
        prediction = model(Variable(batch), CUDA)  # instance call: equivalent to invoking the class's __call__()

    prediction = write_results(prediction, confidence, num_classes, nms_conf = nms_thesh)

    end = time.time()

    if type(prediction) == int:

        for im_num, image in enumerate(imlist[i*batch_size: min((i +  1)*batch_size, len(imlist))]):
            im_id = i*batch_size + im_num
            print("{0:20s} predicted in {1:6.3f} seconds".format(image.split("/")[-1], (end - start)/batch_size))
            print("{0:20s} {1:s}".format("Objects Detected:", ""))
            print("----------------------------------------------------------")
        continue

    prediction[:,0] += i*batch_size    #transform the attribute from index in batch to index in imlist

    if not write:                      #If we haven't initialised output
        output = prediction
        write = 1
    else:
        output = torch.cat((output,prediction))

    for im_num, image in enumerate(imlist[i*batch_size: min((i +  1)*batch_size, len(imlist))]):
        im_id = i*batch_size + im_num
        objs = [classes[int(x[-1])] for x in output if int(x[0]) == im_id]
        print("{0:20s} predicted in {1:6.3f} seconds".format(image.split("/")[-1], (end - start)/batch_size))
        print("{0:20s} {1:s}".format("Objects Detected:", " ".join(objs)))
        print("----------------------------------------------------------")

    if CUDA:
        torch.cuda.synchronize()

try:
    output
except NameError:
    print ("No detections were made")
    exit()

im_dim_list = torch.index_select(im_dim_list, 0, output[:,0].long())

scaling_factor = torch.min(416/im_dim_list,1)[0].view(-1,1)

output[:,[1,3]] -= (inp_dim - scaling_factor*im_dim_list[:,0].view(-1,1))/2
output[:,[2,4]] -= (inp_dim - scaling_factor*im_dim_list[:,1].view(-1,1))/2

output[:,1:5] /= scaling_factor

for i in range(output.shape[0]):
    output[i, [1,3]] = torch.clamp(output[i, [1,3]], 0.0, im_dim_list[i,0])
    output[i, [2,4]] = torch.clamp(output[i, [2,4]], 0.0, im_dim_list[i,1])

output_recast = time.time()
class_load = time.time()
colors = pkl.load(open("pallete", "rb"))

draw = time.time()

def write(x, results):
    c1 = tuple(x[1:3].int())
    c2 = tuple(x[3:5].int())
    img = results[int(x[0])]
    cls = int(x[-1])
    color = random.choice(colors)
    label = "{0}".format(classes[cls])
    cv2.rectangle(img, c1, c2, color, 1)
    t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1, 1)[0]
    c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4
    cv2.rectangle(img, c1, c2, color, -1)
    cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225,255,255], 1)
    return img

list(map(lambda x: write(x, loaded_ims), output))

det_names = pd.Series(imlist).apply(lambda x: "{}/det_{}".format(args.det, x.split("/")[-1]))

list(map(cv2.imwrite, det_names, loaded_ims))

end = time.time()

print("SUMMARY")
print("----------------------------------------------------------")
print("{:25s}: {}".format("Task", "Time Taken (in seconds)"))
print()
print("{:25s}: {:2.3f}".format("Reading addresses", load_batch - read_dir))
print("{:25s}: {:2.3f}".format("Loading batch", start_det_loop - load_batch))
print("{:25s}: {:2.3f}".format("Detection (" + str(len(imlist)) + " images)", output_recast - start_det_loop))
print("{:25s}: {:2.3f}".format("Output Processing", class_load - output_recast))
print("{:25s}: {:2.3f}".format("Drawing Boxes", end - draw))
print("{:25s}: {:2.3f}".format("Average time_per_img", (end - load_batch)/len(imlist)))
print("----------------------------------------------------------")

torch.cuda.empty_cache()

The first segment needs little explanation: we want to pass parameters on the command line, so the argparse module handles argument parsing.
The second segment loads the model:

#Set up the neural network
print("Loading network.....")
model = Darknet(args.cfgfile)
model.load_weights(args.weightsfile)
print("Network successfully loaded")
Third segment: image preprocessing
Every image is first preprocessed to the model's input size:

read_dir = time.time()
#Detection phase
try:
    imlist = [osp.join(osp.realpath('.'), images, img) for img in os.listdir(images)]
except NotADirectoryError:
    imlist = []
    imlist.append(osp.join(osp.realpath('.'), images))
except FileNotFoundError:
    print ("No file or directory with the name {}".format(images))
    exit()

if not os.path.exists(args.det):
    os.makedirs(args.det)

load_batch = time.time()
loaded_ims = [cv2.imread(x) for x in imlist]

im_batches = list(map(prep_image, loaded_ims, [inp_dim for x in range(len(imlist))]))
im_dim_list = [(x.shape[1], x.shape[0]) for x in loaded_ims]
im_dim_list = torch.FloatTensor(im_dim_list).repeat(1,2)

leftover = 0
if (len(im_dim_list) % batch_size):
    leftover = 1

if batch_size != 1:
    num_batches = len(imlist) // batch_size + leftover
    im_batches = [torch.cat((im_batches[i*batch_size : min((i + 1)*batch_size,
                             len(im_batches))])) for i in range(num_batches)]

This reads any number of images from a directory. Suppose the model handles 5 images per batch and each preprocessed image is a 1 x 3 x 320 x 320 tensor; then each batch fed to the model is 5 x 3 x 320 x 320. That is what

im_batches = [torch.cat((im_batches[i*batch_size : min((i + 1)*batch_size,
                         len(im_batches))])) for i in range(num_batches)]

does.
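Concretely (illustrative shapes), torch.cat stacks the per-image tensors along dim 0:

import torch
ims = [torch.randn(1, 3, 320, 320) for _ in range(5)]
print(torch.cat(ims).shape)   # torch.Size([5, 3, 320, 320])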

Some helper functions used in the preprocessing:

def letterbox_image(img, inp_dim):
    '''resize image with unchanged aspect ratio using padding'''
    img_w, img_h = img.shape[1], img.shape[0]
    w, h = inp_dim
    new_w = int(img_w * min(w/img_w, h/img_h))
    new_h = int(img_h * min(w/img_w, h/img_h))
    resized_image = cv2.resize(img, (new_w,new_h), interpolation = cv2.INTER_CUBIC)

    canvas = np.full((inp_dim[1], inp_dim[0], 3), 128)

    canvas[(h-new_h)//2:(h-new_h)//2 + new_h, (w-new_w)//2:(w-new_w)//2 + new_w, :] = resized_image

    return canvas

This keeps the original image's aspect ratio and fills the remaining area with gray.

OpenCV reads images in BGR order; we convert to RGB, then transpose h x w x c to c x h x w:

def prep_image(img, inp_dim):
    """
    Prepare image for inputting to the neural network.

    Returns a Variable
    """
    img = cv2.resize(img, (inp_dim, inp_dim))
    img = img[:,:,::-1].transpose((2,0,1)).copy()
    img = torch.from_numpy(img).float().div(255.0).unsqueeze(0)
    return img

See https://www.cnblogs.com/sdu20112013/p/11216322.html

4. Feed the tensors to the model and run forward:

for i, batch in enumerate(im_batches):
    #load the image
    start = time.time()
    if CUDA:
        batch = batch.cuda()
    with torch.no_grad():
        prediction = model(Variable(batch), CUDA)  # instance call: equivalent to invoking the class's __call__()

    prediction = write_results(prediction, confidence, num_classes, nms_conf = nms_thesh)

    end = time.time()

    if type(prediction) == int:

        for im_num, image in enumerate(imlist[i*batch_size: min((i +  1)*batch_size, len(imlist))]):
            im_id = i*batch_size + im_num
            print("{0:20s} predicted in {1:6.3f} seconds".format(image.split("/")[-1], (end - start)/batch_size))
            print("{0:20s} {1:s}".format("Objects Detected:", ""))
            print("----------------------------------------------------------")
        continue

    prediction[:,0] += i*batch_size    #transform the attribute from index in batch to index in imlist

    if not write:                      #If we haven't initialised output
        output = prediction
        write = 1
    else:
        output = torch.cat((output,prediction))

    for im_num, image in enumerate(imlist[i*batch_size: min((i +  1)*batch_size, len(imlist))]):
        im_id = i*batch_size + im_num
        objs = [classes[int(x[-1])] for x in output if int(x[0]) == im_id]
        print("{0:20s} predicted in {1:6.3f} seconds".format(image.split("/")[-1], (end - start)/batch_size))
        print("{0:20s} {1:s}".format("Objects Detected:", " ".join(objs)))
        print("----------------------------------------------------------")

The key lines are:

prediction = model(Variable(batch), CUDA)
prediction = write_results(prediction, confidence, num_classes, nms_conf = nms_thesh)

This relies on Python's instance-call syntax: calling an instance is equivalent to calling __call__(), and the base class nn.Module's __call__() invokes forward(). So this line effectively calls model.forward(batch).
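A minimal illustration (assumed example, not from the post):

import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super(Tiny, self).__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

m = Tiny()
print(m(torch.randn(1, 4)).shape)   # nn.Module.__call__ dispatches to forward: torch.Size([1, 2])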

5. Postprocessing

im_dim_list = torch.index_select(im_dim_list, 0, output[:,0].long())

scaling_factor = torch.min(416/im_dim_list,1)[0].view(-1,1)

output[:,[1,3]] -= (inp_dim - scaling_factor*im_dim_list[:,0].view(-1,1))/2
output[:,[2,4]] -= (inp_dim - scaling_factor*im_dim_list[:,1].view(-1,1))/2

output[:,1:5] /= scaling_factor

for i in range(output.shape[0]):
    output[i, [1,3]] = torch.clamp(output[i, [1,3]], 0.0, im_dim_list[i,0])
    output[i, [2,4]] = torch.clamp(output[i, [2,4]], 0.0, im_dim_list[i,1])

The box coordinates in output are relative to the model's input image; this maps them back to positions on the original image and clamps them to its borders.

Finally, draw the boxes (for the basic Python syntax involved see https://www.cnblogs.com/sdu20112013/p/11216584.html):

list(map(lambda x: write(x, loaded_ims), output))

det_names = pd.Series(imlist).apply(lambda x: "{}/det_{}".format(args.det, x.split("/")[-1]))

list(map(cv2.imwrite, det_names, loaded_ims))
