- This article draws on the PyTorch volume of the deep-learning translation collection (online version, GitHub repository).
- The content comes from the "Image and Video" section of the official PyTorch tutorials.
- This article is translated from the official PyTorch tutorial "TorchVision Object Detection Finetuning Tutorial".
To get the most out of this tutorial, we recommend using the Colab version, where the code in this chapter can be run directly.
Download the complete code for this chapter
In this tutorial, we will finetune a pre-trained Mask R-CNN model on the Penn-Fudan database for pedestrian detection and segmentation. The dataset contains 170 images with 345 pedestrian instances, and we will use it to illustrate how to use the new features in torchvision to train an instance segmentation model on a custom dataset.
A custom dataset should inherit from the standard torch.utils.data.Dataset class and implement __len__ and __getitem__. The main method to override is __getitem__ (which reads one sample); it should return:
- image: a PIL image of size (H, W)
- target: a dict containing the following fields (see the sketch after the install command below for a concrete example):
  - boxes (FloatTensor[N, 4]): the coordinates of the N bounding boxes in [x0, y0, x1, y1] format
  - labels (Int64Tensor[N]): the label for each bounding box; 0 represents the background class
  - image_id (Int64Tensor[1]): an image identifier
  - area (Tensor[N]): the area of each box; during COCO evaluation it is used to separate the metric scores for small, medium and large objects
  - iscrowd (UInt8Tensor[N]): instances with iscrowd=True are hard-to-detect objects and are ignored during evaluation
  - masks (UInt8Tensor[N, H, W]), optional: the segmentation mask of each object
  - keypoints (FloatTensor[N, K, 3]), optional: for each of the N objects, K keypoints in [x, y, visibility] format, where visibility=0 means the keypoint is not visible. For data augmentation, note that flipping keypoints depends on the data representation, and you should probably adapt references/detection/transforms.py to your new keypoint representation.
Once the dataset is defined, we use the evaluation scripts from pycocotools to evaluate the model. On Windows, install pycocotools from gautamchitnis with the following command:
pip install git+https://github.com/gautamchitnis/cocoapi.git@cocodataset-master#subdirectory=PythonAPI
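To make the expected format concrete, here is a minimal sketch of a target dict for a hypothetical image with two objects; the values are dummies and only illustrate the tensor types and shapes listed above:
import torch

# hypothetical target dict for an image with N = 2 objects (dummy values, sketch only)
target = {
    "boxes": torch.tensor([[10., 20., 50., 80.], [30., 40., 90., 120.]]),  # FloatTensor[N, 4], [x0, y0, x1, y1]
    "labels": torch.tensor([1, 1], dtype=torch.int64),   # Int64Tensor[N], 0 is reserved for the background
    "image_id": torch.tensor([0]),                       # Int64Tensor[1]
    "area": torch.tensor([2400., 4800.]),                # Tensor[N], box areas
    "iscrowd": torch.zeros((2,), dtype=torch.uint8),     # UInt8Tensor[N]
    # optional fields:
    # "masks": ...      # UInt8Tensor[N, H, W]
    # "keypoints": ...  # FloatTensor[N, K, 3]
}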
The Mask R-CNN model treats label=0 as the background class. If your dataset does not contain a background class, your labels should not contain 0. For example, assuming you have only two classes, cat and dog, you can use 1 to represent cats and 2 to represent dogs (a labels tensor would then look like [1, 2]).
In addition, if you want to use aspect-ratio grouping during training (so that each batch only contains images with similar aspect ratios), it is recommended to also implement a get_height_and_width method that returns the height and the width of an image. If this method is not provided, we query all elements of the dataset through __getitem__, which loads every image into memory and is slower than providing the get_height_and_width method.
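Here is a minimal sketch of such a method, assuming it is added to the PennFudanDataset class defined below; PIL reads the image size from the file header, so the full image does not have to be decoded:
import os
from PIL import Image

# sketch: a possible get_height_and_width for the PennFudanDataset defined below
def get_height_and_width(self, idx):
    img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
    with Image.open(img_path) as img:   # Image.open is lazy and only reads the header
        width, height = img.size        # PIL reports size as (width, height)
    return height, width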
After downloading and extracting the zip file, we have the following folder structure:
PennFudanPed/
PedMasks/
FudanPed00001_mask.png
FudanPed00002_mask.png
FudanPed00003_mask.png
FudanPed00004_mask.png
...
PNGImages/
FudanPed00001.png
FudanPed00002.png
FudanPed00003.png
FudanPed00004.png
Here is one example of an image with its segmentation mask.
Each image has a corresponding segmentation mask, in which each color corresponds to a different instance. Let's write a torch.utils.data.Dataset class for this dataset.
import os
import numpy as np
import torch
from PIL import Image
class PennFudanDataset(object):
def __init__(self, root, transforms):
self.root = root
self.transforms = transforms
# load all image files, sorting them to
# ensure that they are aligned
self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))
def __getitem__(self, idx):
        # load the image and its mask
img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
img = Image.open(img_path).convert("RGB")
# note that we haven't converted the mask to RGB,
# because each color corresponds to a different instance
# with 0 being background
mask = Image.open(mask_path)
# convert the PIL Image into a numpy array
mask = np.array(mask)
# instances are encoded as different colors
obj_ids = np.unique(mask)
# first id is the background, so remove it
obj_ids = obj_ids[1:]
# split the color-encoded mask into a set
# of binary masks
masks = mask == obj_ids[:, None, None]
# get bounding box coordinates for each mask
num_objs = len(obj_ids)
boxes = []
for i in range(num_objs):
pos = np.where(masks[i])
xmin = np.min(pos[1])
xmax = np.max(pos[1])
ymin = np.min(pos[0])
ymax = np.max(pos[0])
boxes.append([xmin, ymin, xmax, ymax])
# convert everything into a torch.Tensor
boxes = torch.as_tensor(boxes, dtype=torch.float32)
# there is only one class
labels = torch.ones((num_objs,), dtype=torch.int64)
masks = torch.as_tensor(masks, dtype=torch.uint8)
image_id = torch.tensor([idx])
area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
# suppose all instances are not crowd
iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
target = {}
target["boxes"] = boxes
target["labels"] = labels
target["masks"] = masks
target["image_id"] = image_id
target["area"] = area
target["iscrowd"] = iscrowd
if self.transforms is not None:
img, target = self.transforms(img, target)
return img, target
def __len__(self):
return len(self.imgs)
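As a quick sanity check (a sketch that assumes the PennFudanPed folder is in the working directory), you can instantiate the dataset without transforms and inspect one sample:
# sketch: inspect the first sample of the dataset
dataset = PennFudanDataset('PennFudanPed', transforms=None)
img, target = dataset[0]
print(img.size)               # PIL image size as (W, H)
print(target["boxes"].shape)  # torch.Size([N, 4])
print(target["masks"].shape)  # torch.Size([N, H, W])
print(target["labels"])       # a tensor of ones, one entry per pedestrian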
In this tutorial we will use Mask R-CNN, which is based on Faster R-CNN.
For the principles behind Faster R-CNN, see my post 《目标检测打卡营上:VOC/COCO数据集、评测指标&Faster R-CNN等两阶段检测算法》.
We can pick the model we need from the torchvision model zoo, but there are two common situations in which the model has to be modified.
The first situation is finetuning from a pretrained model: load an object detection model pre-trained on the COCO dataset and finetune it on your own dataset:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
# load a model pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# replace the classifier with a new one, that has
# num_classes which is user-defined
num_classes = 2 # 1 class (person) + background
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
The second situation is replacing the model's backbone with a different one (here MobileNetV2). For the AnchorGenerator, see section 4.1 of my post 《Faster R-CNN源码解析1(Pytorch版)》; for the ROI pooling layer, see the Faster R-CNN material mentioned above.
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
# load a pre-trained backbone for classification and keep only the feature extractor
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
# FasterRCNN needs to know the number of output channels of the backbone;
# for mobilenet_v2 this is 1280
backbone.out_channels = 1280
"""
1. FasterRCNN中,RPN模块用于生成Anchors。Anchor模板有5个不同size和3种不同宽高比(aspect ratios)。
2. AnchorGenerator类用于生成Anchors,传入的参数是tuple类型。虽然mobilenet_v2只输出一种特征图,
但是FPN类backbone,会输出几种不同尺度的特征图。每个特征图上负责生成不同size的Anchors模板
3. 所以AnchorGenerator的传入Anchors参数统一为元组类型
4. 比如在resnet50_fpn作为backbone的FasterRCNN中,会定义
anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)# 将元组(0.5, 1.0, 2.0)重复5次
rpn_anchor_generator = AnchorsGenerator(anchor_sizes, aspect_ratios)
"""
# With mobilenet_v2 as the backbone there is a single feature map, and every location on it
# generates 5 x 3 = 15 anchors of different sizes and aspect ratios.
# With an FPN backbone there are 5 feature maps, and every location on each of them
# generates 3 anchors with different aspect ratios.
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),))
"""
1. 不同尺度的预测特征层,输出的feature map大小不一样,无法输入后面的全连接层进行预测
ROI Pooling层 就是将大小不同的feature map 池化成大小相同的feature map,方便后续处理
2. 如果backbone 返回的是一个Tensor, featmap_names应该是[0]
3. backbone更多时候返回的是OrderedDict[Tensor](表示输出多尺度特征图)
我们需要使用featmap_names来选择使用哪个feature maps
4. output_size=7表示RoIAlign层(对roi_pooler层的改进)最后输出7*7大小的特征图
sampling_ratio表示在每个7*7区域内采样两个点,利用双线性插值计算每个点的输出。
两个点的均值作为区域的输出,所有有区域的输出为RoIAlign层的feature map输出
"""
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
output_size=7,
sampling_ratio=2)
# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,
num_classes=2,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler)
Seen this way, replacing the backbone requires defining the backbone's output channels, the anchor generator, the roi_pooler layer and finally the model itself.
For reference, the FastRCNNPredictor code is shown below. In Faster R-CNN, roi_heads is built from three parts, box_roi_pool, box_head and box_predictor, plus a few other parameters: roi_heads = RoIHeads(box_roi_pool, box_head, box_predictor, ...).
class FastRCNNPredictor(nn.Module):
"""
Standard classification + bounding box regression layers
for Fast R-CNN.
Arguments:
in_channels (int): number of input channels
num_classes (int): number of output classes (including background)
"""
def __init__(self, in_channels, num_classes):
super(FastRCNNPredictor, self).__init__()
self.cls_score = nn.Linear(in_channels, num_classes)
self.bbox_pred = nn.Linear(in_channels, num_classes * 4)
def forward(self, x):
if x.dim() == 4:
assert list(x.shape[2:]) == [1, 1]
x = x.flatten(start_dim=1)
scores = self.cls_score(x)
bbox_deltas = self.bbox_pred(x)
return scores, bbox_deltas
For the Penn-Fudan dataset we want to finetune from a pre-trained model (the dataset is very small), and we also want instance segmentation masks, so we use Mask R-CNN and replace both the box predictor and the mask predictor:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained on COCO
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    # number of input features of the pretrained box predictor (FastRCNNPredictor)
in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the box predictor with a new one (the number of output classes changes)
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # number of input channels of the mask classifier
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
hidden_layer = 256
    # replace the mask predictor with a new one
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
hidden_layer,
num_classes)
return model
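As a quick check (a sketch, assuming the pretrained weights download successfully), you can run the modified model on a random tensor in eval mode and verify that the prediction dict now also contains masks:
import torch

# sketch: check the output format of the modified Mask R-CNN on a dummy input
model = get_model_instance_segmentation(num_classes=2)
model.eval()
x = [torch.rand(3, 300, 400)]   # a list with a single random "image"
with torch.no_grad():
    predictions = model(x)
print(predictions[0].keys())    # expected keys: boxes, labels, scores, masks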
In references/detection/ there are a number of helper functions to simplify training and evaluating detection models. Here we will use references/detection/engine.py, references/detection/utils.py and references/detection/transforms.py. Just copy them into your folder and use them here.
Let's now write the data augmentation / transformation functions:
import transforms as T
# randomly flip the images horizontally during training
def get_transform(train):
transforms = []
transforms.append(T.ToTensor())
if train:
transforms.append(T.RandomHorizontalFlip(0.5))
return T.Compose(transforms)
Before iterating over the dataset, it is useful to check what the model expects during training and inference on sample data:
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
data_loader = torch.utils.data.DataLoader(
dataset, batch_size=2, shuffle=True, num_workers=4,
collate_fn=utils.collate_fn)
# training
images,targets = next(iter(data_loader))
images = list(image for image in images)
targets = [{k: v for k, v in t.items()} for t in targets]
output = model(images, targets)   # returns losses and detections
# inference
model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)            # returns predictions
The main function below runs training and evaluation:
from engine import train_one_epoch, evaluate
import utils
def main():
# train on the GPU or on the CPU, if a GPU is not available
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
# our dataset has two classes only - background and person
num_classes = 2
# use our dataset and defined transformations
dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
dataset_test = PennFudanDataset('PennFudanPed', get_transform(train=False))
# split the dataset in train and test set
indices = torch.randperm(len(dataset)).tolist()
dataset = torch.utils.data.Subset(dataset, indices[:-50])
dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])
# define training and validation data loaders
data_loader = torch.utils.data.DataLoader(
dataset, batch_size=2, shuffle=True, num_workers=4,
collate_fn=utils.collate_fn)
data_loader_test = torch.utils.data.DataLoader(
dataset_test, batch_size=1, shuffle=False, num_workers=4,
collate_fn=utils.collate_fn)
    # load the model defined above
model = get_model_instance_segmentation(num_classes)
# move model to the right device
model.to(device)
    # construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
momentum=0.9, weight_decay=0.0005)
# and a learning rate scheduler
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
step_size=3,
gamma=0.1)
    # train for 10 epochs
num_epochs = 10
for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
lr_scheduler.step()
        # evaluate on the test dataset
evaluate(model, data_loader_test, device=device)
print("That's it!")
Output for epoch 0:
Epoch: [0] [ 0/60] eta: 0:01:18 lr: 0.000090 loss: 2.5213 (2.5213) loss_classifier: 0.8025 (0.8025) loss_box_reg: 0.2634 (0.2634) loss_mask: 1.4265 (1.4265) loss_objectness: 0.0190 (0.0190) loss_rpn_box_reg: 0.0099 (0.0099) time: 1.3121 data: 0.3024 max mem: 3485
Epoch: [0] [10/60] eta: 0:00:20 lr: 0.000936 loss: 1.3007 (1.5313) loss_classifier: 0.3979 (0.4719) loss_box_reg: 0.2454 (0.2272) loss_mask: 0.6089 (0.7953) loss_objectness: 0.0197 (0.0228) loss_rpn_box_reg: 0.0121 (0.0141) time: 0.4198 data: 0.0298 max mem: 5081
Epoch: [0] [20/60] eta: 0:00:15 lr: 0.001783 loss: 0.7567 (1.1056) loss_classifier: 0.2221 (0.3319) loss_box_reg: 0.2002 (0.2106) loss_mask: 0.2904 (0.5332) loss_objectness: 0.0146 (0.0176) loss_rpn_box_reg: 0.0094 (0.0123) time: 0.3293 data: 0.0035 max mem: 5081
Epoch: [0] [30/60] eta: 0:00:11 lr: 0.002629 loss: 0.4705 (0.8935) loss_classifier: 0.0991 (0.2517) loss_box_reg: 0.1578 (0.1957) loss_mask: 0.1970 (0.4204) loss_objectness: 0.0061 (0.0140) loss_rpn_box_reg: 0.0075 (0.0118) time: 0.3403 data: 0.0044 max mem: 5081
Epoch: [0] [40/60] eta: 0:00:07 lr: 0.003476 loss: 0.3901 (0.7568) loss_classifier: 0.0648 (0.2022) loss_box_reg: 0.1207 (0.1736) loss_mask: 0.1705 (0.3585) loss_objectness: 0.0018 (0.0113) loss_rpn_box_reg: 0.0075 (0.0112) time: 0.3407 data: 0.0044 max mem: 5081
Epoch: [0] [50/60] eta: 0:00:03 lr: 0.004323 loss: 0.3237 (0.6703) loss_classifier: 0.0474 (0.1731) loss_box_reg: 0.1109 (0.1561) loss_mask: 0.1658 (0.3201) loss_objectness: 0.0015 (0.0093) loss_rpn_box_reg: 0.0093 (0.0116) time: 0.3379 data: 0.0043 max mem: 5081
Epoch: [0] [59/60] eta: 0:00:00 lr: 0.005000 loss: 0.2540 (0.6082) loss_classifier: 0.0309 (0.1526) loss_box_reg: 0.0463 (0.1405) loss_mask: 0.1568 (0.2945) loss_objectness: 0.0012 (0.0083) loss_rpn_box_reg: 0.0093 (0.0123) time: 0.3489 data: 0.0042 max mem: 5081
Epoch: [0] Total time: 0:00:21 (0.3570 s / it)
creating index...
index created!
Test: [ 0/50] eta: 0:00:19 model_time: 0.2152 (0.2152) evaluator_time: 0.0133 (0.0133) time: 0.4000 data: 0.1701 max mem: 5081
Test: [49/50] eta: 0:00:00 model_time: 0.0628 (0.0687) evaluator_time: 0.0039 (0.0064) time: 0.0735 data: 0.0022 max mem: 5081
Test: Total time: 0:00:04 (0.0828 s / it)
Averaged stats: model_time: 0.0628 (0.0687) evaluator_time: 0.0039 (0.0064)
Accumulating evaluation results...
DONE (t=0.01s).
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.606
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.984
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.780
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.313
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.582
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.612
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.270
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.672
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.672
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.650
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.755
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.664
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.704
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.979
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.871
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.325
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.488
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.727
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.316
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.748
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.749
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.650
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.673
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.758
After one epoch of training, the COCO-style box mAP is 60.6 and the mask mAP is 70.4. After training for 10 epochs, I got the following metrics:
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.799
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.969
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.935
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.349
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.592
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.831
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.324
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.844
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.844
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.400
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.777
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.870
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.761
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.969
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.919
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.341
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.464
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.788
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.303
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.799
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.799
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.400
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.769
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.818
Let's pick one image and look at the predictions.
The trained model predicts 9 pedestrian instances in this image; let's look at a few of them.
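The prediction itself can be obtained with a few lines (a sketch, assuming the trained model, dataset_test and device from main() are still in scope):
import torch
from PIL import Image

# sketch: run the trained model on one test image and look at the first predicted mask
img, _ = dataset_test[0]
model.eval()
with torch.no_grad():
    prediction = model([img.to(device)])

# convert the first predicted instance mask back to a PIL image for viewing
mask = Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy())
mask.show()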
In this tutorial, you learned how to train an instance segmentation model on a custom dataset.
To do so, you wrote a torch.utils.data.Dataset class that returns the images together with the ground-truth boxes and segmentation masks. For a more complete example that includes multi-GPU training, check references/detection/train.py in the torchvision repository.
Reference: the official tutorial "Transfer Learning for Computer Vision Tutorial" and 《计算机视觉的迁移学习教程》 from the PyTorch volume of the deep-learning translation collection.
In this tutorial, you will learn how to train a convolutional neural network for image classification using transfer learning. You can read more about transfer learning in the cs231n notes.
In practice, very few people train an entire convolutional network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories) and then use the ConvNet either as an initialization or as a fixed feature extractor for the task of interest.
There are two major transfer learning scenarios:
- Finetuning the ConvNet: instead of random initialization, we initialize the network with a pretrained network, such as one trained on the ImageNet 1000 dataset; the rest of the training looks as usual.
- ConvNet as a fixed feature extractor: we freeze the weights of the whole network except the final fully connected layer, replace that last layer with a new one with random weights, and train only it.
# License: BSD
# Author: Sasank Chilamkurthy
from __future__ import print_function, division
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
plt.ion() # interactive mode
We will use the torchvision and torch.utils.data packages to load the data.
The problem we are solving today is training a model to classify ants and bees. There are about 120 training images and 75 validation images for each class. This dataset is far too small to train from scratch, but with transfer learning we can get reasonably good results.
This dataset is a very small subset of ImageNet. Download the data from here and extract it to the current directory.
# data augmentation and normalization for training; only normalization for validation
from torch.utils.data import DataLoader
data_transforms = {
'train': transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
]),
'val': transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
]),
}
data_dir = 'data/hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),data_transforms[x])
for x in ['train', 'val']}
dataloaders = {x: DataLoader(image_datasets[x], batch_size=4,shuffle=True, num_workers=4)
for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
Let's visualize a few training images to get a feel for the data augmentation.
def imshow(inp, title=None):
"""Imshow for Tensor."""
inp = inp.numpy().transpose((1, 2, 0))
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
inp = std * inp + mean
inp = np.clip(inp, 0, 1)
plt.imshow(inp)
if title is not None:
plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated
# get a batch of training data
inputs, classes = next(iter(dataloaders['train']))
# Make a grid from batch
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_names[x] for x in classes])
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()  # record the start time
best_model_wts = copy.deepcopy(model.state_dict())
best_acc = 0.0
for epoch in range(num_epochs):
print('Epoch {}/{}'.format(epoch, num_epochs - 1))
print('-' * 10)
        # each epoch has a training phase and a validation phase
for phase in ['train', 'val']:
if phase == 'train':
model.train() # Set model to training mode
else:
model.eval() # Set model to evaluate mode
running_loss = 0.0
running_corrects = 0
            # iterate over the data
for inputs, labels in dataloaders[phase]:
inputs,labels = inputs.to(device),labels.to(device)
                # zero the parameter gradients
optimizer.zero_grad()
                # forward pass; track gradients only in the training phase
with torch.set_grad_enabled(phase == 'train'):
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
loss = criterion(outputs, labels)
                    # backward pass + optimizer step only in the training phase
if phase == 'train':
loss.backward()
optimizer.step()
# statistics
running_loss += loss.item() * inputs.size(0)
running_corrects += torch.sum(preds == labels.data)
if phase == 'train':
scheduler.step()
epoch_loss = running_loss / dataset_sizes[phase]
epoch_acc = running_corrects.double() / dataset_sizes[phase]
print('{} Loss: {:.4f} Acc: {:.4f}'.format(
phase, epoch_loss, epoch_acc))
            # keep a copy of the best model weights
if phase == 'val' and epoch_acc > best_acc:
best_acc = epoch_acc
best_model_wts = copy.deepcopy(model.state_dict())
print()
time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(
time_elapsed // 60, time_elapsed % 60))
print('Best val Acc: {:4f}'.format(best_acc))
    # load the best model weights after training
model.load_state_dict(best_model_wts)
return model
Next we define a visualize_model function to display predictions.
In PyTorch, train/eval mode is really just control of the model.training flag. A model is in training mode by default (model.training=True); calling model.eval() sets model.training=False.
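A minimal sketch of that flag:
import torch.nn as nn

# sketch: model.train() / model.eval() just flip the .training flag,
# which layers such as Dropout and BatchNorm check at forward time
m = nn.Dropout(p=0.5)
print(m.training)   # True  - modules are in training mode by default
m.eval()
print(m.training)   # False - dropout now acts as the identity
m.train()
print(m.training)   # True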
def visualize_model(model, num_images=6):
was_training = model.training
model.eval()
images_so_far = 0
fig = plt.figure()
with torch.no_grad():
for i, (inputs, labels) in enumerate(dataloaders['val']):
inputs = inputs.to(device)
labels = labels.to(device)
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
for j in range(inputs.size()[0]):
images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
ax.axis('off')
ax.set_title('predicted: {}'.format(class_names[preds[j]]))
imshow(inputs.cpu().data[j])
if images_so_far == num_images:
model.train(mode=was_training)
return
model.train(mode=was_training)
Since our dataset has only two classes, after loading the pretrained model (trained on the 1000 ImageNet classes) we need to change the final classification layer to output 2 classes.
model = models.resnet18(pretrained=True)  # i.e. torchvision.models.resnet18
num_ftrs = model.fc.in_features  # number of input features of the pretrained model's fully connected layer
model.fc = nn.Linear(num_ftrs, 2)  # replace the fully connected layer with a 2-class head, i.e. a binary classifier
model = model.to(device)
criterion = nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# decay the learning rate by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)  # new name, so the lr_scheduler module is not shadowed
model  # print the model
Training takes around 15-25 minutes on a CPU, but less than a minute on a GPU.
model = train_model(model, criterion, optimizer, exp_lr_scheduler, num_epochs=25)
Epoch 0/24
----------
train Loss: 0.6303 Acc: 0.6926
val Loss: 0.1492 Acc: 0.9346
Epoch 1/24
----------
train Loss: 0.5511 Acc: 0.7869
val Loss: 0.2577 Acc: 0.8889
Epoch 2/24
----------
train Loss: 0.4885 Acc: 0.8115
val Loss: 0.3390 Acc: 0.8758
Epoch 3/24
----------
train Loss: 0.5158 Acc: 0.7992
val Loss: 0.5070 Acc: 0.8366
Epoch 4/24
----------
train Loss: 0.5878 Acc: 0.7992
val Loss: 0.2706 Acc: 0.8758
Epoch 5/24
----------
train Loss: 0.4396 Acc: 0.8279
val Loss: 0.2870 Acc: 0.8954
Epoch 6/24
----------
train Loss: 0.4612 Acc: 0.8238
val Loss: 0.2809 Acc: 0.9150
Epoch 7/24
----------
train Loss: 0.4387 Acc: 0.8402
val Loss: 0.1853 Acc: 0.9281
Epoch 8/24
----------
train Loss: 0.2998 Acc: 0.8648
val Loss: 0.1926 Acc: 0.9085
Epoch 9/24
----------
train Loss: 0.3383 Acc: 0.9016
val Loss: 0.1762 Acc: 0.9281
Epoch 10/24
----------
train Loss: 0.2969 Acc: 0.8730
val Loss: 0.1872 Acc: 0.8954
Epoch 11/24
----------
train Loss: 0.3117 Acc: 0.8811
val Loss: 0.1807 Acc: 0.9150
Epoch 12/24
----------
train Loss: 0.3005 Acc: 0.8770
val Loss: 0.1930 Acc: 0.9085
Epoch 13/24
----------
train Loss: 0.3129 Acc: 0.8689
val Loss: 0.2184 Acc: 0.9150
Epoch 14/24
----------
train Loss: 0.3776 Acc: 0.8607
val Loss: 0.1869 Acc: 0.9216
Epoch 15/24
----------
train Loss: 0.2245 Acc: 0.9016
val Loss: 0.1742 Acc: 0.9346
Epoch 16/24
----------
train Loss: 0.3105 Acc: 0.8607
val Loss: 0.2056 Acc: 0.9216
Epoch 17/24
----------
train Loss: 0.2729 Acc: 0.8893
val Loss: 0.1722 Acc: 0.9085
Epoch 18/24
----------
train Loss: 0.3210 Acc: 0.8730
val Loss: 0.1977 Acc: 0.9281
Epoch 19/24
----------
train Loss: 0.3231 Acc: 0.8566
val Loss: 0.1811 Acc: 0.9216
Epoch 20/24
----------
train Loss: 0.3206 Acc: 0.8648
val Loss: 0.2033 Acc: 0.9150
Epoch 21/24
----------
train Loss: 0.2917 Acc: 0.8648
val Loss: 0.1694 Acc: 0.9150
Epoch 22/24
----------
train Loss: 0.2412 Acc: 0.8852
val Loss: 0.1757 Acc: 0.9216
Epoch 23/24
----------
train Loss: 0.2508 Acc: 0.8975
val Loss: 0.1662 Acc: 0.9281
Epoch 24/24
----------
train Loss: 0.3283 Acc: 0.8566
val Loss: 0.1761 Acc: 0.9281
Training complete in 1m 10s
Best val Acc: 0.934641
visualize_model(model)
As mentioned above, when the ConvNet is used as a fixed feature extractor we need to freeze the whole network except the final layer. We do this by setting requires_grad == False on the frozen parameters, so that no gradients are computed for them during backward().
You can read more about gradient computation in the Autograd mechanics documentation.
model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
param.requires_grad = False
# parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)
model_conv = model_conv.to(device)
criterion = nn.CrossEntropyLoss()
# only the parameters of the final layer are passed to the optimizer
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
# decay the learning rate by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
For freezing only certain layers, see 《pytorch冻结模型中某几层的参数》.
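A minimal sketch of freezing only some layers, assuming torchvision's ResNet-18 parameter naming (layer1 ... layer4, fc); everything except layer4 and the fc head is frozen here:
import torchvision
import torch.optim as optim

# sketch: freeze all layers of a ResNet-18 except layer4 and the fc head
model_partial = torchvision.models.resnet18(pretrained=True)
for name, param in model_partial.named_parameters():
    param.requires_grad = name.startswith("layer4") or name.startswith("fc")

# pass only the trainable parameters to the optimizer
trainable_params = [p for p in model_partial.parameters() if p.requires_grad]
optimizer_partial = optim.SGD(trainable_params, lr=0.001, momentum=0.9)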
Compared with the previous scenario, training takes roughly half the time on CPU. This is expected, since gradients do not need to be computed for most of the network; the forward pass, however, still has to be computed.
model_conv = train_model(model_conv, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=25)
Epoch 0/24
----------
train Loss: 0.7258 Acc: 0.6148
val Loss: 0.2690 Acc: 0.9020
Epoch 1/24
----------
train Loss: 0.5342 Acc: 0.7500
val Loss: 0.1905 Acc: 0.9412
Epoch 2/24
----------
train Loss: 0.4262 Acc: 0.8320
val Loss: 0.1903 Acc: 0.9412
Epoch 3/24
----------
train Loss: 0.4103 Acc: 0.8197
val Loss: 0.2658 Acc: 0.8954
Epoch 4/24
----------
train Loss: 0.3938 Acc: 0.8115
val Loss: 0.2871 Acc: 0.8954
Epoch 5/24
----------
train Loss: 0.4623 Acc: 0.8361
val Loss: 0.1651 Acc: 0.9346
Epoch 6/24
----------
train Loss: 0.5348 Acc: 0.7869
val Loss: 0.1944 Acc: 0.9477
Epoch 7/24
----------
train Loss: 0.3827 Acc: 0.8402
val Loss: 0.1846 Acc: 0.9412
Epoch 8/24
----------
train Loss: 0.3655 Acc: 0.8443
val Loss: 0.1873 Acc: 0.9412
Epoch 9/24
----------
train Loss: 0.3275 Acc: 0.8525
val Loss: 0.2091 Acc: 0.9412
Epoch 10/24
----------
train Loss: 0.3375 Acc: 0.8320
val Loss: 0.1798 Acc: 0.9412
Epoch 11/24
----------
train Loss: 0.3077 Acc: 0.8648
val Loss: 0.1942 Acc: 0.9346
Epoch 12/24
----------
train Loss: 0.4336 Acc: 0.7787
val Loss: 0.1934 Acc: 0.9346
Epoch 13/24
----------
train Loss: 0.3149 Acc: 0.8566
val Loss: 0.2062 Acc: 0.9281
Epoch 14/24
----------
train Loss: 0.3617 Acc: 0.8320
val Loss: 0.1761 Acc: 0.9412
Epoch 15/24
----------
train Loss: 0.3066 Acc: 0.8361
val Loss: 0.1799 Acc: 0.9281
Epoch 16/24
----------
train Loss: 0.3952 Acc: 0.8443
val Loss: 0.1666 Acc: 0.9346
Epoch 17/24
----------
train Loss: 0.3552 Acc: 0.8443
val Loss: 0.1928 Acc: 0.9412
Epoch 18/24
----------
train Loss: 0.3106 Acc: 0.8648
val Loss: 0.1964 Acc: 0.9346
Epoch 19/24
----------
train Loss: 0.3675 Acc: 0.8566
val Loss: 0.1813 Acc: 0.9346
Epoch 20/24
----------
train Loss: 0.3565 Acc: 0.8320
val Loss: 0.1758 Acc: 0.9346
Epoch 21/24
----------
train Loss: 0.2922 Acc: 0.8566
val Loss: 0.2295 Acc: 0.9216
Epoch 22/24
----------
train Loss: 0.3283 Acc: 0.8402
val Loss: 0.2267 Acc: 0.9281
Epoch 23/24
----------
train Loss: 0.2875 Acc: 0.8770
val Loss: 0.1878 Acc: 0.9346
Epoch 24/24
----------
train Loss: 0.3172 Acc: 0.8689
val Loss: 0.1849 Acc: 0.9412
Training complete in 0m 34s
Best val Acc: 0.947712
visualize_model(model_conv)
plt.ioff()
plt.show()
If you want to learn more about transfer learning, check out the Quantized Transfer Learning for Computer Vision tutorial in the Model Optimization section of the official tutorials.
Total running time of the script: (1 minute 56.157 seconds)
Download Python source code: transfer_learning_tutorial.py
Download Jupyter notebook: transfer_learning_tutorial.ipynb
For setting different learning rates for different layers, see 《Pytorch:找到对应的层,并在不同层设置不同的学习率》 and section 5.2 ("Defining the optimizer") of 《CLUENER 细粒度命名实体识别,附完整代码》.
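The core mechanism behind both references is optimizer parameter groups. A hedged sketch, reusing the ants/bees ResNet-18 setup from above, gives the pretrained backbone a smaller learning rate than the newly initialized head:
import torch.nn as nn
import torch.optim as optim
import torchvision

# sketch: per-layer learning rates via optimizer parameter groups
model_ft = torchvision.models.resnet18(pretrained=True)
model_ft.fc = nn.Linear(model_ft.fc.in_features, 2)

head_params = list(model_ft.fc.parameters())
head_ids = {id(p) for p in head_params}
backbone_params = [p for p in model_ft.parameters() if id(p) not in head_ids]

optimizer_ft = optim.SGD(
    [
        {"params": backbone_params},          # uses the default lr below
        {"params": head_params, "lr": 1e-2},  # larger lr for the freshly initialized head
    ],
    lr=1e-4, momentum=0.9)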
References:
- the official tutorial "Adversarial Example Generation" and 《对抗示例生成》 from the PyTorch volume of the deep-learning translation collection
- for the FGSM principle, see 《图像对抗算法-攻击篇(FGSM)》 and the paper notes 《EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES 论文笔记》
This tutorial will raise your awareness of the security vulnerabilities of ML models and give insight into the hot topic of adversarial machine learning. You may be surprised to find that adding imperceptible perturbations to an image can cause drastically different model behavior. We will explore the topic through an example on an image classifier. Specifically, we will use one of the most popular attack methods, the Fast Gradient Sign Method (FGSM), to fool an MNIST classifier.
For context, there are many kinds of adversarial attacks, each with a different goal and different assumptions about the attacker's knowledge. In general, however, the overarching goal is to add the least amount of perturbation to the input data that causes the desired misclassification.
There are several kinds of assumptions about the attacker's knowledge, two of which are white-box and black-box.
There are also several types of goals, including misclassification and source/target misclassification.
In this case, FGSM is a white-box attack whose goal is misclassification. (A white-box attack is simpler than a black-box attack, and the PyTorch example is based on the white-box setting.)
FGSM is a classic white-box attack proposed by Goodfellow et al. in 2015. The idea is simple and the effect is intuitive: it attacks a neural network by exploiting the very gradients the network uses to learn. Instead of adjusting the weights based on the backpropagated gradients to minimize the loss, the attack adjusts the input data to maximize the loss using those same gradients.
Concretely:
- Compute the gradient of the loss with respect to the input image, ∇_x J(θ, x, y), where J is the loss function, x and y are the input image and its true label, and θ are the network parameters.
- Take its sign, sign(∇_x J(θ, x, y)), and move the input a small step ε in that direction.
The sign() function returns the sign of its input: 1 for positive values, -1 for negative values, and 0 for zero.
Using the gradient direction rather than the gradient magnitude keeps the L∞ distance of the perturbation under control, which is the metric FGSM is evaluated against.
The core idea of FGSM is illustrated in Figure 1 of the paper: the left image is a regular image that classification models classify as a panda; after adding attack noise generated from the network's gradients, the resulting image on the right still looks like a panda to a human, yet the model classifies it as a gibbon.
Now let's get into the implementation. We will discuss the input parameters of the tutorial, define the model under attack, then code the attack and run some tests.
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import numpy as np
import matplotlib.pyplot as plt
There are only three inputs for this tutorial, defined as follows:
- epsilons: the list of ε values to use. A value of 0 represents the model performance on the original test set. Intuitively, the larger ε, the more noticeable the perturbation but the more effective the attack. Since the input data lie in the range [0, 1], ε should not exceed 1.
- pretrained_model: path to the pretrained MNIST model trained with pytorch/examples/mnist. For simplicity, download the pretrained model here.
- use_cuda: a boolean flag to use CUDA if available.
epsilons = [0, .05, .1, .15, .2, .25, .3]
pretrained_model = "data/lenet_mnist_model.pth"
use_cuda=True
The goal of this section is to define the model and the dataloader, then initialize the model and load the pretrained weights.
# LeNet Model definition
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
self.conv2_drop = nn.Dropout2d()
self.fc1 = nn.Linear(320, 50)
self.fc2 = nn.Linear(50, 10)
def forward(self, x):
x = F.relu(F.max_pool2d(self.conv1(x), 2))
x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
x = x.view(-1, 320)
x = F.relu(self.fc1(x))
x = F.dropout(x, training=self.training)
x = self.fc2(x)
return F.log_softmax(x, dim=1)
# MNIST Test dataset and dataloader declaration
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=False, download=True, transform=transforms.Compose([
transforms.ToTensor(),
])),
        batch_size=1, shuffle=True)  # batch size of 1, so the .item() calls in the test function below are valid
# Define what device we are using
print("CUDA Available: ",torch.cuda.is_available())
device = torch.device("cuda" if (use_cuda and torch.cuda.is_available()) else "cpu")
# Initialize the network
model = Net().to(device)
# Load the pretrained model
model.load_state_dict(torch.load(pretrained_model, map_location='cpu'))
# Set the model in evaluation mode. In this case this is for the Dropout layers
model.eval()
Now we can define the function that creates adversarial examples by perturbing the original inputs. The fgsm_attack function takes three inputs:
- image: the original clean image (x)
- epsilon: the pixel-wise perturbation amount (ε)
- data_grad: the gradient of the loss with respect to the input image (∇_x J(θ, x, y))
The function then creates the perturbed image as:
$$perturbed\_image = image + epsilon * sign(data\_grad) = x + \epsilon * sign(\nabla_{x} J(\mathbf{\theta}, \mathbf{x}, y))$$
Finally, to keep the data in its original range, the perturbed image is clipped to [0, 1].
# FGSM attack code
def fgsm_attack(image, epsilon, data_grad):
    # collect the element-wise sign of the data gradient
sign_data_grad = data_grad.sign()
    # create the perturbed image by adjusting each pixel of the input image
perturbed_image = image + epsilon*sign_data_grad
    # clip the perturbed image to keep it in the [0, 1] range
perturbed_image = torch.clamp(perturbed_image, 0, 1)
    # return the perturbed image
return perturbed_image
The main results of this tutorial come from the test function. Each call to this function performs a full test pass over the MNIST test set and reports the final accuracy.
The test function also takes an epsilon input, because it reports the accuracy of the model under an attack of strength ε.
More specifically, for each sample in the test set the function computes the gradient of the loss with respect to the input data (data_grad), creates a perturbed image with fgsm_attack (perturbed_data), and then checks whether the perturbed example is adversarial.
Besides measuring the model's accuracy, the function also saves and returns some successful adversarial examples for later visualization.
def test( model, device, test_loader, epsilon ):
# Accuracy counter
correct = 0
adv_examples = []
for data, target in test_loader:
data, target = data.to(device), target.to(device)
# Set requires_grad attribute of tensor. Important for Attack
data.requires_grad = True
output = model(data)
init_pred = output.max(1, keepdim=True)[1] # get the index of the max log-probability
        # if the initial prediction is already wrong, don't bother attacking
if init_pred.item() != target.item():
continue
loss = F.nll_loss(output, target)
        model.zero_grad()  # zero all existing gradients
        loss.backward()    # backward pass: compute the gradient of the loss w.r.t. the input
# Collect datagrad
data_grad = data.grad.data
        # FGSM attack: add the perturbation based on the data gradient
perturbed_data = fgsm_attack(data, epsilon, data_grad)
        # re-classify the perturbed image
output = model(perturbed_data)
        # check whether the attack succeeded
final_pred = output.max(1, keepdim=True)[1] # get the index of the max log-probability
if final_pred.item() == target.item():
correct += 1
# Special case for saving 0 epsilon examples
if (epsilon == 0) and (len(adv_examples) < 5):
adv_ex = perturbed_data.squeeze().detach().cpu().numpy()
adv_examples.append( (init_pred.item(), final_pred.item(), adv_ex) )
else:
            # save some adversarial examples for later visualization
if len(adv_examples) < 5:
adv_ex = perturbed_data.squeeze().detach().cpu().numpy()
adv_examples.append( (init_pred.item(), final_pred.item(), adv_ex) )
    # compute the final accuracy for this epsilon
final_acc = correct/float(len(test_loader))
print("Epsilon: {}\tTest Accuracy = {} / {} = {}".format(epsilon, correct, len(test_loader), final_acc))
    # return the accuracy and the adversarial examples
return final_acc, adv_examples
Finally, run the test step for every ε value in the epsilons list, record the final accuracy, and plot some successful adversarial examples. Note that ε = 0 represents the original test accuracy with no attack.
accuracies = []
examples = []
# Run test for each epsilon
for eps in epsilons:
acc, ex = test(model, device, test_loader, eps)
accuracies.append(acc)
examples.append(ex)
Output:
Epsilon: 0 Test Accuracy = 9810 / 10000 = 0.981
Epsilon: 0.05 Test Accuracy = 9426 / 10000 = 0.9426
Epsilon: 0.1 Test Accuracy = 8510 / 10000 = 0.851
Epsilon: 0.15 Test Accuracy = 6826 / 10000 = 0.6826
Epsilon: 0.2 Test Accuracy = 4301 / 10000 = 0.4301
Epsilon: 0.25 Test Accuracy = 2082 / 10000 = 0.2082
Epsilon: 0.3 Test Accuracy = 869 / 10000 = 0.0869
The first result is the accuracy-versus-ε plot: as ε increases, the test accuracy decreases, but the relationship is not linear. For example, the accuracy at ε = 0.05 is only about 4% lower than at ε = 0, while the accuracy at ε = 0.2 is about 25% lower than at ε = 0.15. Also notice that the model's accuracy drops to the random accuracy of a 10-class classifier somewhere between ε = 0.25 and ε = 0.3.
plt.figure(figsize=(5,5))
plt.plot(epsilons, accuracies, "*-")
plt.yticks(np.arange(0, 1.1, step=0.1))
plt.xticks(np.arange(0, .35, step=0.05))
plt.title("Accuracy vs Epsilon")
plt.xlabel("Epsilon")
plt.ylabel("Accuracy")
plt.show()
Remember the tradeoff: as ε increases, the test accuracy decreases, but the perturbation becomes easier to perceive. An attacker therefore has to trade off accuracy degradation against perceptibility.
Below we show some successful adversarial examples at each ε value. Each row of the plot corresponds to a different ε value; the first row, ε = 0, contains the original "clean" images with no perturbation. The title of each image shows "original classification -> adversarial classification". Notice that the perturbations start to become evident at ε = 0.15 and are quite obvious at ε = 0.3; despite the added noise, humans are still able to identify the correct class.
# Plot several examples of adversarial samples at each epsilon
cnt = 0
plt.figure(figsize=(8,10))
for i in range(len(epsilons)):
for j in range(len(examples[i])):
cnt += 1
plt.subplot(len(epsilons),len(examples[0]),cnt)
plt.xticks([], [])
plt.yticks([], [])
if j == 0:
plt.ylabel("Eps: {}".format(epsilons[i]), fontsize=14)
orig,adv,ex = examples[i][j]
plt.title("{} -> {}".format(orig, adv))
plt.imshow(ex, cmap="gray")
plt.tight_layout()
plt.show()
There was an adversarial attack-and-defense competition at NIPS 2017, and many of the methods used in it are described in the paper "Adversarial Attacks and Defences Competition". The work on defenses also leads to the idea of making machine learning models more robust in general, to both naturally perturbed and adversarially crafted inputs.
Another direction is adversarial attacks and defenses in different domains. Adversarial research is not limited to images; see, for example, this attack on speech-to-text models. You can also try to implement an attack different from FGSM (like those from the NIPS 2017 competition), see how it differs, and then try to defend the model against your own attack.
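As one concrete starting point (a sketch that is not part of the original tutorial), the basic iterative method (BIM / I-FGSM) simply applies several small FGSM steps and projects the result back into an ε-ball around the original image; it reuses the model defined above and assumes image and target are already on the right device:
import torch
import torch.nn.functional as F

# sketch: basic iterative FGSM (BIM); alpha is the per-step size, num_iter the number of steps
def iterative_fgsm_attack(model, image, target, epsilon, alpha=0.01, num_iter=10):
    perturbed = image.clone().detach()
    for _ in range(num_iter):
        perturbed.requires_grad_(True)
        loss = F.nll_loss(model(perturbed), target)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            # take a small signed-gradient step
            perturbed = perturbed + alpha * perturbed.grad.sign()
            # project back into the epsilon-ball around the original image
            perturbed = torch.max(torch.min(perturbed, image + epsilon), image - epsilon)
            # keep pixel values in the valid [0, 1] range
            perturbed = torch.clamp(perturbed, 0, 1)
    return perturbed.detach()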
Total running time of the script: (4 minutes 22.519 seconds)
Download Python source code: fgsm_tutorial.py
Download Jupyter notebook: fgsm_tutorial.ipynb