Machine Learning HW10: Adversarial Attack

  • 1. Task Description
  • 2. Algorithms
    • 1. FGSM
    • 2. I-FGSM
    • 3. MI-FGSM
    • 4. Diverse Input (DIM)
      • Evaluation Metric
  • 3. Experiments
    • 1. Simple Baseline
    • 2. Medium Baseline
    • 3. Strong Baseline
    • 4. Boss Baseline


1. Task Description

We use pytorchcv to obtain CIFAR-10 pre-trained models, so we need to set up the environment first. We also need to download the data we want to attack (200 images). We first apply ToTensor to the raw pixel values (0-255 scale) to map them to the 0-1 scale, then normalize them (subtract the mean and divide by the standard deviation).
[Figure 1]

# the mean and std are the calculated statistics from cifar_10 dataset
cifar_10_mean = (0.491, 0.482, 0.447) # mean for the three channels of cifar_10 images
cifar_10_std = (0.202, 0.199, 0.201) # std for the three channels of cifar_10 images

# convert mean and std to 3-dimensional tensors for future operations
mean = torch.tensor(cifar_10_mean).to(device).view(3, 1, 1)
std = torch.tensor(cifar_10_std).to(device).view(3, 1, 1)

# the attack budget of 8 (on the 0-255 pixel scale), expressed in the normalized scale
epsilon = 8/255/std

ToTensor() also converts images from shape (height, width, channel) to shape (channel, height, width), so we need to transpose back to the original shape before saving. Since our data loader samples a batch of data, we use np.transpose here to turn (batch_size, channel, height, width) back into (batch_size, height, width, channel).
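As a sanity check, the ToTensor + Normalize pipeline and its inverse can be sketched in plain NumPy (a hypothetical 4×4 toy image, not the homework data):

```python
import numpy as np

# CIFAR-10 per-channel statistics from above, reshaped for broadcasting
mean = np.array([0.491, 0.482, 0.447]).reshape(3, 1, 1)
std = np.array([0.202, 0.199, 0.201]).reshape(3, 1, 1)

# hypothetical 4x4 RGB image on the 0-255 pixel scale, shape (H, W, C)
img = np.random.randint(0, 256, size=(4, 4, 3)).astype(np.float32)

# ToTensor: scale to 0-1 and move channels first, (H, W, C) -> (C, H, W)
x = (img / 255.0).transpose(2, 0, 1)
# Normalize: subtract the mean and divide by the std, per channel
x_norm = (x - mean) / std

# inverse: de-normalize, move channels back, rescale to 0-255
img_back = (x_norm * std + mean).transpose(1, 2, 0) * 255.0
assert np.allclose(img_back, img, atol=1e-3)
```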

# perform adversarial attack and generate adversarial examples
def gen_adv_examples(model, loader, attack, loss_fn):
    model.eval()
    adv_names = []
    train_acc, train_loss = 0.0, 0.0
    for i, (x, y) in enumerate(loader):
        x, y = x.to(device), y.to(device)
        x_adv = attack(model, x, y, loss_fn) # obtain adversarial examples
        yp = model(x_adv)
        loss = loss_fn(yp, y)
        train_acc += (yp.argmax(dim=1) == y).sum().item()
        train_loss += loss.item() * x.shape[0]
        # store adversarial examples
        adv_ex = ((x_adv) * std + mean).clamp(0, 1) # to 0-1 scale
        adv_ex = (adv_ex * 255).clamp(0, 255) # 0-255 scale
        adv_ex = adv_ex.detach().cpu().data.numpy().round() # round to remove the fractional part
        adv_ex = adv_ex.transpose((0, 2, 3, 1)) # transpose (bs, C, H, W) back to (bs, H, W, C)
        adv_examples = adv_ex if i == 0 else np.r_[adv_examples, adv_ex]
    return adv_examples, train_acc / len(loader.dataset), train_loss / len(loader.dataset)
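The per-batch results are accumulated with np.r_, which simply concatenates along the first (batch) axis; a minimal illustration with toy arrays:

```python
import numpy as np

a = np.zeros((2, 32, 32, 3))  # first batch of "adversarial examples"
b = np.ones((3, 32, 32, 3))   # second batch
# np.r_ stacks along axis 0, growing the accumulated array batch by batch
c = np.r_[a, b]
assert c.shape == (5, 32, 32, 3)
```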

Ensemble multiple models as your proxy model to increase black-box transferability:

class ensembleNet(nn.Module):
    def __init__(self, model_names):
        super().__init__()
        self.models = nn.ModuleList([ptcv_get_model(name, pretrained=True) for name in model_names])
        self.softmax = nn.Softmax(dim=1)
    def forward(self, x):
        # sum up the logits from all models, then average them
        ensemble_logits = None
        for i, m in enumerate(self.models):
            ensemble_logits = m(x) if i == 0 else ensemble_logits + m(x)
        return ensemble_logits / len(self.models)
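A minimal check of the averaging logic, using two stand-in linear "models" (hypothetical, in place of the pytorchcv networks):

```python
import torch
import torch.nn as nn

class EnsembleDemo(nn.Module):
    # toy stand-in for ensembleNet: averages the outputs of a list of models
    def __init__(self, models):
        super().__init__()
        self.models = nn.ModuleList(models)
    def forward(self, x):
        ensemble_logits = None
        for i, m in enumerate(self.models):
            ensemble_logits = m(x) if i == 0 else ensemble_logits + m(x)
        return ensemble_logits / len(self.models)

m1, m2 = nn.Linear(4, 3), nn.Linear(4, 3)
ens = EnsembleDemo([m1, m2])
x = torch.rand(2, 4)
out = ens(x)
# the ensemble output is exactly the mean of the individual logits
assert torch.allclose(out, (m1(x) + m2(x)) / 2, atol=1e-6)
```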

Prerequisites
○ Attack objective: non-targeted attack
○ Attack constraint: L-infinity norm with parameter ε
○ Attack algorithm: FGSM / I-FGSM
○ Attack schema: black-box attack (attack a proxy network)
[Figure 2]
[Figure 3]
1. Pick any proxy network to attack the black-box model held by the TAs.
2. Implement non-targeted adversarial attack methods:
a. FGSM
b. I-FGSM
c. MI-FGSM
3. Increase attack transferability with diverse input (DIM).
4. Attack multiple proxy models at once: ensemble attack.

2. Algorithms

1. FGSM

● Fast Gradient Sign Method (FGSM)
FGSM is a gradient-based algorithm for generating adversarial examples and belongs to the non-targeted attacks (the adversarial example need not be classified by the model as any specified class; it only has to be classified differently from the original example). When training an ordinary deep network, we minimize the loss by moving against the gradient direction, i.e. gradient descent with a minus sign. FGSM can be understood as gradient ascent instead: it uses a plus sign so that the loss is maximized,

    x_adv = x + ε · sign(∇x J(θ, x, y))

where sign returns the sign of a number: 1 when it is greater than 0, 0 when it equals 0, and -1 when it is less than 0. x is the original example, θ are the model's weight parameters (i.e. w), and y is the true class of x. Feeding the original example, the weights, and the true class into the loss function J yields the network's loss, and ∇x denotes the partial derivative of J with respect to the input x. The value of ε (epsilon) is usually set by hand and can be thought of as a learning rate; once the perturbation exceeds this threshold, the adversarial example becomes noticeable to the human eye.
[Figure 4]

# perform fgsm attack
def fgsm(model, x, y, loss_fn, epsilon=epsilon):
    x_adv = x.detach().clone() # initialize x_adv as original benign image x
    x_adv.requires_grad = True # need to obtain gradient of x_adv, thus set required grad
    loss = loss_fn(model(x_adv), y) # calculate loss
    loss.backward() # calculate gradient
    # fgsm: use gradient ascent on x_adv to maximize loss
    grad = x_adv.grad.detach()
    x_adv = x_adv + epsilon * grad.sign()
    return x_adv
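As a toy check of the fgsm step, the function can be run against a stand-in linear classifier (hypothetical, in place of the CIFAR-10 proxy network); for a linear model the cross-entropy loss is convex in the input, so one gradient-ascent step can never decrease it:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# toy linear classifier standing in for the proxy model
model = nn.Linear(12, 3)
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(4, 12)          # 4 "images" flattened to 12 features
y = torch.randint(0, 3, (4,))  # random labels

def fgsm(model, x, y, loss_fn, epsilon=8/255):
    x_adv = x.detach().clone()
    x_adv.requires_grad = True
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # one step of gradient ascent in the sign direction
    return x_adv + epsilon * x_adv.grad.detach().sign()

x_adv = fgsm(model, x, y, loss_fn)
assert loss_fn(model(x_adv), y).item() >= loss_fn(model(x), y).item()
```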

2. I-FGSM

● Iterative Fast Gradient Sign Method (I-FGSM)
[Figure 5]
[Figure 6]

# alpha and num_iter can be decided by yourself
alpha = 0.8/255/std
def ifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=20):
    x_adv = x
    # write a loop of num_iter to represent the iterative times
    for i in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True # need to obtain gradient of x_adv, thus set required grad
        loss = loss_fn(model(x_adv), y) # calculate loss
        loss.backward() # calculate gradient
        # fgsm: use gradient ascent on x_adv to maximize loss
        grad = x_adv.grad.detach()
        x_adv = x_adv + alpha * grad.sign()

        x_adv = torch.max(torch.min(x_adv, x+epsilon), x-epsilon) # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv
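The torch.max/torch.min pair at the end of each iteration is the projection back onto the L-infinity ball of radius epsilon around x; a minimal sketch with toy tensors:

```python
import torch

x = torch.zeros(2, 3)
epsilon = 0.1
# perturbed points, some of which have strayed outside the epsilon-ball
x_adv = torch.tensor([[0.25, -0.05, 0.30], [-0.20, 0.08, 0.00]])
# clip each coordinate back into [x - epsilon, x + epsilon]
x_proj = torch.max(torch.min(x_adv, x + epsilon), x - epsilon)
assert torch.allclose(x_proj, torch.tensor([[0.10, -0.05, 0.10], [-0.10, 0.08, 0.00]]))
```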

3. MI-FGSM

Boost the adversarial attack with momentum: momentum stabilizes the update direction and helps escape poor local maxima.
[Figure 7]

def mifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=20, decay=1.0):
    x_adv = x
    # initialze momentum tensor
    momentum = torch.zeros_like(x).detach().to(device)
    # write a loop of num_iter to represent the iterative times
    for i in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True # need to obtain gradient of x_adv, thus set required grad
        loss = loss_fn(model(x_adv), y) # calculate loss
        loss.backward() # calculate gradient
        # momentum update: accumulate the per-sample L1-normalized gradient
        grad = x_adv.grad.detach()
        grad = decay * momentum + grad / (grad.abs().sum(dim=(1, 2, 3), keepdim=True) + 1e-8)
        momentum = grad
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x+epsilon), x-epsilon) # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv
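The momentum update follows MI-FGSM: the raw gradient is normalized per sample by its L1 norm before being added to the decayed momentum. A toy sketch of that accumulation (made-up gradient values):

```python
import torch

# hypothetical gradient batch, shape (batch=2, C=1, H=2, W=2)
g = torch.tensor([[[[3.0, -1.0], [0.0, 2.0]]],
                  [[[0.5, 0.5], [-0.5, 0.5]]]])

# per-sample L1 normalization: each example's gradient sums to 1 in absolute value
g_norm = g / (g.abs().sum(dim=(1, 2, 3), keepdim=True) + 1e-8)

momentum = torch.zeros_like(g)
decay = 1.0
momentum = decay * momentum + g_norm
assert torch.allclose(momentum.abs().sum(dim=(1, 2, 3)), torch.ones(2), atol=1e-6)
```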

Overfitting happens in adversarial attacks too…
● I-FGSM greedily perturbs the image in the sign direction of the loss gradient, so it easily falls into poor local maxima and overfits the specific network parameters.
● Such overfitted adversarial examples rarely transfer to black-box models.
How can we avoid overfitting the proxy model and increase the transferability of the black-box attack? Data augmentation.

4. Diverse Input (DIM)

1. Random resizing (resize the input image to a random size)
2. Random padding (pad zeros around the input image in a random manner)
[Figure 8]

def dmi_mifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=50, decay=1.0, p=0.5):
    x_adv = x
    # initialze momentum tensor
    momentum = torch.zeros_like(x).detach().to(device)
    # write a loop of num_iter to represent the iterative times
    for i in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv_raw = x_adv.clone()
        if torch.rand(1).item() >= p: # apply the diverse-input transform with probability 1 - p
            #resize img to rnd X rnd
            rnd = torch.randint(29, 33, (1,)).item()
            x_adv = transforms.Resize((rnd, rnd))(x_adv)
            #padding img to 32 X 32 with 0
            left = torch.randint(0, 32 - rnd + 1, (1,)).item()
            top = torch.randint(0, 32 - rnd + 1, (1,)).item()
            right = 32 - rnd - left
            bottom = 32 - rnd - top
            x_adv = transforms.Pad([left, top, right, bottom])(x_adv)
        x_adv.requires_grad = True # need to obtain gradient of x_adv, thus set required grad
        loss = loss_fn(model(x_adv), y) # calculate loss
        loss.backward() # calculate gradient
        # momentum update: accumulate the per-sample L1-normalized gradient
        grad = x_adv.grad.detach()
        grad = decay * momentum + grad / (grad.abs().sum(dim=(1, 2, 3), keepdim=True) + 1e-8)
        momentum = grad
        x_adv = x_adv_raw + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x+epsilon), x-epsilon) # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv
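The random resize-and-pad transform can be reproduced with plain torch ops (F.interpolate plus F.pad standing in for the torchvision transforms); whatever random size is drawn, the output always comes back at 32×32:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(1, 3, 32, 32)

# resize to a random rnd x rnd in [29, 32]
rnd = int(torch.randint(29, 33, (1,)).item())
x_small = F.interpolate(x, size=(rnd, rnd), mode="nearest")

# zero-pad back to 32 x 32 at a random offset
left = int(torch.randint(0, 32 - rnd + 1, (1,)).item())
top = int(torch.randint(0, 32 - rnd + 1, (1,)).item())
# F.pad takes (left, right, top, bottom) for the last two dims
x_div = F.pad(x_small, (left, 32 - rnd - left, top, 32 - rnd - top))
assert x_div.shape == (1, 3, 32, 32)
```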

Ensemble attack
● Choose a list of proxy models
● Choose an attack algorithm (FGSM, I-FGSM, and so on)
● Attack multiple proxy models at the same time
● Ensemble adversarial attack: the study of transferable adversarial examples and black-box attacks
● How to choose suitable proxy models for the black-box attack

Evaluation Metric

● The parameter ε is fixed to 8 (on the 0-255 pixel scale)
● Distance measure: L-inf norm
● Model accuracy is the only evaluation metric
[Figure 9]

3. Experiments

1. Simple Baseline

Simply run the FGSM method from the TA's sample code. FGSM attacks each image only once. The proxy model is resnet110_cifar10; on the attacked images its accuracy is benign_acc = 0.95 with benign_loss = 0.22678. For the attack, the gen_adv_examples function is used to call fgsm, and the accuracy drops: fgsm_acc = 0.59, fgsm_loss = 2.49186.
[Figure 10]
[Figure 11]

2. Medium Baseline

Method: I-FGSM + Ensemble Attack. Compared with fgsm, ifgsm applies the fgsm attack iteratively in a loop. In addition, an ensemble attack is used, which attacks with multiple proxy models and requires completing the forward function of the ensembleNet class. For the attack, the gen_adv_examples function is used with the ensemble model and ifgsm, and the accuracy drops markedly: ifgsm_acc = 0.01, ifgsm_loss = 17.29498.

    def forward(self, x):
        ensemble_logits = None
        for i, m in enumerate(self.models):
            ensemble_logits = m(x) if i == 0 else ensemble_logits + m(x)
        return ensemble_logits / len(self.models)

[Figure 12]
[Figure 13]
With "resnet110_cifar10" as the source model, apply a plain fgsm attack to dog2.png:
[Figure 14]
Apply JPEG compression through the imgaug package, with the compression rate set to 70:
[Figure 15]

3. Strong Baseline

Method: MI-FGSM + Ensemble Attack (pick the right models). Compared with ifgsm, mifgsm adds momentum so that the attack does not get stuck in local maxima. In the medium baseline we picked proxy models at random, which is blind; it is better to also include under-trained models, where "under-trained" means two things: the model was trained for only a few epochs, or the model has not yet reached its minimum loss on the validation set. Following the paper, a resnet18 was trained for 30 epochs with the training recipe from https://github.com/kuangliu/pytorch-cifar (reaching the best result normally takes about 200 epochs) and added to the ensembleNet. Accuracy and loss after the attack: ensemble_mifgsm_acc = 0.01, ensemble_mifgsm_loss = 12.13276.
[Figure 16]

        # momentum update: accumulate the per-sample L1-normalized gradient
        grad = x_adv.grad.detach()
        grad = decay * momentum + grad / (grad.abs().sum(dim=(1, 2, 3), keepdim=True) + 1e-8)
        momentum = grad
        x_adv = x_adv + alpha * grad.sign()

4. Boss Baseline

Method: DIM-MIFGSM + Ensemble Attack (pick the right models). Compared with the strong baseline, mifgsm is replaced with dim-mifgsm, which applies a transform to the attacked image to avoid overfitting: the image is first randomly resized and then randomly padded back to its original size. The dim_mifgsm function is written on top of the mifgsm function (see the DIM section above). For the attack, the gen_adv_examples function is used to call the model; accuracy and loss after the attack: ensemble_dim_mifgsm_acc = 0.00, ensemble_dim_mifgsm_loss = 13.64031.
[Figure 17]

