Enhancing the Transferability of Adversarial Attacks through Variance Tuning: Paper Notes

Abstract

Although adversarial attacks achieve high success rates in the white-box setting, most existing adversaries often exhibit weak transferability in the black-box setting, especially when attacking models equipped with defense mechanisms. In this paper, we propose a new method called variance tuning to enhance the class of iterative gradient-based attack methods and improve their attack transferability. Specifically, at each iteration of the gradient calculation, instead of directly using the current gradient for the momentum accumulation, we further consider the gradient variance of the previous iteration to tune the current gradient, so as to stabilize the update direction and escape from poor local optima. Empirical results on the standard ImageNet dataset demonstrate that our method could significantly improve the transferability of gradient-based adversarial attacks. Besides, our method could be used to attack ensemble models or be integrated with various input transformations. Incorporating variance tuning with input transformations on iterative gradient-based attacks in the multi-model setting, the integrated method could achieve an average success rate of 90.1% against nine advanced defense methods, improving the current best attack performance significantly by 85.1%.

3 Methodology

3.1 Motivation

Given a target classifier $f$ with parameters $\theta$ and a benign image $x \in \mathcal{X}$, where $x$ is $d$-dimensional and $\mathcal{X}$ denotes the set of all legal images, the goal of an adversarial attack is to find an adversarial example $x^{adv} \in \mathcal{X}$ that satisfies:

$$f(x;\theta) \ne f(x^{adv};\theta) \quad \text{s.t.} \quad \Vert x - x^{adv} \Vert_p < \epsilon \qquad (3)$$

For a white-box attack, we can cast the attack as an optimization problem that searches for an example in the neighborhood of $x$ so as to maximize the loss function $J$ of the target classifier $f$:

$$x^{adv} = \argmax\limits_{\Vert x' - x \Vert_p < \epsilon}\ J(x', y; \theta) \qquad (4)$$
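
To make Eq. (4) concrete, here is a minimal PyTorch sketch of how an iterative attack in the style of I-FGSM greedily ascends this objective under an $L_\infty$ constraint. It assumes images in $[0, 1]$; `model`, `x`, and `y` are placeholders, and the defaults for `eps`, `alpha`, and `T` are illustrative choices, not values prescribed by this section:

```python
import torch
import torch.nn.functional as F

def i_fgsm(model, x, y, eps=16 / 255, alpha=1.6 / 255, T=10):
    """Greedy iterative ascent on J(x', y; theta) within the eps-ball of Eq. (4)."""
    x_adv = x.clone().detach()
    for _ in range(T):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)        # J(x', y; theta)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()   # step along sign of gradient
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # enforce ||x' - x||_inf < eps
        x_adv = x_adv.clamp(0, 1)                      # keep a valid image
    return x_adv
```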

[18] draws an analogy between the generation of adversarial examples and the standard neural network training process: the input $x$ can be viewed as the parameters to be trained, and the target model can be treated as the training set. From this perspective, the transferability of adversarial examples corresponds to the generalization ability of normally trained models. Therefore, existing works mainly focus on better optimization algorithms (e.g., MI-FGSM, NI-FGSM) [14, 6, 18] or data augmentation (e.g., ensemble attacks on multiple models or input transformations) [19, 35, 7, 18, 33] to improve attack transferability.

In this paper, we view the iterative gradient-based adversarial attack as a stochastic gradient descent (SGD) optimization process, in which the attacker always chooses the target model for the update at each iteration. As illustrated in previous works [26, 28, 13], SGD introduces large variance due to this randomness, leading to slow convergence. To address this issue, various variance reduction methods have been proposed to accelerate the convergence of SGD, e.g., SAG (stochastic average gradient) [26], SDCA (stochastic dual coordinate ascent) [28], and SVRG (stochastic variance reduced gradient) [13], all of which adopt information from the training set to reduce the variance. In addition, Nesterov's accelerated gradient [24], which boosts convergence, is also beneficial to improving attack transferability [18].

Based on the above analysis, we attempt to improve adversarial transferability through a gradient variance tuning strategy. Our method differs from SGD with variance reduction methods (SGDVRMs) in three main aspects. First, we aim to craft highly transferable adversaries, which is analogous to improving the generalization of trained models, while SGDVRMs aim to accelerate convergence. Second, we consider the gradient variance of examples sampled in the neighborhood of the input $x$, which corresponds to the parameter space when training neural models, whereas SGDVRMs utilize the variance over the training set. Third, our variance tuning strategy is more general and can be used to improve the performance of both MI-FGSM and NI-FGSM.

3.2 Variance Tuning Gradient-based Attacks

Typical iterative gradient-based attacks (e.g., I-FGSM) greedily search for an adversarial example in the direction of the sign of the gradient at each iteration, as shown in Eq. (1), which may easily fall into poor local optima and "overfit" the model [6]. MI-FGSM [6] integrates momentum into I-FGSM in order to stabilize the update directions and escape from poor local optima, thereby improving attack transferability. NI-FGSM [18] further adopts Nesterov's accelerated gradient [24] into I-FGSM to improve transferability by leveraging its looking-ahead property.
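
For reference, the momentum update of MI-FGSM as given in [6] (with decay factor $\mu$ and step size $\alpha$) reads:

$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(x_t^{adv}, y; \theta)}{\Vert \nabla_x J(x_t^{adv}, y; \theta) \Vert_1}, \qquad x_{t+1}^{adv} = \mathrm{Clip}_x^{\epsilon}\left\{ x_t^{adv} + \alpha \cdot \mathrm{sign}(g_{t+1}) \right\}$$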

We observe that the above methods only consider the data points along the optimization path, denoted as $x_0^{adv} = x,\ x_1^{adv},\ \ldots,\ x_{t-1}^{adv},\ x_t^{adv},\ \ldots,\ x_T^{adv} = x^{adv}$. In order to avoid overfitting and further improve the transferability of adversarial attacks, we adopt the gradient information in the neighborhood of the previous data point to tune the gradient of the current data point at each iteration. Specifically, for any input $x \in \mathcal{X}$, we define the gradient variance as follows.

Definition 1 (Gradient Variance)

Given a classifier $f$ with parameters $\theta$, a loss function $J(x, y; \theta)$, an arbitrary image $x \in \mathcal{X}$, and an upper bound $\epsilon'$ for the neighborhood, the gradient variance is defined as:

$$V_{\epsilon'}^g(x) = \mathbb{E}_{\Vert x' - x \Vert_p < \epsilon'}\left[\nabla_{x'} J(x', y; \theta)\right] - \nabla_x J(x, y; \theta)$$

In the following, we use $V(x)$ to denote $V_{\epsilon'}^g(x)$ and set $\epsilon' = \beta \cdot \epsilon$, where $\beta$ is a hyperparameter and $\epsilon$ is the upper bound of the perturbation magnitude. In practice, owing to the continuity of the input space, we cannot compute $\mathbb{E}_{\Vert x' - x \Vert_p < \epsilon'}[\nabla_{x'} J(x', y; \theta)]$ directly. Therefore, we approximate its value by sampling $N$ examples in the neighborhood of $x$ to calculate $V(x)$:

$$V(x) = \frac{1}{N}\sum\limits_{i=1}^{N} \nabla_{x^i} J(x^i, y; \theta) - \nabla_x J(x, y; \theta) \qquad (7)$$

where $x^i = x + r_i$, $r_i \sim U[-(\beta \cdot \epsilon)^d, (\beta \cdot \epsilon)^d]$, and $U[a^d, b^d]$ stands for the uniform distribution in $d$ dimensions.
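
As a concrete illustration, here is a minimal PyTorch sketch of the Monte Carlo approximation in Eq. (7). It assumes `model` is a classifier whose loss $J$ is cross-entropy; the helper `loss_grad` is introduced here for illustration, and the defaults `beta=1.5` and `N=20` follow commonly reported settings for this method rather than anything fixed by the definition itself:

```python
import torch
import torch.nn.functional as F

def loss_grad(model, x, y):
    """Gradient of the cross-entropy loss J(x, y; theta) w.r.t. the input x."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    return torch.autograd.grad(loss, x)[0]

def gradient_variance(model, x, y, eps=16 / 255, beta=1.5, N=20):
    """Monte Carlo estimate of V(x) in Eq. (7) from N uniform neighbors of x."""
    grad_sum = torch.zeros_like(x)
    for _ in range(N):
        # x^i = x + r_i with r_i ~ U[-(beta * eps)^d, (beta * eps)^d]
        r = torch.empty_like(x).uniform_(-beta * eps, beta * eps)
        grad_sum = grad_sum + loss_grad(model, x + r, y)
    return grad_sum / N - loss_grad(model, x, y)
```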

After obtaining the gradient variance, we can tune the gradient of $x_t^{adv}$ at the $t$-th iteration with the gradient variance $V(x_{t-1}^{adv})$ at the $(t-1)$-th iteration, so as to stabilize the update direction. The variance tuning MI-FGSM algorithm, denoted as VMI-FGSM, is summarized in Algorithm 1. Note that our method is generally applicable to any gradient-based attack method. We can easily extend VMI-FGSM to variance tuning NI-FGSM (VNI-FGSM), and integrate these methods with DIM, TIM, and SIM as in [18].
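
Putting the pieces together, below is a minimal single-image sketch of the VMI-FGSM loop described above, reusing `loss_grad` and `gradient_variance` from the earlier sketch. The hyperparameter defaults are illustrative, and the redundant recomputation of the gradient inside `gradient_variance` is kept for simplicity:

```python
import torch

def vmi_fgsm(model, x, y, eps=16 / 255, T=10, mu=1.0, beta=1.5, N=20):
    """Sketch of VMI-FGSM: momentum attack tuned by the previous iteration's variance."""
    alpha = eps / T
    g = torch.zeros_like(x)  # accumulated momentum g_t
    v = torch.zeros_like(x)  # gradient variance v_t from the previous iteration
    x_adv = x.clone().detach()
    for _ in range(T):
        grad = loss_grad(model, x_adv, y)         # current gradient at x_t^adv
        tuned = grad + v                          # tune with the previous variance
        g = mu * g + tuned / tuned.abs().sum()    # L1-normalized momentum update
        # variance for the next iteration, sampled around the current point
        v = gradient_variance(model, x_adv, y, eps=eps, beta=beta, N=N)
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # clip into the eps-ball of x
        x_adv = x_adv.clamp(0, 1)
    return x_adv
```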
