[Paper Notes][ICLR 2018] Towards Deep Learning Models Resistant to Adversarial

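Personal summary: The method in this paper appears as a comparison baseline in most related papers from 2019 and 2020, such as the article mentioned above, AdvGAN, and AdvCam, which suggests that it works quite well. The paper proposes a min-max framework that unifies attack and defense. Within this framework, adversarial examples generated with PGD (the general, iterative form of FGSM) are used for adversarial training, which improves model robustness against a whole class of first-order (gradient-based) attacks. However, I did not fully understand the part of the solution that relies on an approximation based on Danskin's theorem, so that still needs further study.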


Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples—inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models.



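For reference, the saddle-point (min-max) objective from the paper that the notes below discuss can be written as follows, where D is the data distribution and S is the set of allowed perturbations (an ℓ∞ ball of radius ε in the paper's experiments):

```latex
\min_{\theta} \rho(\theta),
\qquad
\rho(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\delta \in \mathcal{S}} L(\theta, x+\delta, y)\Big]
```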
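L(θ, x+δ, y) is the objective of an untargeted attacker (the inner maximization). Intuitively, the attacker looks for a perturbation δ that makes the loss at the point (x+δ, y) as large as possible, so that the model incurs a large loss on its own correct label, i.e., the correct label's logit becomes small relative to the others and its predicted probability drops. Methods such as PGD, FGSM, and I-FGSM can be used to find such adversarial examples, each approximately solving this maximization.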
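A minimal sketch of the inner maximization with a single FGSM step. It assumes a toy binary logistic-regression model (weights `w`, bias `b`) standing in for a deep network and an ℓ∞ budget `eps`; all helper names are hypothetical and not from the paper. The attack moves every input coordinate by `eps` in the direction that increases the loss on the true label.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    # Binary cross-entropy of the toy model at input x, label y in {0, 1}
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_x(w, b, x, y):
    # Gradient of the loss with respect to the input: (p - y) * w
    return (sigmoid(x @ w + b) - y) * w

def fgsm(w, b, x, y, eps):
    # Untargeted FGSM: one step of size eps along the sign of the input gradient
    return x + eps * np.sign(grad_x(w, b, x, y))

w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([1.0, 0.5]), 1
x_adv = fgsm(w, b, x, y, eps=0.1)
print(x_adv)                                     # [0.9 0.6]
print(loss(w, b, x_adv, y) > loss(w, b, x, y))   # True
```

For a linear model like this one, a single FGSM step already reaches the exact ℓ∞ maximum, since each coordinate ends on the boundary of the ball in the loss-increasing direction. For non-linear networks this no longer holds, which is why iterative methods such as I-FGSM and PGD are used.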
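The outer min_θ ρ(θ) is the defender's objective: the goal is to keep the expected loss over the whole data distribution as small as possible even when every input is replaced by its worst-case adversarial version. If this is achieved, adversarial examples are no longer a concern, because they cannot produce a large loss. This minimization is carried out by adversarial training.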
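The outer minimization can be sketched as a training loop that, at every step, first runs the inner attack and then takes a gradient step on the model parameters at the resulting adversarial points. This sketch assumes a toy logistic-regression model on hypothetical synthetic 2-D data; because the model is linear, one FGSM step serves as the exact inner maximizer here, whereas the paper uses PGD against deep networks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    # Binary cross-entropy, clipped to avoid log(0)
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def attack(w, b, X, y, eps):
    # Inner max: for a linear model, one FGSM step is the exact l_inf maximizer
    p = sigmoid(X @ w + b)
    g = (p - y)[:, None] * w[None, :]   # d loss / d x for every example
    return X + eps * np.sign(g)

def adversarial_loss(w, b, X, y, eps):
    # Empirical estimate of rho(theta): mean loss at the worst-case inputs
    X_adv = attack(w, b, X, y, eps)
    return bce(sigmoid(X_adv @ w + b), y).mean()

def adversarial_train(X, y, eps, lr=0.1, steps=500):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        X_adv = attack(w, b, X, y, eps)      # inner maximization
        p = sigmoid(X_adv @ w + b)
        # Outer minimization: gradient w.r.t. (w, b) evaluated at the adversarial points
        w -= lr * (X_adv.T @ (p - y)) / n
        b -= lr * np.mean(p - y)
    return w, b

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.5, (100, 2)), rng.normal(-1.0, 0.5, (100, 2))])
y = np.concatenate([np.ones(100), np.zeros(100)])
w, b = adversarial_train(X, y, eps=0.3)
print(adversarial_loss(np.zeros(2), 0.0, X, y, 0.3))  # about 0.693 (log 2) for the untrained model
print(adversarial_loss(w, b, X, y, 0.3))              # adversarial loss after training
```

Evaluating the parameter gradient at the adversarial points, rather than differentiating through the attack, is the shortcut that Danskin's theorem justifies.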
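The PGD method used here is projected gradient descent (run as gradient ascent on the loss), i.e., multi-step FGSM: each step moves x along sgn(∇_x L(θ, x, y)) and then projects the result back onto the allowed perturbation set around the original input.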
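The PGD iteration can be sketched as follows, assuming an ℓ∞ threat model, inputs in [0, 1], and an optional uniform random start inside the ε-ball (the paper also starts PGD from random points). `pgd_linf` and `grad_fn` are hypothetical names: `grad_fn` returns ∇_x L(θ, x, y) for a fixed model, and the toy logistic-regression gradient in the usage lines only stands in for a network's gradient.

```python
import numpy as np

def pgd_linf(x, grad_fn, eps, alpha, steps, rng=None, x_min=0.0, x_max=1.0):
    """l_inf PGD: repeated FGSM steps of size alpha, each followed by projection."""
    x_adv = x.copy()
    if rng is not None:
        # Optional random start inside the eps-ball around x
        x_adv = x_adv + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))  # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)         # project onto {x' : ||x' - x||_inf <= eps}
        x_adv = np.clip(x_adv, x_min, x_max)             # keep a valid input range (assumed [0, 1])
    return x_adv

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy stand-in model (hypothetical): logistic regression with true label y = 1
w, b, y = np.array([2.0, -1.0, 0.5]), -0.5, 1

def grad_fn(z):
    return (sigmoid(z @ w + b) - y) * w      # d BCE / d x

def loss(z):
    return -np.log(sigmoid(z @ w + b))       # BCE for label y = 1

x = np.array([0.6, 0.2, 0.4])
x_adv = pgd_linf(x, grad_fn, eps=0.1, alpha=0.025, steps=10, rng=np.random.default_rng(0))
print(x_adv, loss(x_adv) > loss(x))
```

The step size `alpha` and step count here are illustrative; with `steps * alpha` larger than `2 * eps`, the iterate can reach any point of the ball from any random start.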
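In theory, solving this min-max problem, which unifies attack and defense, yields robustness against a whole class of adversarial examples. The difficulty is that the inner maximization is non-concave and the outer minimization is non-convex, so neither can be solved exactly in general. Through adversarial training experiments, however, the authors find that the loss values of the adversarial examples found by PGD (from many random starting points) are highly concentrated, and that after adversarial training these losses are very small.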
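The Danskin's-theorem step mentioned in the personal summary can be stated compactly. Roughly, under the theorem's assumptions (for example, S compact and L continuously differentiable in θ) and when the maximizer δ* is unique, the gradient of the inner maximum with respect to θ is just the ordinary gradient evaluated at that maximizer. When the maximizer is not unique, the paper argues that the gradient at any maximizer still gives a descent direction for ρ(θ). This is what justifies training on PGD examples: PGD supplies an approximate δ*, and the outer step uses ∇_θ L at x + δ*.

```latex
\nabla_\theta \Big[\max_{\delta \in \mathcal{S}} L(\theta, x+\delta, y)\Big]
  = \nabla_\theta L(\theta, x+\delta^*, y),
\qquad
\delta^* = \arg\max_{\delta \in \mathcal{S}} L(\theta, x+\delta, y)
```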


