Criteria for classifying attack methods:
False-positive attacks vs. false-negative attacks
White-box attacks vs. black-box attacks:
Targeted attacks vs. non-targeted attacks:
One-time (single-step) attacks vs. iterative attacks:
Individual attacks vs. universal attacks:
Optimized perturbations vs. constrained perturbations:
Datasets and attacked models:
This algorithm is a typical black-box attack: it estimates the gradient and the Hessian with symmetric difference quotients, so it never needs access to the target model's gradient information.
Model input: only the input itself plus the probability the model assigns to each class is required.
Training (optimizing the adversarial example):
The loss function is shown in the figure above: the left term keeps the adversarial example close to the real input, while the right term pushes the target model into making a wrong prediction. Concretely:
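The original figure with the objective is not reproduced here. As a hedged reconstruction of the standard formulation used by this family of black-box attacks (with $x_0$ the original input, $t$ the target label, $F(x)$ the model's probability vector, $c$ a trade-off constant, and $\kappa$ a confidence margin; the exact form in the original figure may differ slightly):

$$
\min_{x}\ \|x - x_0\|_2^2 \;+\; c \cdot f(x, t),
\qquad
f(x, t) = \max\Big(\max_{i \neq t} \log [F(x)]_i - \log [F(x)]_t,\ -\kappa\Big)
$$

The first (left) term keeps the adversarial example close to the real input; the second (right) term stops decreasing only once the model assigns the target class the largest log-probability, i.e. once the model is fooled.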
Target DNN model: if the target model's dataset is similar to MNIST, where the images are small, the attack-space dimension reduction, hierarchical attack, and importance sampling techniques are not used.
Randomly pick a coordinate.
Estimate the gradient: $h$ is a very small step size, and $e_i$ is a standard basis vector whose $i$-th element is 1 and all other elements are 0. The second estimate (the second-order term) is only used in the Newton method.
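The estimator formulas themselves were in an image that is not reproduced here; as a sketch of the standard symmetric-difference estimates for the first and second derivative along coordinate $i$ (treating the objective $f$ as a black box):

$$
\hat g_i \approx \frac{f(x + h e_i) - f(x - h e_i)}{2h},
\qquad
\hat h_i \approx \frac{f(x + h e_i) - 2 f(x) + f(x - h e_i)}{h^2}
$$

Each estimate costs only two or three queries to the target model, which is why no gradient access is needed.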
After obtaining the approximate gradients above, a first-order or second-order method (the ADAM update or Newton's method, originally highlighted in red boxes in the figure) is used to turn them into the actual coordinate update; a sketch of the ADAM variant follows below.
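A minimal sketch of one such coordinate update in Python, assuming a hypothetical `model_probs(x)` query that returns the target model's per-class probabilities; the `loss` helper, the function names, and the ADAM constants are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Hypothetical black-box query: returns the model's per-class probabilities for x.
def model_probs(x):
    raise NotImplementedError  # replace with real queries to the target model

def loss(x, x0, target, c=1.0, kappa=0.0):
    """Attack objective: distance term + hinge on log-probabilities (see the formula above)."""
    p = np.clip(model_probs(x), 1e-12, 1.0)
    log_p = np.log(p)
    other = np.max(np.delete(log_p, target))
    return np.sum((x - x0) ** 2) + c * max(other - log_p[target], -kappa)

def zoo_adam_step(x, x0, target, M, v, T, h=1e-4, eta=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Update one randomly chosen coordinate with an ADAM-style rule driven by
    the symmetric-difference gradient estimate. M, v, T are per-coordinate
    ADAM state arrays (first moment, second moment, update counts), initialised to zeros."""
    i = np.random.randint(x.size)            # randomly pick a coordinate
    e = np.zeros_like(x)
    e.flat[i] = h                            # h * e_i

    # Symmetric difference quotient: two extra model queries per coordinate.
    g_i = (loss(x + e, x0, target) - loss(x - e, x0, target)) / (2 * h)

    # Coordinate-wise ADAM update.
    T[i] += 1
    M.flat[i] = beta1 * M.flat[i] + (1 - beta1) * g_i
    v.flat[i] = beta2 * v.flat[i] + (1 - beta2) * g_i ** 2
    m_hat = M.flat[i] / (1 - beta1 ** T[i])
    v_hat = v.flat[i] / (1 - beta2 ** T[i])
    x.flat[i] -= eta * m_hat / (np.sqrt(v_hat) + eps)
    return x
```

In practice this step is repeated for many randomly chosen coordinates (or small batches of them), and for large images it is combined with the attack-space dimension reduction, hierarchical attack, and importance sampling mentioned above; the Newton variant would additionally use the second-order estimate $\hat h_i$ to scale the step for each coordinate.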
https://zhuanlan.zhihu.com/p/57733228