Adversarial examples

Related article links:

Eight misconceptions and facts about adversarial examples in deep learning


[2014 ICLR] Intriguing properties of neural networks

Background: image classification with deep learning (in particular, DNNs)


Formal description of searching for / generating adversarial examples:


Solution method: box-constrained L-BFGS
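A reconstruction of the optimization problem from the paper's formulation, where f is the classifier, x is the clean image with pixels in [0, 1]^m, l is the target label, and loss_f is a surrogate classification loss (e.g., cross-entropy):

```latex
% Exact problem: the smallest (L2) perturbation r that makes f assign
% the target label l to x + r while keeping pixels valid.
\begin{aligned}
\min_{r}\quad & \lVert r \rVert_2 \\
\text{s.t.}\quad & f(x + r) = l, \\
                 & x + r \in [0, 1]^m
\end{aligned}

% Since the exact problem is hard, the paper minimizes a penalized
% surrogate with box-constrained L-BFGS, line-searching over c > 0 for
% the smallest c whose minimizer r still satisfies f(x + r) = l:
\begin{aligned}
\min_{r}\quad & c\,\lvert r \rvert + \mathrm{loss}_f(x + r,\, l) \\
\text{s.t.}\quad & x + r \in [0, 1]^m
\end{aligned}
```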


Points to note:



Key conclusion:

Possible explanation is that the set of adversarial negatives is of extremely low probability, and thus is never (or rarely) observed in the test set, yet it is dense (much like the rational numbers), and so it is found near virtually every test case.


Explanation link 1
Explanation link 2


[2017 ICLR] Adversarial examples in the physical world

ABSTRACT:

  • Most existing machine learning classifiers are highly vulnerable to adversarial examples.
  • An adversarial example is a sample of input data which has been modified very slightly in a way that is intended to cause a machine learning classifier to misclassify it (a minimal sketch of how such a perturbation can be crafted follows this list).
  • In many cases, these modifications can be so subtle that a human observer does not even notice the modification at all, yet the classifier still makes a mistake.
  • Adversarial examples pose security concerns because they could be used to perform an attack on machine learning systems, even if the adversary has no access to the underlying model.
  • Up to now, all previous work has assumed a threat model in which the adversary can feed data directly into the machine learning classifier. This is not always the case for systems operating in the physical world, for example those using signals from cameras and other sensors as input.
  • This paper shows that even in such physical-world scenarios, machine learning systems are vulnerable to adversarial examples. We demonstrate this by feeding adversarial images obtained from a cell-phone camera to an ImageNet Inception classifier and measuring the classification accuracy of the system.
  • We find that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera.
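A minimal sketch of how such a slightly modified input can be crafted, using the fast gradient sign method that this line of work builds on. The model choice (torchvision's pretrained Inception v3), the perturbation size eps, and the random placeholder image are illustrative assumptions, not the paper's exact setup:

```python
# Sketch only: one-step fast gradient sign method (FGSM) perturbation.
import torch
import torch.nn.functional as F
from torchvision import models

# Pretrained ImageNet Inception v3 (newer torchvision uses the `weights=` argument).
model = models.inception_v3(pretrained=True).eval()

def fgsm(model, x, label, eps=2.0 / 255):
    """Perturb x (pixels in [0, 1]) one step in the direction that raises the loss on `label`."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), label).backward()
    # Move each pixel by eps along the sign of the loss gradient, then clip to valid pixels.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

x = torch.rand(1, 3, 299, 299)      # placeholder image; real use needs ImageNet preprocessing
label = model(x).argmax(dim=1)      # attack the model's own prediction
x_adv = fgsm(model, x, label)
print("clean:", model(x).argmax(dim=1).item(), "adversarial:", model(x_adv).argmax(dim=1).item())
```

The paper then asks whether such digitally crafted examples still fool the classifier after being printed and photographed with a cell-phone camera.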

Prior work (the transferability property):
We showed that an adversarial example that was designed to be misclassified by a model M1 is often also misclassified by a model M2. This adversarial example transferability property means that it is possible to generate adversarial examples and perform a misclassification attack on a machine learning system without access to the underlying model.
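The transferability property is what makes such black-box attacks possible. A minimal sketch of measuring it, assuming two torchvision ImageNet models as stand-ins for M1 and M2 and a random batch in place of real images:

```python
# Sketch only: craft adversarial examples against a surrogate M1 and check
# how often they also change the prediction of an independent model M2.
import torch
import torch.nn.functional as F
from torchvision import models

m1 = models.resnet18(pretrained=True).eval()   # surrogate the attacker controls
m2 = models.resnet50(pretrained=True).eval()   # victim model (no gradient access needed)

def fgsm(model, x, label, eps=4.0 / 255):
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), label).backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

x = torch.rand(8, 3, 224, 224)                 # placeholder batch in [0, 1]
labels = m1(x).argmax(dim=1)                   # treat M1's predictions as labels
x_adv = fgsm(m1, x, labels)                    # attack computed only from M1

with torch.no_grad():
    transfer_rate = (m2(x_adv).argmax(dim=1) != m2(x).argmax(dim=1)).float().mean()
print(f"fraction of attacks that also change M2's prediction: {transfer_rate.item():.2f}")
```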
