This is the third article in our hands-on series on adversarial examples (Adversarial Examples) in deep learning. In 【1】 we explained the classic AE generation algorithm FGSM in detail and implemented it by hand. In 【2】 we introduced ART, an open-source toolbox developed by IBM researchers whose methods make it easy to generate various popular adversarial images from Python. This article introduces how to use another widely used toolbox, FoolBox (see reference 【3】).
As in the previous articles, the examples here demonstrate attacks on color images from CIFAR10. The code is written in Python; before experimenting with it, make sure the required packages (such as FoolBox) are installed correctly.
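The code in this article relies on foolbox.models.KerasModel and foolbox.criteria.TargetClass, an API that belongs to the FoolBox 1.x/2.x series, so if FoolBox is missing it can be installed from a notebook cell roughly as follows (pinning the version below 3 is my assumption, not something required by the original setup):
!pip install "foolbox<3"   # the 3.x rewrite removed KerasModel and TargetClass
To begin, import the necessary libraries.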
import os
import random
import shutil
import pickle
import time
import keras
import copy
import foolbox
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from foolbox.criteria import TargetClass
from keras.models import load_model
from keras.datasets import cifar10
The CIFAR10 dataset ships with Keras and can be loaded directly with the code below. The images are scaled to [0, 1] and the per-pixel mean of the training set is subtracted; the resulting minimum and maximum pixel values are printed because they will later be passed to FoolBox as the model's input bounds.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
x_train_mean = np.mean(x_train, axis=0)
x_train -= x_train_mean
x_test -= x_train_mean
min_ = np.min(x_train)
max_ = np.max(x_train)
print(min_)
print(max_)
# Put Keras into inference (test) mode before wrapping the model for FoolBox
keras.backend.set_learning_phase(0)
In the code below, we load the pre-trained ResNet model directly with the load_model() function and then check its accuracy on the test set (it should be 91.96%).
num_classes = 10
my_model = load_model('/home/fzuo/Desktop/cifar10_ResNet32v1_model.189.h5')
y_test_v = keras.utils.to_categorical(y_test, num_classes)
y_train_v = keras.utils.to_categorical(y_train, num_classes)
scores = my_model.evaluate(x_test, y_test_v, verbose=1)
print('Test accuracy:', scores[1])
##Test accuracy: 0.9196
Note that ResNet does not achieve 100% accuracy on the CIFAR10 test set. In other words, even without any attack, some images are already misclassified by the model. To rule out the influence of these "special-case" images, I discard the misclassified ones and store the 9196 correctly classified images in a pickle file, together with their correct labels. In addition, this example uses the well-known Carlini-Wagner algorithm (CW for short) to generate the AEs. CW is a targeted attack: it crafts adversarial images that steer the neural network toward a class chosen by the attacker, so we also need to supply a target label as input. CIFAR10 labels range from 0 to 9, and here we set the target label to the "next" one after the correct label; for example, if the correct label is 1, we generate an AE that makes the network output 2. Since data preparation is not the focus of this article, I simply read in the saved dataset.
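For completeness, here is a minimal sketch of how such a filtered dataset could be produced. This is my own reconstruction rather than the exact script behind the files; only the file names are taken from the code that follows.
# Keep only the test images the model already classifies correctly,
# and pair each one with a target label of (true label + 1) mod 10.
preds = np.argmax(my_model.predict(x_test), axis=1)
mask = (preds == y_test.flatten())
x_ben_imgs = x_test[mask]
x_ben_imgs_labels = y_test.flatten()[mask]
x_ben_imgs_targets = (x_ben_imgs_labels + 1) % 10
with open('successful_test_9196.pkl', 'wb') as f:
    pickle.dump(x_ben_imgs, f)
with open('successful_test_labels_9196.pkl', 'wb') as f:
    pickle.dump(x_ben_imgs_labels, f)
with open('successful_test_targeted_labels_9196.pkl', 'wb') as f:
    pickle.dump(x_ben_imgs_targets, f)
With these files in place, the saved dataset is read back in as follows.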
with open('successful_test_9196.pkl', 'rb') as f:
    x_ben_imgs = pickle.load(f)
with open('successful_test_labels_9196.pkl', 'rb') as f:
    x_ben_imgs_labels = pickle.load(f)
with open('successful_test_targeted_labels_9196.pkl', 'rb') as f:
    x_ben_imgs_targets = pickle.load(f)
y_pred = my_model.predict(x_ben_imgs[0].reshape(1, 32, 32, 3))
print(np.argmax(y_pred))
print(x_ben_imgs_labels[0])
print(int(x_ben_imgs_targets[0]))
The code below demonstrates how to attack a single image; if the attack succeeds, the newly generated adversarial image is plotted:
preprocessing = (np.array([0, 0, 0]), 1)
fmodel = foolbox.models.KerasModel(my_model, bounds=(min_, max_), preprocessing=preprocessing)
attack = foolbox.attacks.CarliniWagnerL2Attack(fmodel, criterion=TargetClass(int(x_ben_imgs_targets[0])))
if(np.argmax(y_pred) == x_ben_imgs_labels[0]):
    cw_ae = attack(x_ben_imgs[0], label=x_ben_imgs_labels[0], confidence=0, max_iterations=100)
    try:
        assert cw_ae is not None
    except Exception as e:
        print('Attack Fails Because None Is Returned!')
    else:
        adv_test = cw_ae.reshape(1, 32, 32, 3)
        y_pred1 = my_model.predict(adv_test)
        if(np.argmax(y_pred1) != (x_ben_imgs_labels[0]+1)%10):
            print('Attack Fails Because Target Class Is Not Produced!')
        else:
            plt.imshow(adv_test.reshape(32, 32, 3) + x_train_mean)
The figure below shows the newly generated adversarial image. Note that CIFAR10 images are only 32×32 pixels, so what is shown is actually an enlarged version:
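Since the CW attack minimizes the L2 norm of the perturbation, it is instructive to check how small the change actually is. A quick way to do so, assuming the attack above succeeded so that adv_test is defined, is:
# Compare the adversarial image with the original benign image.
perturbation = adv_test.reshape(32, 32, 3) - x_ben_imgs[0]
print('L2 norm of perturbation:', np.linalg.norm(perturbation))
print('Largest absolute pixel change:', np.abs(perturbation).max())
Both numbers should be small relative to the overall data range (roughly max_ - min_).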
If you need to generate a batch of adversarial images, you can refer to the code below; the approach is simply to attack the images one by one and then "stitch" the results together:
sum_num_attack = 1
succe_labels = [x_ben_imgs_labels[0]]
for index in range(1, 10):
    image = x_ben_imgs[index]
    correct_label = x_ben_imgs_labels[index]
    target_label = int(x_ben_imgs_targets[index])
    image_f = copy.deepcopy(image)
    x_image = np.expand_dims(image_f, axis=0)
    yhat_pred = np.argmax(my_model.predict(x_image))
    if(yhat_pred != correct_label):
        print("Error!!! Original Image Cannot Be Predicted Correctly!!!")
    else:
        attack = foolbox.attacks.CarliniWagnerL2Attack(fmodel, criterion=TargetClass(target_label))
        cw_ae = attack(image, label=correct_label, confidence=0, max_iterations=100)
        try:
            assert cw_ae is not None
        except Exception as e:
            print('Attack Fails Because None Is Returned!')
        else:
            adv = cw_ae.reshape(1, 32, 32, 3)
            y_pred1 = my_model.predict(adv)
            if(np.argmax(y_pred1) != target_label):
                print('Attack Fails Because Target Class Is Not Produced!')
            else:
                sum_num_attack += 1
                adv_test = np.concatenate([adv_test, cw_ae.reshape(1, 32, 32, 3)])
                succe_labels.append(x_ben_imgs_labels[index])
print(sum_num_attack)
Now let us check how the 10 generated adversarial images affect the ResNet classifier:
for index in range(10):
    print(x_ben_imgs_labels[index], end=' ')
print("")
for index in range(10):
    y_pred = my_model.predict(adv_test[index].reshape(1, 32, 32, 3))
    print(np.argmax(y_pred), end=' ')
The output is as follows; the first row lists the correct labels, and the second row lists the classes predicted for the corresponding adversarial images:
3 8 8 0 6 6 1 6 3 1
4 9 9 1 7 7 2 7 4 2
As you can see, every prediction has been shifted to the "next" class, i.e., the generated AEs successfully mislead the neural network.
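If you plan to reuse the generated adversarial images later, for example to evaluate a defense, they can be serialized in the same way as the benign data. The file names below are placeholders chosen only for illustration.
# Persist the adversarial images and their original labels for later experiments.
with open('cw_adv_imgs_10.pkl', 'wb') as f:
    pickle.dump(adv_test, f)
with open('cw_adv_labels_10.pkl', 'wb') as f:
    pickle.dump(np.array(succe_labels), f)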
*The Jupyter Notebook containing the code used in this article can be downloaded from link 【4】.
References
【1】Adversarial Examples in Deep Learning (a Python implementation)
【2】Hands-on Adversarial Examples for Deep Learning
【3】The FoolBox documentation
【4】Cloud drive link (access code: vtcb)