In security and forensic contexts, images are an important source of leads and evidence. In an age when Photoshop is everywhere, however, not just any image can serve this purpose; generally the image must not have been tampered with. After all, none of us wants our face to show up at a crime scene, or on a suspect, under abnormal circumstances and for no reason, nor do we want the key text in a photographed contract to be altered to our disadvantage.
Also, with beauty filters so popular these days, some people may even want "de-beautification"; after all, some of us would rather not be deceived by retouched photos.
In practice, image tampering falls into at least two broad types: a first type that alters image content (splicing, copy-move, object removal, and the like), and a second type of global processing operations that leave the content intact (blurring, added noise, re-compression, brightness/contrast adjustment, and so on).
Generally our ultimate goal is to identify the first type, but that is very hard and requires deep expertise in forensics, photography, and imaging. In traditional image forensics, for example, judgments are made via noise consistency, geometric consistency, illumination consistency, and so on. In practice one must traverse every suspicious region and check each of them with each method in turn, which is extremely time-consuming and laborious.
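To make the noise-consistency idea concrete, here is a minimal sketch, with helper names and thresholds of my own choosing rather than any standard tool's: it estimates each block's noise level with Immerkær's fast estimator and flags blocks that deviate strongly from the global median.

import cv2
import numpy as np

def estimate_noise_std(gray):
    # Immerkaer's fast noise estimate: convolve with a Laplacian-difference
    # kernel and average the absolute response (approximate at the borders)
    h, w = gray.shape
    kernel = np.array([[1, -2, 1], [-2, 4, -2], [1, -2, 1]], dtype=np.float64)
    response = cv2.filter2D(gray.astype(np.float64), -1, kernel)
    return np.sqrt(np.pi / 2.0) * np.abs(response).sum() / (6.0 * (w - 2) * (h - 2))

def noise_consistency_map(image, block=64):
    # flag blocks whose noise level differs strongly from the global median;
    # the 2x / 0.5x ratio threshold is an illustrative choice
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    rows = range(0, gray.shape[0] - block + 1, block)
    cols = range(0, gray.shape[1] - block + 1, block)
    stds = np.array([[estimate_noise_std(gray[y:y + block, x:x + block])
                      for x in cols] for y in rows])
    med = np.median(stds)
    return (stds > 2.0 * med) | (stds < 0.5 * med)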
With deep learning, one could use image segmentation to delineate the tampered region directly, but anyone who actually tries to train such a model will find it genuinely hard, because the training data is very difficult to produce. There are at least two ways to build this kind of training data, each with its own pros and cons.
For these reasons, we often first check whether an image has undergone the second type of tampering. Technically the second type is easier to detect, and if it is present, the image probably deserves more careful scrutiny.
The rest of this document deals with the second type. The first type is a hard nut to crack, not something one can pull off at home with a single 1050 Ti, so this document will not go there... If you are interested in the first type, though, Adobe's 2019 creativity conference is worth a look.
Detecting the second type with traditional methods is actually still somewhat difficult, especially for a highly non-linear operation like median filtering. With deep learning, however, brute force really does work wonders and the task is handled almost trivially; the experiments below show this.
Before reading further, be clear that tampering detection is an adversarial game played against people, so it is hard to define what "solved" means.
This differs from generic CV tasks such as license-plate or face recognition, where a system that reaches a certain standard can be rolled out commercially. License plates can be cloned and face recognition must deal with liveness and masks, but those problem categories are few, and established techniques exist to handle them.
So this document merely scratches the surface, with some simple experiments on the second type of tampering described above.
The code consists of three files:
1. util.py
Helper functions, including some of the tampering operations and the routines for randomly extracting image patches for training; see the two companion documents for the underlying principles.
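As a quick illustration of the patch-extraction helper, a hypothetical usage sketch ('example.jpg' is a placeholder path):

import cv2
from util import get_random_patch_bboxes

image = cv2.imread('example.jpg')  # placeholder path
bboxes = get_random_patch_bboxes(image, bbox_size=28, stride=64, jitter=32)
patches = [image[ymin:ymax, xmin:xmax] for xmin, ymin, xmax, ymax in bboxes]
print(len(patches), patches[0].shape)  # N patches, each (28, 28, 3)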
2. generate_train_test_data.py
Builds the training and test data. Since this is only a simple experiment, the amount of data is small enough to load into memory at once, so it is saved in numpy's .npy format. The data comprises 60 training images and 30 test images, all casually shot with phones (beauty mode off) of three models: Honor 10, Honor 30, and Mate 30. Training patches are 28×28; roughly 300,000 patches were cropped for the training set and roughly 150,000 for the test set. The part of this file that generates tampered_image can be modified to test different tampering operations. Unless stated otherwise, the logs in the experiments below reflect the tampering parameters currently in the code.
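A few lines suffice to sanity-check the generated files; the expected shape follows from the 28×28 patch size and the NCHW transpose in make_data (paths as hard-coded in the scripts below):

import numpy as np

data = np.load(r'F:\Forensic\noise\train_data.npy')
label = np.load(r'F:\Forensic\noise\train_label.npy')
print(data.shape, data.dtype)           # expected: (N, 3, 28, 28), uint8
print(label.shape, np.bincount(label))  # labels: 0 = tampered, 1 = real, half each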
3. train.py
The training script: Dataset construction, model definition, and the train/test loops. Because the experiment is simple, everything lives in one file; the hyperparameters are defined as constants at the top.
The contents of the three files follow. If you want to run them, note two things, both visible in the code: the data paths (ROOT_FOLDER_TRAIN, ROOT_FOLDER_TEST, OUTPUT_FOLDER and the .npy paths) are hard-coded and must be adapted to your machine, and training realistically assumes a CUDA-capable GPU (the script falls back to CPU, but slowly).
util.py
# -*- coding: utf-8 -*-
import cv2
import numpy as np


def uniform_random(low, high, shape=None):
    """
    Get uniform random number(s) between low and high

    Parameters
    ----------
    low: low limit of random number(s)
    high: high limit of random number(s)
    shape: shape of output array. A single number is returned if shape is None

    Returns
    -------
    Uniform random number(s) between low and high
    """
    return np.random.random(shape) * (high - low) + low


def add_gaussian_noise(image, mean_ratio, std_ratio, noise_num_ratio=1.0):
    """
    Add Gaussian noise to image.

    Parameters
    ----------
    image: image data read by opencv, shape is [H, W, C]
    mean_ratio: ratio with respect to image_mean for mean of gaussian random
        numbers
    std_ratio: ratio with respect to image_mean for std (scale) of gaussian
        random numbers
    noise_num_ratio: ratio of noise number with respect to the total number of
        pixels, between [0, 1]

    Returns
    -------
    noisy_image: image after adding noise
    """
    if std_ratio < 0:
        raise ValueError('std_ratio must be >= 0.0')
    if not 0.0 <= noise_num_ratio <= 1.0:
        raise ValueError('noise_num_ratio must be between [0, 1]')
    # get noise shape and channel number
    noise_shape = get_noise_shape(image)
    channel = noise_shape[2]
    # compute channel-wise mean and std
    image_mean = np.array(cv2.mean(image)[:channel])
    mean = image_mean * mean_ratio
    std = image_mean * std_ratio
    # generate noise
    noise = np.random.normal(mean, std, noise_shape)
    noisy_image = image.copy().astype(np.float32)
    if noisy_image.ndim == 2:
        noisy_image = noisy_image[..., np.newaxis]  # add channel axis
    # add noise according to noise_num_ratio
    if noise_num_ratio >= 1.0:
        noisy_image[:, :, :channel] += noise
    else:
        row, col = get_noise_index(image, noise_num_ratio)
        noisy_image[row, col, :channel] += noise[row, col, ...]
    # post processing
    noisy_image = float_to_uint8(noisy_image, scale=1.0)
    noisy_image = np.squeeze(noisy_image)
    return noisy_image


def float_to_uint8(image, scale=255.0):
    """
    Convert image from float type to uint8, clipping to [0, 255] in the
    process.

    Parameters
    ----------
    image: numpy array image data of float type
    scale: a scale factor for image data

    Returns
    -------
    image_uint8: numpy array image data of uint8 type
    """
    image_uint8 = np.clip(np.round(image * scale), 0, 255).astype(np.uint8)
    return image_uint8


def get_noise_index(image, noise_num_ratio):
    """
    Get noise index for a certain ratio of noise number

    Parameters
    ----------
    image: numpy array image data
    noise_num_ratio: ratio of noise number with respect to the total number of
        pixels, between [0, 1]

    Returns
    -------
    row: row indexes
    col: column indexes
    """
    image_height, image_width = image.shape[0:2]
    noise_num = int(np.round(image_height * image_width * noise_num_ratio))
    row = np.random.randint(0, image_height, noise_num)
    col = np.random.randint(0, image_width, noise_num)
    return row, col


def get_noise_shape(image):
    """
    Get noise shape according to image shape.

    Parameters
    ----------
    image: numpy array image data

    Returns
    -------
    noise_shape: a tuple whose length is 3
        The shape of noise. Let height, width be the image height and width.
        If image.ndim is 2, output noise_shape will be (height, width, 1),
        else (height, width, 3)
    """
    if not (image.ndim == 2 or image.ndim == 3):
        raise ValueError('image ndim must be 2 or 3')
    height, width = image.shape[:2]
    if image.ndim == 2:
        channel = 1
    else:
        channel = image.shape[2]
    if channel >= 4:
        channel = 3
    noise_shape = (height, width, channel)
    return noise_shape


def jpeg_compression(image, quality_factor):
    """
    Apply jpeg compression to image without saving it to disk.

    Parameters
    ----------
    image: image data read by opencv, shape is [H, W, C]
    quality_factor: jpeg quality factor, between [0, 100]. Higher value means
        higher quality image

    Returns
    -------
    jpeg_image: jpeg compressed image
    """
    compression_factor = int(quality_factor)
    compression_param = [cv2.IMWRITE_JPEG_QUALITY, compression_factor]
    image_encode = cv2.imencode('.jpg', image, compression_param)[1]
    jpeg_image = cv2.imdecode(image_encode, -1)
    return jpeg_image


def get_random_patch_bboxes(image, bbox_size, stride, jitter, roi_bbox=None):
    """
    Generate random patch bounding boxes for an image around ROI region

    Parameters
    ----------
    image: image data read by opencv, shape is [H, W, C]
    bbox_size: size of patch bbox, one digit or a list/tuple containing two
        digits, defined by (width, height)
    stride: stride between adjacent bboxes (before jitter), one digit or a
        list/tuple containing two digits, defined by (x, y)
    jitter: jitter size for evenly distributed bboxes, one digit or a
        list/tuple containing two digits, defined by (x, y)
    roi_bbox: roi region, defined by [xmin, ymin, xmax, ymax], default is
        whole image region

    Returns
    -------
    patch_bboxes: randomly distributed patch bounding boxes, n x 4 numpy
        array. Each bounding box is defined by [xmin, ymin, xmax, ymax]
    """
    height, width = image.shape[:2]
    bbox_size = _process_geometry_param(bbox_size, min_value=1)
    stride = _process_geometry_param(stride, min_value=1)
    jitter = _process_geometry_param(jitter, min_value=0)
    if bbox_size[0] > width or bbox_size[1] > height:
        raise ValueError('bbox_size must be <= image size')
    if roi_bbox is None:
        roi_bbox = [0, 0, width, height]
    # tl is for top-left, br is for bottom-right
    tl_x, tl_y = _get_top_left_points(roi_bbox, bbox_size, stride, jitter)
    br_x = tl_x + bbox_size[0]
    br_y = tl_y + bbox_size[1]
    # shrink bottom-right points to avoid exceeding image border
    br_x[br_x > width] = width
    br_y[br_y > height] = height
    # shrink top-left points to avoid exceeding image border
    tl_x = br_x - bbox_size[0]
    tl_y = br_y - bbox_size[1]
    tl_x[tl_x < 0] = 0
    tl_y[tl_y < 0] = 0
    # compute bottom-right points again
    br_x = tl_x + bbox_size[0]
    br_y = tl_y + bbox_size[1]
    patch_bboxes = np.concatenate((tl_x, tl_y, br_x, br_y), axis=1)
    return patch_bboxes


def _process_geometry_param(param, min_value):
    """
    Process and check param, which must be one digit or a list/tuple
    containing two digits, and its value must be >= min_value

    Parameters
    ----------
    param: parameter to be processed
    min_value: min value for param

    Returns
    -------
    param: param after processing
    """
    if isinstance(param, (int, float)) or \
            (isinstance(param, np.ndarray) and param.size == 1):
        param = int(np.round(param))
        param = [param, param]
    else:
        if len(param) != 2:
            raise ValueError('param must be one digit or two digits')
        param = [int(np.round(param[0])), int(np.round(param[1]))]
    # check data range using min_value
    if not (param[0] >= min_value and param[1] >= min_value):
        raise ValueError('param must be >= min_value (%d)' % min_value)
    return param


def _get_top_left_points(roi_bbox, bbox_size, stride, jitter):
    """
    Generate top-left points for bounding boxes

    Parameters
    ----------
    roi_bbox: roi region, defined by [xmin, ymin, xmax, ymax]
    bbox_size: size of patch bbox, a list/tuple containing two digits,
        defined by (width, height)
    stride: stride between adjacent bboxes (before jitter), a list/tuple
        containing two digits, defined by (x, y)
    jitter: jitter size for evenly distributed bboxes, a list/tuple containing
        two digits, defined by (x, y)

    Returns
    -------
    tl_x: x coordinates of top-left points, n x 1 numpy array
    tl_y: y coordinates of top-left points, n x 1 numpy array
    """
    xmin, ymin, xmax, ymax = roi_bbox
    roi_width = xmax - xmin
    roi_height = ymax - ymin
    # get the offset between the first top-left point of patch box and the
    # top-left point of roi_bbox
    offset_x = np.arange(0, roi_width, stride[0])[-1] + bbox_size[0]
    offset_y = np.arange(0, roi_height, stride[1])[-1] + bbox_size[1]
    offset_x = (offset_x - roi_width) // 2
    offset_y = (offset_y - roi_height) // 2
    # get the coordinates of all top-left points
    tl_x = np.arange(xmin, xmax, stride[0]) - offset_x
    tl_y = np.arange(ymin, ymax, stride[1]) - offset_y
    tl_x, tl_y = np.meshgrid(tl_x, tl_y)
    tl_x = np.reshape(tl_x, [-1, 1])
    tl_y = np.reshape(tl_y, [-1, 1])
    # jitter the coordinates of all top-left points
    tl_x += np.random.randint(-jitter[0], jitter[0] + 1, size=tl_x.shape)
    tl_y += np.random.randint(-jitter[1], jitter[1] + 1, size=tl_y.shape)
    return tl_x, tl_y
generate_train_test_data.py
# -*- coding: utf-8 -*-
import os

import cv2
import numpy as np

from util import uniform_random
from util import get_random_patch_bboxes
from util import jpeg_compression
from util import add_gaussian_noise

ROOT_FOLDER_TRAIN = r'F:\Forensic\train'
ROOT_FOLDER_TEST = r'F:\Forensic\test'
OUTPUT_FOLDER = r'F:\Forensic\noise'
PATCH_SHAPE = (28, 28)
STRIDE = (64, 64)
JITTER = (32, 32)


def make_data(root_folder, phase='train'):
    """
    Make image patches and the corresponding labels, and then save them to
    disk. Half of the patches are original, the other half are tampered.

    Parameters
    ----------
    root_folder: root folder of the original full images
    phase: 'train' or 'test'
    """
    files = os.listdir(root_folder)
    # make data
    real_patches = []
    tampered_patches = []
    for i, file in enumerate(files):
        print(i + 1, file)
        image = cv2.imread(os.path.join(root_folder, file))
        # the following part can be modified to generate other types
        # of tampered_image
        ''' Gaussian blur '''
        ksize = np.random.choice([3, 5, 7, 9], size=2)
        ksize = tuple(int(k) for k in ksize)  # cv2 expects plain ints
        tampered_image = cv2.GaussianBlur(
            image, ksize,
            sigmaX=uniform_random(1.0, 3.0),
            sigmaY=uniform_random(1.0, 3.0))
        ''' Gaussian noise '''
        # tampered_image = add_gaussian_noise(
        #     image,
        #     mean_ratio=0.0,
        #     std_ratio=uniform_random(0.01, 0.3))
        ''' median blur '''
        # ksize = np.random.choice([3, 5, 7, 9])
        # tampered_image = cv2.medianBlur(image, ksize=ksize)
        ''' JPEG compression '''
        # tampered_image = jpeg_compression(image, uniform_random(50, 95))
        ''' brightness '''
        # brightness = uniform_random(-25, 25)
        # tampered_image = np.float64(image) + brightness
        # tampered_image = np.clip(np.round(tampered_image), 0, 255)
        # tampered_image = np.uint8(tampered_image)
        ''' contrast '''
        # contrast = uniform_random(0.75, 1.33)
        # tampered_image = np.float64(image) * contrast
        # tampered_image = np.clip(np.round(tampered_image), 0, 255)
        # tampered_image = np.uint8(tampered_image)
        patch_bboxes = get_random_patch_bboxes(
            image, PATCH_SHAPE, STRIDE, JITTER)
        blur_patch_bboxes = get_random_patch_bboxes(
            image, PATCH_SHAPE, STRIDE, JITTER)
        for bbox in patch_bboxes:
            xmin, ymin, xmax, ymax = bbox
            real_patches.append(image[ymin:ymax, xmin:xmax])
        for bbox in blur_patch_bboxes:
            xmin, ymin, xmax, ymax = bbox
            tampered_patches.append(tampered_image[ymin:ymax, xmin:xmax])
    real_patches = np.array(real_patches)
    tampered_patches = np.array(tampered_patches)
    real_labels = np.ones(shape=real_patches.shape[0], dtype=np.int64)
    tampered_labels = np.zeros(shape=tampered_patches.shape[0],
                               dtype=np.int64)
    patches = np.concatenate((real_patches, tampered_patches), axis=0)
    # convert from NHWC to NCHW layout for pytorch
    patches = patches.transpose([0, 3, 1, 2])
    labels = np.concatenate((real_labels, tampered_labels))
    # save data
    os.makedirs(OUTPUT_FOLDER, exist_ok=True)
    np.save(os.path.join(OUTPUT_FOLDER, '%s_data.npy' % phase), patches)
    np.save(os.path.join(OUTPUT_FOLDER, '%s_label.npy' % phase), labels)
    print('Total number of %s samples is %d' % (phase, labels.shape[0]))


if __name__ == '__main__':
    make_data(ROOT_FOLDER_TRAIN, 'train')
    make_data(ROOT_FOLDER_TEST, 'test')
train.py
# -*- coding: utf-8 -*-
import os
import time

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import torchsummary

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
EPOCH = 10
TRAIN_BATCH_SIZE = 50
TEST_BATCH_SIZE = 32
BASE_CHANNEL = 32
INPUT_CHANNEL = 3
INPUT_SIZE = 28
TRAIN_DATA_FILE = r'F:\Forensic\noise\train_data.npy'
TRAIN_LABEL_FILE = r'F:\Forensic\noise\train_label.npy'
TEST_DATA_FILE = r'F:\Forensic\noise\test_data.npy'
TEST_LABEL_FILE = r'F:\Forensic\noise\test_label.npy'
MODEL_FOLDER = r'.\saved_model'


def update_learning_rate(optimizer, epoch):
    """
    Update learning rate stepwise for optimizer

    Parameters
    ----------
    optimizer: pytorch optimizer
    epoch: epoch
    """
    learning_rate = 1e-4
    if epoch > 5:
        learning_rate = 1e-5
    for param_group in optimizer.param_groups:
        param_group['lr'] = learning_rate


class Model(nn.Module):
    """
    6-layer plain model for forensic classification
    """

    def __init__(self, input_ch, num_classes, base_ch):
        super(Model, self).__init__()
        self.num_classes = num_classes
        self.base_ch = base_ch
        self.feature_length = base_ch * 4
        self.net = nn.Sequential(
            nn.Conv2d(input_ch, base_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(base_ch, base_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(base_ch, base_ch * 2, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(base_ch * 2, base_ch * 2, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(base_ch * 2, self.feature_length, kernel_size=3,
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(self.feature_length, self.feature_length, kernel_size=3,
                      padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(output_size=(1, 1))
        )
        self.fc = nn.Linear(in_features=self.feature_length,
                            out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1, self.feature_length)
        output = self.fc(output)
        return output


class ForensicDataset(Dataset):
    """
    Pytorch dataset for train and test
    """

    def __init__(self, data, label):
        super(ForensicDataset, self).__init__()
        self.data = data
        self.label = label
        self.num = len(label)

    def __len__(self):
        return self.num

    def __getitem__(self, index):
        data = self.data[index]
        label = self.label[index]
        return data, label


def load_dataset():
    """
    Load train and test dataset
    """
    # load train dataset
    data = np.load(TRAIN_DATA_FILE).astype(np.float32)
    label = np.load(TRAIN_LABEL_FILE).astype(np.int64)
    data = torch.from_numpy(data)
    label = torch.from_numpy(label)
    train_dataset = ForensicDataset(data, label)
    # load test dataset
    data = np.load(TEST_DATA_FILE).astype(np.float32)
    label = np.load(TEST_LABEL_FILE).astype(np.int64)
    data = torch.from_numpy(data)
    label = torch.from_numpy(label)
    test_dataset = ForensicDataset(data, label)
    return train_dataset, test_dataset


if __name__ == '__main__':
    time_beg = time.time()
    train_dataset, test_dataset = load_dataset()
    train_loader = DataLoader(dataset=train_dataset,
                              batch_size=TRAIN_BATCH_SIZE,
                              shuffle=True)
    test_loader = DataLoader(dataset=test_dataset,
                             batch_size=TEST_BATCH_SIZE,
                             shuffle=False)

    model = Model(input_ch=INPUT_CHANNEL, num_classes=2,
                  base_ch=BASE_CHANNEL).to(DEVICE)
    torchsummary.summary(
        model, input_size=(INPUT_CHANNEL, INPUT_SIZE, INPUT_SIZE),
        device=DEVICE.type)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters())

    train_loss = []
    for ep in range(1, EPOCH + 1):
        update_learning_rate(optimizer, ep)
        # ----------------- train -----------------
        model.train()
        time_beg_epoch = time.time()
        loss_recorder = []
        for data, classes in train_loader:
            data, classes = data.to(DEVICE), classes.to(DEVICE)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, classes)
            loss.backward()
            optimizer.step()
            loss_recorder.append(loss.item())
            time_cost = time.time() - time_beg_epoch
            print('\rEpoch: %d, Loss: %0.4f, Time cost (s): %0.2f' % (
                ep, loss_recorder[-1], time_cost), end='')
        # print train info after one epoch
        train_loss.append(loss_recorder)
        mean_loss_epoch = torch.mean(torch.Tensor(loss_recorder))
        time_cost_epoch = time.time() - time_beg_epoch
        print('\rEpoch: %d, Mean loss: %0.4f, Epoch time cost (s): %0.2f' % (
            ep, mean_loss_epoch.item(), time_cost_epoch), end='')
        # save model
        os.makedirs(MODEL_FOLDER, exist_ok=True)
        model_filename = os.path.join(MODEL_FOLDER, 'epoch_%d.pth' % ep)
        torch.save(model.state_dict(), model_filename)

        # ----------------- test -----------------
        model.eval()
        correct = 0
        total = 0
        with torch.no_grad():
            for data, classes in test_loader:
                data, classes = data.to(DEVICE), classes.to(DEVICE)
                output = model(data)
                _, predicted = torch.max(output.data, 1)
                total += classes.size(0)
                correct += (predicted == classes).sum().item()
        print(', Test accuracy: %0.4f' % (correct / total))
    print('Total time cost: ', time.time() - time_beg)
As the results show, Gaussian blur is very easy to detect; accuracy reaches 0.99+ with essentially no effort.
The log:
Epoch: 1, Mean loss: 0.3753, Epoch time cost (s): 59.67, Test accuracy: 0.9501
Epoch: 2, Mean loss: 0.0936, Epoch time cost (s): 58.75, Test accuracy: 0.9768
Epoch: 3, Mean loss: 0.0380, Epoch time cost (s): 58.66, Test accuracy: 0.9874
Epoch: 4, Mean loss: 0.0254, Epoch time cost (s): 58.72, Test accuracy: 0.9902
Epoch: 5, Mean loss: 0.0217, Epoch time cost (s): 58.69, Test accuracy: 0.9735
Epoch: 6, Mean loss: 0.0116, Epoch time cost (s): 58.67, Test accuracy: 0.9929
Epoch: 7, Mean loss: 0.0091, Epoch time cost (s): 60.25, Test accuracy: 0.9935
Epoch: 8, Mean loss: 0.0082, Epoch time cost (s): 62.64, Test accuracy: 0.9934
Epoch: 9, Mean loss: 0.0076, Epoch time cost (s): 62.41, Test accuracy: 0.9933
Epoch: 10, Mean loss: 0.0071, Epoch time cost (s): 59.13, Test accuracy: 0.9940
Gaussian noise is also extremely easy to detect; accuracy casually clears 0.99.
The log:
Epoch: 1, Mean loss: 0.1213, Epoch time cost (s): 58.44, Test accuracy: 0.9740
Epoch: 2, Mean loss: 0.0447, Epoch time cost (s): 58.80, Test accuracy: 0.9562
Epoch: 3, Mean loss: 0.0272, Epoch time cost (s): 58.91, Test accuracy: 0.9867
Epoch: 4, Mean loss: 0.0170, Epoch time cost (s): 59.00, Test accuracy: 0.9885
Epoch: 5, Mean loss: 0.0071, Epoch time cost (s): 58.94, Test accuracy: 0.9760
Epoch: 6, Mean loss: 0.0014, Epoch time cost (s): 58.97, Test accuracy: 0.9942
Epoch: 7, Mean loss: 0.0006, Epoch time cost (s): 59.03, Test accuracy: 0.9928
Epoch: 8, Mean loss: 0.0005, Epoch time cost (s): 58.99, Test accuracy: 0.9933
Epoch: 9, Mean loss: 0.0004, Epoch time cost (s): 59.05, Test accuracy: 0.9952
Epoch: 10, Mean loss: 0.0004, Epoch time cost (s): 58.71, Test accuracy: 0.9968
Median filtering is also fairly easy to detect. The best accuracy fell just short of 0.99, but it came close; with a bit more data and a few more training runs it would not be hard to get there.
Median filtering is a strongly non-linear operation that is actually quite hard to detect with traditional methods, yet a neural network handles it almost effortlessly.
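As a tiny numeric illustration of that non-linearity, the median does not commute with addition, which is exactly what defeats linear-filter-style analysis:

import numpy as np

a = np.array([0, 0, 255])
b = np.array([0, 255, 0])
print(np.median(a + b))             # 255.0
print(np.median(a) + np.median(b))  # 0.0 -- median(a + b) != median(a) + median(b)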
The log:
Epoch: 1, Mean loss: 0.4308, Epoch time cost (s): 59.61, Test accuracy: 0.8943
Epoch: 2, Mean loss: 0.1859, Epoch time cost (s): 58.92, Test accuracy: 0.9280
Epoch: 3, Mean loss: 0.1213, Epoch time cost (s): 59.03, Test accuracy: 0.9467
Epoch: 4, Mean loss: 0.0848, Epoch time cost (s): 59.04, Test accuracy: 0.9460
Epoch: 5, Mean loss: 0.0587, Epoch time cost (s): 59.03, Test accuracy: 0.9645
Epoch: 6, Mean loss: 0.0269, Epoch time cost (s): 59.00, Test accuracy: 0.9813
Epoch: 7, Mean loss: 0.0209, Epoch time cost (s): 59.27, Test accuracy: 0.9822
Epoch: 8, Mean loss: 0.0185, Epoch time cost (s): 59.06, Test accuracy: 0.9857
Epoch: 9, Mean loss: 0.0170, Epoch time cost (s): 59.00, Test accuracy: 0.9854
Epoch: 10, Mean loss: 0.0156, Epoch time cost (s): 59.02, Test accuracy: 0.9763
JPEG compression is somewhat harder to detect. The learning-rate schedule here differs from the other experiments: I warmed up with 1e-4 for the first 2 epochs, then used 1e-3 for epochs 3-7, 1e-4 for epochs 8-9, and 1e-5 for the final epoch. After several runs I found that with only 1e-4 and 1e-5 the accuracy topped out at around 0.90+. (There is no particular theory behind this, just trial and error, though one rule of thumb applies: early in training we want as large a learning rate as possible without divergence, in the hope that the network covers a broader search space.)
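For reference, a sketch of that schedule as a drop-in replacement for update_learning_rate in train.py; the epoch boundaries follow the description above:

def update_learning_rate(optimizer, epoch):
    # warm-up at 1e-4, peak at 1e-3, then step back down
    if epoch <= 2:
        learning_rate = 1e-4
    elif epoch <= 7:
        learning_rate = 1e-3
    elif epoch <= 9:
        learning_rate = 1e-4
    else:
        learning_rate = 1e-5
    for param_group in optimizer.param_groups:
        param_group['lr'] = learning_rate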
So although double JPEG compression is a bit harder to detect, accuracy still reached 0.95+, which is decent.
The log:
Epoch: 1, Mean loss: 0.6933, Epoch time cost (s): 58.97, Test accuracy: 0.5056
Epoch: 2, Mean loss: 0.5764, Epoch time cost (s): 58.88, Test accuracy: 0.7660
Epoch: 3, Mean loss: 0.3430, Epoch time cost (s): 58.83, Test accuracy: 0.7949
Epoch: 4, Mean loss: 0.1980, Epoch time cost (s): 58.88, Test accuracy: 0.8683
Epoch: 5, Mean loss: 0.1609, Epoch time cost (s): 58.88, Test accuracy: 0.9193
Epoch: 6, Mean loss: 0.1489, Epoch time cost (s): 58.85, Test accuracy: 0.9333
Epoch: 7, Mean loss: 0.1268, Epoch time cost (s): 58.81, Test accuracy: 0.9380
Epoch: 8, Mean loss: 0.0825, Epoch time cost (s): 58.95, Test accuracy: 0.9528
Epoch: 9, Mean loss: 0.0744, Epoch time cost (s): 59.06, Test accuracy: 0.9536
Epoch: 10, Mean loss: 0.0626, Epoch time cost (s): 58.83, Test accuracy: 0.9545
Brightness and contrast can be discussed together. There are many ways to modify them: one can work directly in RGB space, but it is more common to convert to a space such as YUV or Lab first. Here we simply operate in RGB, using the formula
tampered_image = α * image + β
where α adjusts the contrast and β the brightness.
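For reference, a minimal sketch of the more common luma-space variant with OpenCV (illustrative only; the experiments below stay in RGB as described):

import cv2
import numpy as np

def adjust_brightness_yuv(image, beta):
    # shift only the luma (Y) channel, leaving chroma untouched
    yuv = cv2.cvtColor(image, cv2.COLOR_BGR2YUV).astype(np.float64)
    yuv[..., 0] = np.clip(yuv[..., 0] + beta, 0, 255)
    return cv2.cvtColor(yuv.astype(np.uint8), cv2.COLOR_YUV2BGR)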
Two ranges of β were tested, and the recognition accuracy was very low in both. Keep in mind that this is a binary classification problem: 50% accuracy means pure guessing, i.e. no detection ability at all, and the accuracies in the logs below are only slightly above 50%. I did not do a visualization analysis, but judging from the two runs, the margin above 50% most likely comes from pixels entering the uint8 saturation region: when β is very negative or very large, many values are clipped to 0 or 255, and that clipping is what gets detected. In such cases the naked eye can spot the tampering easily anyway, so we can essentially conclude that the network is helpless against brightness tampering.
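The saturation hypothesis above is easy to probe. Below is a minimal sketch (my own illustrative helper, not an analysis actually run in these experiments) that measures what fraction of pixel values would clip for a given β:

import numpy as np

def clipped_fraction(image, beta):
    # fraction of pixel values that would clip to 0 or 255 after the shift
    shifted = np.float64(image) + beta
    return np.mean((shifted <= 0) | (shifted >= 255))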
The log with β in [-50, 50]:
Epoch: 1, Mean loss: 0.6558, Epoch time cost (s): 59.06, Test accuracy: 0.5207
Epoch: 2, Mean loss: 0.6231, Epoch time cost (s): 58.89, Test accuracy: 0.5444
Epoch: 3, Mean loss: 0.6063, Epoch time cost (s): 58.95, Test accuracy: 0.5833
Epoch: 4, Mean loss: 0.5933, Epoch time cost (s): 58.97, Test accuracy: 0.5988
Epoch: 5, Mean loss: 0.5839, Epoch time cost (s): 58.95, Test accuracy: 0.5981
Epoch: 6, Mean loss: 0.5628, Epoch time cost (s): 58.88, Test accuracy: 0.6009
Epoch: 7, Mean loss: 0.5582, Epoch time cost (s): 58.95, Test accuracy: 0.6037
Epoch: 8, Mean loss: 0.5556, Epoch time cost (s): 58.92, Test accuracy: 0.6018
Epoch: 9, Mean loss: 0.5535, Epoch time cost (s): 59.17, Test accuracy: 0.6007
Epoch: 10, Mean loss: 0.5515, Epoch time cost (s): 60.49, Test accuracy: 0.6016
The log with β in [-25, 25]:
Epoch: 1, Mean loss: 0.6765, Epoch time cost (s): 59.06, Test accuracy: 0.5201
Epoch: 2, Mean loss: 0.6618, Epoch time cost (s): 58.56, Test accuracy: 0.5219
Epoch: 3, Mean loss: 0.6505, Epoch time cost (s): 58.81, Test accuracy: 0.5259
Epoch: 4, Mean loss: 0.6425, Epoch time cost (s): 58.94, Test accuracy: 0.5289
Epoch: 5, Mean loss: 0.6350, Epoch time cost (s): 58.85, Test accuracy: 0.5378
Epoch: 6, Mean loss: 0.6199, Epoch time cost (s): 58.75, Test accuracy: 0.5464
Epoch: 7, Mean loss: 0.6157, Epoch time cost (s): 58.70, Test accuracy: 0.5483
Epoch: 8, Mean loss: 0.6135, Epoch time cost (s): 58.75, Test accuracy: 0.5475
Epoch: 9, Mean loss: 0.6117, Epoch time cost (s): 58.88, Test accuracy: 0.5478
Epoch: 10, Mean loss: 0.6100, Epoch time cost (s): 58.41, Test accuracy: 0.5498
For contrast the conclusion mirrors brightness: the network is essentially unable to detect this kind of tampering.
The log:
Epoch: 1, Mean loss: 0.6914, Epoch time cost (s): 59.21, Test accuracy: 0.4888
Epoch: 2, Mean loss: 0.6782, Epoch time cost (s): 58.85, Test accuracy: 0.5637
Epoch: 3, Mean loss: 0.6682, Epoch time cost (s): 58.85, Test accuracy: 0.5439
Epoch: 4, Mean loss: 0.6622, Epoch time cost (s): 58.89, Test accuracy: 0.5502
Epoch: 5, Mean loss: 0.6562, Epoch time cost (s): 58.78, Test accuracy: 0.5383
Epoch: 6, Mean loss: 0.6400, Epoch time cost (s): 58.87, Test accuracy: 0.5725
Epoch: 7, Mean loss: 0.6361, Epoch time cost (s): 58.92, Test accuracy: 0.5743
Epoch: 8, Mean loss: 0.6335, Epoch time cost (s): 58.80, Test accuracy: 0.5721
Epoch: 9, Mean loss: 0.6312, Epoch time cost (s): 58.78, Test accuracy: 0.5781
Epoch: 10, Mean loss: 0.6293, Epoch time cost (s): 58.81, Test accuracy: 0.5798
Looking back over these experiments, I have formed some subjective impressions; they may hold some truth or may be off the mark, so treat them as casual observations.