This article provides a complete PyTorch implementation of Deep Dream, adding image-pyramid processing and Gaussian smoothing so that the generated images are sharper and more visually pleasing. It also discusses how various factors affect the output.
A deep dream image is an input image generated to maximize some layer's output features. It can be viewed as a visualization of those features, helping us understand what each layer of the network has learned. Deep dream is also fun: it can produce beautiful, intriguing pictures. The previous article covered the main principle, but images generated with that simple method were not very attractive. This time I incorporate image-pyramid and Gaussian-smoothing code found on GitHub, which makes the results much prettier.
import torch
import torchvision.models as models
import torch.nn.functional as F
import torch.nn as nn
import numpy as np
import numbers
import math
import cv2
from PIL import Image
from torchvision.transforms import Compose, ToTensor, Normalize, Resize, ToPILImage
import time
t0 = time.time()
model = models.vgg16(pretrained=True).cuda()
batch_size = 1
for params in model.parameters():
    params.requires_grad = False
model.eval()
mu = torch.Tensor([0.485, 0.456, 0.406]).unsqueeze(-1).unsqueeze(-1).cuda()
std = torch.Tensor([0.229, 0.224, 0.225]).unsqueeze(-1).unsqueeze(-1).cuda()
unnormalize = lambda x: x*std + mu
normalize = lambda x: (x-mu)/std
transform_test = Compose([
    Resize((500, 600)),
    ToTensor(),
    Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
class CascadeGaussianSmoothing(nn.Module):
    """
    Apply gaussian smoothing separately for each channel (depthwise convolution).
    Arguments:
        kernel_size (int, sequence): Size of the gaussian kernel.
        sigma (float, sequence): Standard deviation of the gaussian kernel.
    """
    def __init__(self, kernel_size, sigma):
        super().__init__()
        if isinstance(kernel_size, numbers.Number):
            kernel_size = [kernel_size, kernel_size]
        cascade_coefficients = [0.5, 1.0, 2.0]  # std multipliers, hardcoded to use 3 different Gaussian kernels
        sigmas = [[coeff * sigma, coeff * sigma] for coeff in cascade_coefficients]  # isotropic Gaussian
        self.pad = int(kernel_size[0] / 2)  # assure we have the same spatial resolution
        # The gaussian kernel is the product of the gaussian function of each dimension.
        kernels = []
        meshgrids = torch.meshgrid([torch.arange(size, dtype=torch.float32) for size in kernel_size])
        for sigma in sigmas:
            kernel = torch.ones_like(meshgrids[0])
            for size_1d, std_1d, grid in zip(kernel_size, sigma, meshgrids):
                mean = (size_1d - 1) / 2
                kernel *= 1 / (std_1d * math.sqrt(2 * math.pi)) * torch.exp(-((grid - mean) / std_1d) ** 2 / 2)
            kernels.append(kernel)
        gaussian_kernels = []
        for kernel in kernels:
            # Normalize - make sure sum of values in gaussian kernel equals 1.
            kernel = kernel / torch.sum(kernel)
            # Reshape to depthwise convolutional weight
            kernel = kernel.view(1, 1, *kernel.shape)
            kernel = kernel.repeat(3, 1, 1, 1)
            kernel = kernel.cuda()
            gaussian_kernels.append(kernel)
        self.weight1 = gaussian_kernels[0]
        self.weight2 = gaussian_kernels[1]
        self.weight3 = gaussian_kernels[2]
        self.conv = F.conv2d

    def forward(self, input):
        input = F.pad(input, [self.pad, self.pad, self.pad, self.pad], mode='reflect')
        # Apply Gaussian kernels depthwise over the input (hence groups equals the number of input channels)
        # shape = (1, 3, H, W) -> (1, 3, H, W)
        num_in_channels = input.shape[1]
        grad1 = self.conv(input, weight=self.weight1, groups=num_in_channels)
        grad2 = self.conv(input, weight=self.weight2, groups=num_in_channels)
        grad3 = self.conv(input, weight=self.weight3, groups=num_in_channels)
        return (grad1 + grad2 + grad3) / 3
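The core mechanism of the class above is a depthwise convolution with a normalized Gaussian kernel. The following minimal CPU sketch (the kernel size, sigma, and tensor shape here are illustrative, not the values used later) shows that the reflect-padding plus `groups=channels` combination blurs each channel independently while preserving spatial resolution:

```python
import math
import torch
import torch.nn.functional as F

def gaussian_kernel_2d(size: int, sigma: float) -> torch.Tensor:
    """Build a normalized 2D Gaussian kernel of shape (size, size)."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g1d = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    kernel = torch.outer(g1d, g1d)      # separable: outer product of 1D Gaussians
    return kernel / kernel.sum()        # normalize so values sum to 1

# Depthwise smoothing of a (1, 3, H, W) tensor, as in CascadeGaussianSmoothing
x = torch.rand(1, 3, 32, 32)
k = gaussian_kernel_2d(9, 2.0).view(1, 1, 9, 9).repeat(3, 1, 1, 1)
pad = 9 // 2
x_pad = F.pad(x, [pad, pad, pad, pad], mode='reflect')
smoothed = F.conv2d(x_pad, weight=k, groups=3)  # groups = channels -> depthwise

print(smoothed.shape)  # spatial resolution is preserved by the padding
```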
#data = torch.ones(batch_size,3,500,600).cuda()*0.5
#data = normalize(data)
n = 0  # channel index within the chosen layer's features
data = Image.open('./feature_visual/gray.jpg')  # start from an initial image
data = transform_test(data).unsqueeze(0).cuda()
H, W = data.shape[2], data.shape[3]
#data.requires_grad=True
input_tensor = data.clone()
def hook(module, inp, out):
    global features
    features = out

myhook = model.features[22].register_forward_hook(hook)
levels, ratio = 4, 1.8
lr=0.2
IMAGENET_MEAN_1 = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD_1 = np.array([0.229, 0.224, 0.225], dtype=np.float32)
LOWER_IMAGE_BOUND = torch.tensor((-IMAGENET_MEAN_1 / IMAGENET_STD_1).reshape(1, -1, 1, 1)).cuda()
UPPER_IMAGE_BOUND = torch.tensor(((1 - IMAGENET_MEAN_1) / IMAGENET_STD_1).reshape(1, -1, 1, 1)).cuda()
for pyramid_level in range(levels):  # image pyramid: progressively increase the resolution
    data = input_tensor.detach()
    h = int(np.round(H * ratio ** (pyramid_level - levels + 1)))
    w = int(np.round(W * ratio ** (pyramid_level - levels + 1)))
    input_tensor = F.interpolate(data, (h, w), mode='bilinear')
    input_tensor.requires_grad = True
    for i in range(20):
        _ = model(input_tensor)
        loss = features[:, n, :, :].mean()  # maximize a single channel of the layer
        #loss = features.mean()  # or maximize the whole layer
        loss.backward()
        grad = input_tensor.grad
        sigma = ((i + 1) / 20) * 2.0 + 0.5
        smooth_grad = CascadeGaussianSmoothing(kernel_size=9, sigma=sigma)(grad)  # "magic number" 9 just works well
        g_std = torch.std(smooth_grad)
        g_mean = torch.mean(smooth_grad)
        smooth_grad = smooth_grad - g_mean
        smooth_grad = smooth_grad / g_std
        input_tensor.data += lr * smooth_grad  # gradient ascent step
        input_tensor.grad.zero_()
        input_tensor.data = torch.max(torch.min(input_tensor, UPPER_IMAGE_BOUND), LOWER_IMAGE_BOUND)
    print('mean/std:', input_tensor.mean().item(), input_tensor.std().item())
    print('loss:', loss.item())
    print('time: %.2f' % (time.time() - t0))
myhook.remove()
data_i = input_tensor.clone()
data_i = unnormalize(data_i)
data_i = torch.clamp(data_i,0,1)
data_i = data_i[0].permute(1,2,0).data.cpu().numpy()*255
data_i = data_i[...,::-1].astype('uint8')  # note: cv2 uses BGR channel order
cv2.imwrite('./feature_visual/densenet161block3denselayer36relu2/filter%d.jpg'%n,data_i)
As you can see, the results are striking! The figures below show deep dream images for different layers of the network:
Shallow layers produce simple stripes and color patterns; as the network gets deeper, the generated images become progressively richer. The most attractive results usually come from layers just past the middle, such as block3 above. Even deeper layers carry more information but also look more cluttered, so the visual effect is not as good.
If we instead take the maximization of a final output class probability as the loss, we can also generate an input image; such images can be called class-impression images (see Figure 3). The concept of class-impression images is introduced in another of my articles; with the image pyramid and Gaussian smoothing added here, they look considerably better.
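Switching from a feature-channel objective to a class-impression objective only changes the loss: instead of a hooked intermediate activation, maximize the logit of a target class in the model's final output. A minimal gradient-ascent sketch with a stand-in classifier (the tiny linear network, input size, and class index here are illustrative; in the article this would be the pretrained VGG16):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in classifier; a real class-impression run would use a pretrained network.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
model.eval()
for p in model.parameters():
    p.requires_grad = False

target_class = 3                       # illustrative class index
x = torch.rand(1, 3, 8, 8, requires_grad=True)
start = model(x)[0, target_class].item()

for _ in range(50):
    logit = model(x)[0, target_class]  # class-impression objective
    logit.backward()
    x.data += 0.1 * x.grad             # gradient ascent on the input image
    x.grad.zero_()

end = model(x)[0, target_class].item()
print(start, '->', end)                # the target logit increases
```

In the full pipeline, the same smoothing and pyramid steps apply unchanged; only the `loss = ...` line differs.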
This method can generate images of arbitrary size; for example, Figure 4 is 1024x1024 and is also very sharp.
The image pyramid makes textures emerge at multiple scales, large and small, giving the picture a sense of depth so that it looks richer and less monotonous.
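For reference, each pyramid level's resolution follows `H * ratio ** (level - levels + 1)`; with the settings used above (`levels=4`, `ratio=1.8`, `H=500`) the heights run from coarse to full resolution:

```python
import numpy as np

levels, ratio, H = 4, 1.8, 500
sizes = [int(np.round(H * ratio ** (level - levels + 1))) for level in range(levels)]
print(sizes)  # coarse-to-fine heights, ending at the original resolution
# -> [86, 154, 278, 500]
```

Textures "grown" at the coarse levels get enlarged by the later upsampling steps, which is what produces the large-scale patterns alongside the fine ones.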
Without Gaussian smoothing the generated image contains high-frequency noise and looks less clean; applying Gaussian smoothing removes it.
The initial image acts like a seed: with each iteration the generated image develops on top of it, while the initial image's contours persist throughout. This is an interesting property, as it lets us guide the generation more flexibly toward what we want.