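According to papers in related areas, perceptual loss consists mainly of a content loss and a style loss.
The content loss is the L1 or L2 norm of the difference between the two objects being compared.
The style loss first computes the Gram matrix of each object, then takes the L1 or L2 norm of the difference between the two Gram matrices.
The Gram matrix computation can be understood as follows:
The content is a feature map extracted by a network such as VGG, with shape [b, h, w, c], that is, [batch size, height, width, channels].
The required Gram matrix is the product of a [b, c, hw] tensor and its [b, hw, c] transpose, which gives shape [b, c, c].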
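To make the shapes above concrete, here is a minimal NumPy sketch of the same Gram matrix computation. The function name `gram_matrix_np` and the random input are illustrative assumptions, and the division by c * h * w mirrors the normalization used in the TensorFlow code of this post rather than any fixed convention.

```python
import numpy as np


def gram_matrix_np(features):
    """Gram matrix of a [b, h, w, c] feature map, returned as [b, c, c]."""
    b, h, w, c = features.shape
    # Merge the spatial axes: [b, h, w, c] -> [b, hw, c]
    flat = features.reshape(b, h * w, c)
    # [b, c, hw] @ [b, hw, c] -> [b, c, c]
    gram = np.transpose(flat, (0, 2, 1)) @ flat
    # Normalize by the number of elements per sample (assumed convention)
    return gram / (c * h * w)


# Hypothetical input: batch of 2, 4x5 spatial size, 3 channels
rng = np.random.default_rng(0)
features = rng.standard_normal((2, 4, 5, 3))
gram = gram_matrix_np(features)
print(gram.shape)  # (2, 3, 3)
```

Each [c, c] slice is symmetric, since entry (i, j) is the normalized inner product of channel i and channel j over all spatial positions.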
The TensorFlow implementation is as follows:
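import numpy as np
import tensorflow as tf


def ContentLoss(messageresult, compareresult):
    # Sum, over pairs of feature maps, the L1 norm of their difference,
    # normalized by the number of elements per sample (h * w * c).
    result = 0
    for x, y in zip(messageresult, compareresult):
        shape = x.get_shape().as_list()
        k = np.prod(shape[1:])
        diff = x - y
        diff = tf.norm(diff, ord=1) / k
        result = result + diff
    return result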
# Compute the Gram matrix
def gram_matrix(features):
    # features: [batch, h, w, c]
    features = tf.transpose(features, perm=[0, 3, 1, 2])  # [batch, c, h, w]
    shape = features.get_shape().as_list()
    channel = shape[1]
    dim = np.prod(shape[2:])  # h * w; requires statically known h and w
    features = tf.reshape(features, [-1, channel, dim])  # [batch, c, hw]
    # k is the normalization factor
    k = channel * dim
    features_t = tf.transpose(features, perm=[0, 2, 1])  # [batch, hw, c]
    result = tf.matmul(features, features_t) / k  # [batch, c, c]
    return result, channel
def StyleLoss(messageresult, compareresult):
    result = 0
    for x, y in zip(messageresult, compareresult):
        X, cx = gram_matrix(x)
        Y, cy = gram_matrix(y)
        if cx != cy:
            print("The channel of feature map is not the same!")
            return 0
        k = cx * cy
        diff = X - Y
        diff = tf.norm(diff, ord=1) / k  # some papers use ord=1, others use ord=2
        result = result + diff
    return result
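As a quick sanity check of the style loss logic, the following is a NumPy-only sketch that can be run without building a TensorFlow graph. The names `gram_np` and `style_loss_np`, the `norm_ord` parameter, and the random "feature maps" are assumptions made for illustration; they stand in for real VGG activations.

```python
import numpy as np


def gram_np(features):
    # [b, h, w, c] -> [b, c, c], normalized by c * h * w
    b, h, w, c = features.shape
    flat = features.reshape(b, h * w, c)
    return np.transpose(flat, (0, 2, 1)) @ flat / (c * h * w)


def style_loss_np(feats_a, feats_b, norm_ord=1):
    # Sum over layers of the norm of the Gram matrix difference,
    # divided by c * c as in the TensorFlow version.
    total = 0.0
    for x, y in zip(feats_a, feats_b):
        c = x.shape[-1]
        diff = (gram_np(x) - gram_np(y)).ravel()
        total += np.linalg.norm(diff, ord=norm_ord) / (c * c)
    return total


# Hypothetical feature maps from two layers with different channel counts
rng = np.random.default_rng(0)
layers_a = [rng.standard_normal((1, 8, 8, 4)), rng.standard_normal((1, 4, 4, 6))]
layers_b = [rng.standard_normal((1, 8, 8, 4)), rng.standard_normal((1, 4, 4, 6))]

same = style_loss_np(layers_a, layers_a)
different = style_loss_np(layers_a, layers_b)
print(same)           # 0.0 for identical inputs
print(different > 0)  # True for different inputs
```

Passing `norm_ord=2` switches to the L2 variant that some papers use.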