Keras is a highly abstracted library. Its strength is rapid model building; its weakness is that it does not implement low-level operations such as tensor products itself. To fill this gap, Keras delegates low-level computation to a "backend engine". The backends currently supported are TensorFlow, CNTK, and Theano; TensorFlow is the default, and you can change the backend in the .keras/keras.json file. Using the backend API we can implement any custom layer we want.
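For example, the backend field in ~/.keras/keras.json decides which engine runs the computation, while the functions in keras.backend behave the same on any of them. A minimal sketch of backend usage (the shapes and values below are arbitrary, chosen only for illustration):

import numpy as np
from keras import backend as K

print(K.backend())  # e.g. 'tensorflow', depending on ~/.keras/keras.json

# Backend ops build symbolic tensors regardless of the engine:
x = K.placeholder(shape=(None, 4))
w = K.variable(np.random.random((4, 3)))
y = K.dot(x, w)       # the same call whether the backend is TensorFlow, CNTK or Theano
print(K.int_shape(y))  # (None, 3)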
Let's first look at the examples from the official Keras documentation.
With a single input and output tensor:
from keras import backend as K
from keras.layers import Layer

class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
With multiple input and output tensors:
from keras import backend as K
from keras.layers import Layer

class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        assert isinstance(input_shape, list)
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[0][1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        assert isinstance(x, list)
        a, b = x
        return [K.dot(a, self.kernel) + b, K.mean(b, axis=-1)]

    def compute_output_shape(self, input_shape):
        assert isinstance(input_shape, list)
        shape_a, shape_b = input_shape
        return [(shape_a[0], self.output_dim), shape_b[:-1]]
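A layer like this is called on a list of tensors via the functional API. A minimal sketch (the input shapes here are made up, chosen only so that K.dot(a, self.kernel) + b is well defined):

from keras.layers import Input
from keras.models import Model

a = Input(shape=(32,))
b = Input(shape=(16,))

# Both inputs go in as a list; the layer returns a list of two tensors.
out1, out2 = MyLayer(output_dim=16)([a, b])

model = Model(inputs=[a, b], outputs=[out1, out2])
model.summary()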
Here we focus on the single-input, single-output case; the multi-tensor case is analogous.
At a minimum, you need to implement the following four methods:
1. __init__(self, output_dim, **kwargs): initializes the attributes your custom layer needs, such as output_dim. You must also call super(MyLayer, self).__init__(**kwargs); this runs the Layer base-class constructor, so you do not have to handle keyword arguments such as input_shape, weights, or trainable yourself, because the parent class (Layer) already binds them to the layer instance.
2. build(self, input_shape): this is where the layer's weights are created. You define and add a weight via the Layer class's add_weight method; build must take an input_shape argument. It must also set self.built = True, which simply marks that the weight-creation step has been run; the easiest way is to call super(MyLayer, self).build(input_shape) at the end. Since build is where weights are created, it is also where you specify everything about them: shape, initialization, trainability, and so on. That is why Keras dedicates a separate method to defining weights.
3. call(self, x): this is where the layer's actual logic lives. You usually only need to care about the first argument, the input tensor x. Note that x is a symbolic placeholder rather than a concrete value, so call describes a symbolic computation graph mapping input tensors to output tensors; it does not operate on data that exists in advance. In that sense it is like a Python lambda, whose parameters have no value until the function is actually called. If you need your layer to support masking, the simplest option is to use the built-in Masking layer.
4. compute_output_shape(self, input_shape): so that Keras's internal shape-inference checks pass, you need to override this method of the parent class and return the correct output shape. The default Layer.compute_output_shape simply returns input_shape, which is wrong whenever your layer changes the shape of its input, so overriding it is part of the basic contract of a custom layer.
Of course, you can also implement other methods as needed. Putting the four methods together, the custom layer can then be used just like a built-in one; a short usage sketch follows.
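A minimal sketch using the single-tensor version of MyLayer defined above (the data and sizes here are made up purely for illustration):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(MyLayer(output_dim=8, input_shape=(20,)))  # custom layer as the first layer
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')

# Random toy data just to check that the shapes line up.
x = np.random.random((64, 20))
y = np.random.randint(0, 2, size=(64, 1))
model.fit(x, y, epochs=1, batch_size=16, verbose=0)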
Below is a more involved, real-world example: an AttentionDecoder layer (a Bahdanau-style attention decoder) implemented by subclassing Recurrent. Note that Recurrent and _time_distributed_dense belong to the older Keras 2.x recurrent API; the imports below assume a Keras version where they are still available (in recent releases _time_distributed_dense has to be copied in from an older release).

from keras import backend as K
from keras import regularizers, constraints, initializers, activations
from keras.engine import InputSpec
from keras.layers.recurrent import Recurrent, _time_distributed_dense


class AttentionDecoder(Recurrent):

    def __init__(self, units, output_dim,
                 activation='tanh',
                 return_probabilities=False,
                 name='AttentionDecoder',
                 kernel_initializer='glorot_uniform',
                 recurrent_initializer='orthogonal',
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 bias_regularizer=None,
                 activity_regularizer=None,
                 kernel_constraint=None,
                 bias_constraint=None,
                 **kwargs):
"""
Implements an AttentionDecoder that takes in a sequence encoded by an
encoder and outputs the decoded states
:param units: dimension of the hidden state and the attention matrices
:param output_dim: the number of labels in the output space
references:
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio.
"Neural machine translation by jointly learning to align and translate."
arXiv preprint arXiv:1409.0473 (2014).
"""
self.units = units
self.output_dim = output_dim
self.return_probabilities = return_probabilities
self.activation = activations.get(activation)
self.kernel_initializer = initializers.get(kernel_initializer)
self.recurrent_initializer = initializers.get(recurrent_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_regularizer = regularizers.get(kernel_regularizer)
self.recurrent_regularizer = regularizers.get(kernel_regularizer)
self.bias_regularizer = regularizers.get(bias_regularizer)
self.activity_regularizer = regularizers.get(activity_regularizer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.recurrent_constraint = constraints.get(kernel_constraint)
self.bias_constraint = constraints.get(bias_constraint)
        super(AttentionDecoder, self).__init__(**kwargs)  # calling the parent __init__(**kwargs) lets the custom layer accept the standard keyword arguments such as input_shape, weights and trainable
        self.name = name
        self.return_sequences = True  # must return sequences
    def build(self, input_shape):
        """
        See Appendix 2 of Bahdanau 2014, arXiv:1409.0473
        for model details that correspond to the matrices here.
        """
        self.batch_size, self.timesteps, self.input_dim = input_shape

        if self.stateful:
            super(AttentionDecoder, self).reset_states()

        self.states = [None, None]  # y, s

        """
            Matrices for creating the context vector
        """
        self.V_a = self.add_weight(shape=(self.units,),
                                   name='V_a',
                                   initializer=self.kernel_initializer,
                                   regularizer=self.kernel_regularizer,
                                   constraint=self.kernel_constraint)
        self.W_a = self.add_weight(shape=(self.units, self.units),
                                   name='W_a',
                                   initializer=self.kernel_initializer,
                                   regularizer=self.kernel_regularizer,
                                   constraint=self.kernel_constraint)
        self.U_a = self.add_weight(shape=(self.input_dim, self.units),
                                   name='U_a',
                                   initializer=self.kernel_initializer,
                                   regularizer=self.kernel_regularizer,
                                   constraint=self.kernel_constraint)
        self.b_a = self.add_weight(shape=(self.units,),
                                   name='b_a',
                                   initializer=self.bias_initializer,
                                   regularizer=self.bias_regularizer,
                                   constraint=self.bias_constraint)
"""
Matrices for the r (reset) gate
"""
self.C_r = self.add_weight(shape=(self.input_dim, self.units),
name='C_r',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.U_r = self.add_weight(shape=(self.units, self.units),
name='U_r',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.W_r = self.add_weight(shape=(self.output_dim, self.units),
name='W_r',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.b_r = self.add_weight(shape=(self.units, ),
name='b_r',
initializer=self.bias_initializer,
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
"""
Matrices for the z (update) gate
"""
self.C_z = self.add_weight(shape=(self.input_dim, self.units),
name='C_z',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.U_z = self.add_weight(shape=(self.units, self.units),
name='U_z',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.W_z = self.add_weight(shape=(self.output_dim, self.units),
name='W_z',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.b_z = self.add_weight(shape=(self.units, ),
name='b_z',
initializer=self.bias_initializer,
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
"""
Matrices for the proposal
"""
self.C_p = self.add_weight(shape=(self.input_dim, self.units),
name='C_p',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.U_p = self.add_weight(shape=(self.units, self.units),
name='U_p',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.W_p = self.add_weight(shape=(self.output_dim, self.units),
name='W_p',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.b_p = self.add_weight(shape=(self.units, ),
name='b_p',
initializer=self.bias_initializer,
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
"""
Matrices for making the final prediction vector
"""
self.C_o = self.add_weight(shape=(self.input_dim, self.output_dim),
name='C_o',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.U_o = self.add_weight(shape=(self.units, self.output_dim),
name='U_o',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.W_o = self.add_weight(shape=(self.output_dim, self.output_dim),
name='W_o',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.b_o = self.add_weight(shape=(self.output_dim, ),
name='b_o',
initializer=self.bias_initializer,
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
# For creating the initial state:
self.W_s = self.add_weight(shape=(self.input_dim, self.units),
name='W_s',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
self.input_spec = [
InputSpec(shape=(self.batch_size, self.timesteps, self.input_dim))]
        self.built = True  # must be set so that Keras knows build() has already been run
    def call(self, x):
        # store the whole sequence so we can "attend" to it at each timestep
        self.x_seq = x

        # apply a dense layer over the time dimension of the sequence
        # do it here because it doesn't depend on any previous steps
        # therefore we can save computation time:
        self._uxpb = _time_distributed_dense(self.x_seq, self.U_a, b=self.b_a,
                                             input_dim=self.input_dim,
                                             timesteps=self.timesteps,
                                             output_dim=self.units)

        return super(AttentionDecoder, self).call(x)
    def get_initial_state(self, inputs):
        # apply the matrix on the first time step to get the initial s0.
        s0 = activations.tanh(K.dot(inputs[:, 0], self.W_s))

        # from keras.layers.recurrent: initialize an all-zero vector of
        # shape (batchsize, output_dim)
        y0 = K.zeros_like(inputs)              # (samples, timesteps, input_dims)
        y0 = K.sum(y0, axis=(1, 2))            # (samples, )
        y0 = K.expand_dims(y0)                 # (samples, 1)
        y0 = K.tile(y0, [1, self.output_dim])  # tile y0 along the last axis to (samples, output_dim)

        return [y0, s0]
    def step(self, x, states):

        ytm, stm = states

        # repeat the hidden state to the length of the sequence
        # (K.repeat copies the tensor along a new time axis, adding a dimension)
        _stm = K.repeat(stm, self.timesteps)

        # now multiply the weight matrix with the repeated hidden state
        _Wxstm = K.dot(_stm, self.W_a)

        # calculate the attention probabilities
        # this relates how much other timesteps contributed to this one.
        et = K.dot(activations.tanh(_Wxstm + self._uxpb),
                   K.expand_dims(self.V_a))
        at = K.exp(et)
        at_sum = K.sum(at, axis=1)
        at_sum_repeated = K.repeat(at_sum, self.timesteps)
        at /= at_sum_repeated  # vector of size (batchsize, timesteps, 1)

        # calculate the context vector
        context = K.squeeze(K.batch_dot(at, self.x_seq, axes=1), axis=1)
        # ~~~> calculate new hidden state
        # first calculate the "r" gate:
        rt = activations.sigmoid(
            K.dot(ytm, self.W_r)
            + K.dot(stm, self.U_r)
            + K.dot(context, self.C_r)
            + self.b_r)

        # now calculate the "z" gate
        zt = activations.sigmoid(
            K.dot(ytm, self.W_z)
            + K.dot(stm, self.U_z)
            + K.dot(context, self.C_z)
            + self.b_z)

        # calculate the proposal hidden state:
        s_tp = activations.tanh(
            K.dot(ytm, self.W_p)
            + K.dot((rt * stm), self.U_p)
            + K.dot(context, self.C_p)
            + self.b_p)

        # new hidden state:
        st = (1 - zt) * stm + zt * s_tp

        yt = activations.softmax(
            K.dot(ytm, self.W_o)
            + K.dot(stm, self.U_o)
            + K.dot(context, self.C_o)
            + self.b_o)

        if self.return_probabilities:
            return at, [yt, st]
        else:
            return yt, [yt, st]
    def compute_output_shape(self, input_shape):
        """
            For Keras internal compatibility checking
        """
        if self.return_probabilities:
            return (None, self.timesteps, self.timesteps)
        else:
            return (None, self.timesteps, self.output_dim)
    def get_config(self):
        """
            For rebuilding models on load time.
        """
        config = {
            'output_dim': self.output_dim,
            'units': self.units,
            'return_probabilities': self.return_probabilities
        }
        base_config = super(AttentionDecoder, self).get_config()  # get_config is overridden to add the new configuration entries while keeping everything the parent class already reports
        return dict(list(base_config.items()) + list(config.items()))
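A minimal sketch of wiring this decoder behind an encoder, again assuming an older Keras 2.x where Recurrent is still available; the sequence length, feature size, and layer sizes below are hypothetical, chosen only for illustration:

from keras.models import Model
from keras.layers import Input, LSTM

TIMESTEPS, N_FEATURES, N_LABELS = 20, 50, 50  # hypothetical sizes

inputs = Input(shape=(TIMESTEPS, N_FEATURES))
# Encoder: any recurrent layer that returns the full sequence will do.
encoded = LSTM(64, return_sequences=True)(inputs)
# Decoder: attends over the encoded sequence and emits one label distribution per timestep.
decoded = AttentionDecoder(units=64, output_dim=N_LABELS)(encoded)

model = Model(inputs=inputs, outputs=decoded)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()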
Reference: https://keras.io/layers/writing-your-own-keras-layers/