This post analyzes the code in base_layer.py, the file that contains the all-important Layer class. Layer is the parent class of every layer in Keras: class Sequential(Model) inherits from the Model class in keras/engine/training.py, Model inherits from the Container class in keras/engine/topology.py in the same directory, and Container in turn inherits from the Layer class in that same file. Layer is therefore the foundation of Keras, carrying the whole framework on its shoulders.
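You can sanity-check this hierarchy yourself. A quick sketch (assuming a Keras 2.x install; in newer 2.x releases Layer lives in keras/engine/base_layer.py and is re-exported as keras.layers.Layer):

from keras.layers import Dense, Layer
from keras.models import Model, Sequential

print(issubclass(Dense, Layer))       # True: every concrete layer subclasses Layer
print(issubclass(Sequential, Model))  # True: Sequential builds on Model
print(issubclass(Model, Layer))       # True: via Container/Network, Model is itself a Layer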
Before walking through the source, a bit of background is in order. Many people working in AI may not know Python all that deeply, and without these prerequisites the source code can be hard to follow.
A decorator modifies a function's behavior and makes code more concise. The Layer class uses @property plus one custom decorator. @property works much like the getters and setters of a field in a Java class, as the following source shows. For the getter, just add @property:
@property
def built(self):
    return self._built
For the setter, add @&lt;attribute name&gt;.setter:
@built.setter
def built(self, value):
    self._built = value
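To see the pattern outside Keras, here is a minimal self-contained sketch: the getter runs on attribute reads, the setter on assignments, so _built stays an internal detail.

class Widget(object):
    def __init__(self):
        self._built = False

    @property
    def built(self):            # runs on `w.built`
        return self._built

    @built.setter
    def built(self, value):     # runs on `w.built = ...`
        self._built = value

w = Widget()
print(w.built)   # False -- getter invoked
w.built = True   # setter invoked
print(w.built)   # True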
The other decorator, the custom @interfaces.legacy_add_weight_support, mainly exists so that Keras 2 code stays compatible with Keras 1; I won't go into it here.
Python classes have a magic method named __call__. An instance of a class that implements it can be invoked directly, as if the instance itself were a function. This is exactly why we can chain layers together like this:
inputs = Input(shape=(100,))
x = Dense(64)(inputs)
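So Dense(64) constructs a layer instance, and the trailing (inputs) invokes that instance's __call__. A toy class illustrating the mechanism:

class Doubler(object):
    def __call__(self, x):
        # Instances become callable: doubler(3) is doubler.__call__(3)
        return 2 * x

doubler = Doubler()
print(doubler(3))  # 6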
Now let's get to the Layer source itself. I've excerpted what I consider the most important parts; for the remaining details, go read the source directly. We start with the constructor.
def __init__(self, **kwargs):
    self.input_spec = None
    self.supports_masking = False
    self.stateful = False

    # These properties will be set upon call of self.build()
    self._trainable_weights = []
    self._non_trainable_weights = []
    self._losses = []
    self._updates = []
    self._per_input_losses = {}
    self._per_input_updates = {}
    self._built = False

    # These lists will be filled via successive calls
    # to self._add_inbound_node().
    self._inbound_nodes = []
    self._outbound_nodes = []

    # These properties should be set by the user via keyword arguments.
    # note that 'dtype', 'input_shape' and 'batch_input_shape'
    # are only applicable to input layers: do not pass these keywords
    # to non-input layers.
    allowed_kwargs = {'input_shape',
                      'batch_input_shape',
                      'batch_size',
                      'dtype',
                      'name',
                      'trainable',
                      'weights',
                      'input_dtype',  # legacy
                      }
    for kwarg in kwargs:
        if kwarg not in allowed_kwargs:
            raise TypeError('Keyword argument not understood:', kwarg)
    name = kwargs.get('name')
    if not name:
        prefix = self.__class__.__name__
        name = _to_snake_case(prefix) + '_' + str(K.get_uid(prefix))
    self.name = name

    self.trainable = kwargs.get('trainable', True)
    if 'input_shape' in kwargs or 'batch_input_shape' in kwargs:
        # In this case we will later create an input layer
        # to insert before the current layer
        if 'batch_input_shape' in kwargs:
            batch_input_shape = tuple(kwargs['batch_input_shape'])
        elif 'input_shape' in kwargs:
            batch_size = kwargs.get('batch_size')
            batch_input_shape = (batch_size,) + tuple(kwargs['input_shape'])
        self.batch_input_shape = batch_input_shape

    # Set dtype.
    dtype = kwargs.get('dtype')
    if dtype is None:
        dtype = kwargs.get('input_dtype')
    if dtype is None:
        dtype = K.floatx()
    self.dtype = dtype

    self._initial_weights = kwargs.get('weights')
The constructor mainly initializes parameters and assigns a few attributes. The keyword arguments it accepts are exactly those listed in allowed_kwargs. Of these, dtype, input_shape and batch_input_shape are only applicable to input layers; never pass them to other layers.
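For example, a first layer given input_shape ends up with a batch_input_shape whose batch dimension is None, and any unknown keyword is rejected by the allowed_kwargs check (a small sketch using Dense):

from keras.layers import Dense

first = Dense(64, input_shape=(100,))
print(first.batch_input_shape)  # (None, 100): batch_size defaulted to None

# Dense(64, foo=1) would raise:
# TypeError: ('Keyword argument not understood:', 'foo')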
Next, the add_weight method, which adds trainable weights to the current layer.
def add_weight(self,
               name,
               shape,
               dtype=None,
               initializer=None,
               regularizer=None,
               trainable=True,
               constraint=None):
    initializer = initializers.get(initializer)
    if dtype is None:
        dtype = self.dtype
    weight = K.variable(initializer(shape, dtype=dtype),
                        dtype=dtype,
                        name=name,
                        constraint=constraint)
    if regularizer is not None:
        with K.name_scope('weight_regularizer'):
            self.add_loss(regularizer(weight))
    if trainable:
        self._trainable_weights.append(weight)
    else:
        self._non_trainable_weights.append(weight)
    return weight
Here K.variable() is, with the TensorFlow backend, essentially a call to tf.Variable(). The method then checks whether a regularizer was given and, if so, calls add_loss to record the resulting loss on the current layer. Finally, depending on the trainable flag, the weight is appended to either the trainable or the non-trainable list.
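In practice you call add_weight from a custom layer's build method. A minimal sketch (the Scale layer here is hypothetical, purely for illustration):

from keras.layers import Layer

class Scale(Layer):
    """Hypothetical layer that multiplies its input by a learned scalar."""
    def build(self, input_shape):
        # Registered in _trainable_weights because trainable=True.
        self.alpha = self.add_weight(name='alpha',
                                     shape=(1,),
                                     initializer='ones',
                                     trainable=True)
        super(Scale, self).build(input_shape)  # sets self.built = True

    def call(self, inputs):
        return self.alpha * inputs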
Now for the most important magic method, __call__. It is fairly long, so we will go through it in several chunks.
def __call__(self, inputs, **kwargs):
    if isinstance(inputs, list):
        inputs = inputs[:]
    with K.name_scope(self.name):
        # Handle layer building (weight creating, input spec locking).
        if not self.built:
            # Raise exceptions in case the input is not compatible
            # with the input_spec specified in the layer constructor.
            self.assert_input_compatibility(inputs)

            # Collect input shapes to build layer.
            input_shapes = []
            for x_elem in to_list(inputs):
                if hasattr(x_elem, '_keras_shape'):
                    input_shapes.append(x_elem._keras_shape)
                elif hasattr(K, 'int_shape'):
                    input_shapes.append(K.int_shape(x_elem))
                else:
                    raise ValueError('You tried to call layer "' +
                                     self.name +
                                     '". This layer has no information'
                                     ' about its expected input shape, '
                                     'and thus cannot be built. '
                                     'You can build it manually via: '
                                     '`layer.build(batch_input_shape)`')
            self.build(unpack_singleton(input_shapes))
            self.built = True

            # Load weights that were specified at layer instantiation.
            if self._initial_weights is not None:
                self.set_weights(self._initial_weights)
First a name scope is created from the name chosen in the constructor, so that every op the layer creates is grouped under that name. Then the built flag is checked: if it is False, build has not run yet, so the input is first validated for compatibility, the input shapes are collected, and build is executed. build is what constructs the layer's structure (its weights); once it finishes, built is set to True and, if initial weights were supplied at instantiation, they are loaded. If we reuse a layer, built is already True at this point, so build is not executed a second time and the existing weights are shared.
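Layer reuse is easy to observe: calling the same instance twice runs build only once, so both outputs share one set of weights.

from keras.layers import Input, Dense

shared = Dense(64)
a = shared(Input(shape=(100,)))  # first call: build() runs, built -> True
b = shared(Input(shape=(100,)))  # second call: build() is skipped
print(len(shared.trainable_weights))  # 2 (one kernel + one bias), not 4

Back in __call__, the next chunk performs the actual computation: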
        # Raise exceptions in case the input is not compatible
        # with the input_spec set at build time.
        self.assert_input_compatibility(inputs)

        # Handle mask propagation.
        previous_mask = _collect_previous_mask(inputs)
        user_kwargs = kwargs.copy()
        if not is_all_none(previous_mask):
            # The previous layer generated a mask.
            if has_arg(self.call, 'mask'):
                if 'mask' not in kwargs:
                    # If mask is explicitly passed to __call__,
                    # we should override the default mask.
                    kwargs['mask'] = previous_mask

        # Handle automatic shape inference (only useful for Theano).
        input_shape = _collect_input_shape(inputs)

        # Actually call the layer,
        # collecting output(s), mask(s), and shape(s).
        output = self.call(inputs, **kwargs)
        output_mask = self.compute_mask(inputs, previous_mask)

        # If the layer returns tensors from its inputs, unmodified,
        # we copy them to avoid loss of tensor metadata.
        output_ls = to_list(output)
        inputs_ls = to_list(inputs)
        output_ls_copy = []
        for x in output_ls:
            if x in inputs_ls:
                x = K.identity(x)
            output_ls_copy.append(x)
        output = unpack_singleton(output_ls_copy)
The main job of this part is to push the input through the layer and obtain the output. The input is validated once more, then any mask produced by the previous layer is collected (masks will be covered in a later post); if the layer's call accepts a mask argument and none was passed explicitly, the collected mask is forwarded. call is then invoked to produce the output, and the mask is updated via compute_mask. Finally, outputs are compared with inputs: any output tensor that is simply an input returned unmodified is replaced by a K.identity copy, so that attaching new Keras metadata does not clobber the original tensor's. The call method itself is the actual tensor transformation performed by the layer.
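The identity-copy branch is easy to trigger with a pass-through layer; a sketch using Lambda, whose call simply returns its input (assuming the TensorFlow 1.x backend, where tensor membership tests compare by identity):

from keras.layers import Input, Lambda

x = Input(shape=(10,))
y = Lambda(lambda t: t)(x)
# call() returned x itself, so __call__ wrapped it in K.identity:
print(y is x)  # False -- y is a copy carrying its own Keras history

The final chunk of __call__ follows: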
        # Inferring the output shape is only relevant for Theano.
        if all([s is not None
                for s in to_list(input_shape)]):
            output_shape = self.compute_output_shape(input_shape)
        else:
            if isinstance(input_shape, list):
                output_shape = [None for _ in input_shape]
            else:
                output_shape = None

        if (not isinstance(output_mask, (list, tuple)) and
                len(output_ls) > 1):
            # Augment the mask to match the length of the output.
            output_mask = [output_mask] * len(output_ls)

        # Add an inbound node to the layer, so that it keeps track
        # of the call and of all new variables created during the call.
        # This also updates the layer history of the output tensor(s).
        # If the input tensor(s) had not previous Keras history,
        # this does nothing.
        self._add_inbound_node(input_tensors=inputs,
                               output_tensors=output,
                               input_masks=previous_mask,
                               output_masks=output_mask,
                               input_shapes=input_shape,
                               output_shapes=output_shape,
                               arguments=user_kwargs)

        # Apply activity regularizer if any:
        if (hasattr(self, 'activity_regularizer') and
                self.activity_regularizer is not None):
            with K.name_scope('activity_regularizer'):
                regularization_losses = [
                    self.activity_regularizer(x)
                    for x in to_list(output)]
                self.add_loss(regularization_losses,
                              inputs=to_list(inputs))
    return output
This last chunk is the clean-up work. First the output shape is derived from the input shape; this is where compute_output_shape is called. Then the mask is checked against the number of outputs, and broadcast into a list of matching length if necessary. Next comes the call to _add_inbound_node, which creates a node (what nodes are for is covered below) that links the layers together; this is how the current layer can access the previous layers' outputs and masks. Finally, if an activity_regularizer is set, its penalty is added to the layer's losses. Keras actually offers three kinds of regularizers (kernel, bias and activity), briefly illustrated below.
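All three can be attached to a single layer; the kernel and bias penalties flow through add_weight's regularizer argument, while the activity penalty is the one applied here in __call__:

from keras import regularizers
from keras.layers import Dense

layer = Dense(64,
              kernel_regularizer=regularizers.l2(1e-4),    # on the kernel weight
              bias_regularizer=regularizers.l2(1e-4),      # on the bias weight
              activity_regularizer=regularizers.l1(1e-5))  # on the layer output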
Besides the Layer class, this file holds two more classes, InputSpec and Node. Let's look at what these two are for.
class InputSpec(object):

    def __init__(self, dtype=None,
                 shape=None,
                 ndim=None,
                 max_ndim=None,
                 min_ndim=None,
                 axes=None):
        self.dtype = dtype
        self.shape = shape
        if shape is not None:
            self.ndim = len(shape)
        else:
            self.ndim = ndim
        self.max_ndim = max_ndim
        self.min_ndim = min_ndim
        self.axes = axes or {}

    def __repr__(self):
        spec = [('dtype=' + str(self.dtype)) if self.dtype else '',
                ('shape=' + str(self.shape)) if self.shape else '',
                ('ndim=' + str(self.ndim)) if self.ndim else '',
                ('max_ndim=' + str(self.max_ndim)) if self.max_ndim else '',
                ('min_ndim=' + str(self.min_ndim)) if self.min_ndim else '',
                ('axes=' + str(self.axes)) if self.axes else '']
        return 'InputSpec(%s)' % ', '.join(x for x in spec if x)
This class is how a layer specifies the ndim, dtype, shape and so on that it expects of its inputs. It is simple assignment, with nothing complicated going on.
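Dense, for example, locks its input spec in build(); after the first call you can inspect it (a small sketch, assuming the Keras 2 Dense implementation):

from keras.layers import Input, Dense

d = Dense(64)
d(Input(shape=(100,)))
# Dense requires ndim >= 2 with the last axis fixed to the build-time size:
print(d.input_spec)  # InputSpec(min_ndim=2, axes={-1: 100})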
The Node class connects two layers together, acting like a data channel that carries whatever the layers need to exchange. Why have Node at all? My understanding is decoupling: a Layer only concerns itself with processing its own data, not with how data moves between layers. Recall the two attributes on Layer, self._inbound_nodes and self._outbound_nodes: every time a Node is instantiated, it appends itself to these lists on the layers involved.
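The bookkeeping is visible after a single functional-API call: one Node is created and registered on both the upstream and the downstream layer (a small sketch; _inbound_nodes and _outbound_nodes are private attributes, read here only for inspection):

from keras.layers import Input, Dense

x = Input(shape=(100,))
dense = Dense(64)
y = dense(x)

node = dense._inbound_nodes[0]            # created by the call above
print(node.outbound_layer is dense)       # True
print(node.inbound_layers[0].name)        # the InputLayer that produced x
print(node.inbound_layers[0]._outbound_nodes[0] is node)  # True

Here is the Node class itself: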
class Node(object):

    def __init__(self, outbound_layer,
                 inbound_layers, node_indices, tensor_indices,
                 input_tensors, output_tensors,
                 input_masks, output_masks,
                 input_shapes, output_shapes,
                 arguments=None):
        # Layer instance (NOT a list).
        # this is the layer that takes a list of input tensors
        # and turns them into a list of output tensors.
        # the current node will be added to
        # the inbound_nodes of outbound_layer.
        self.outbound_layer = outbound_layer

        # The following 3 properties describe where
        # the input tensors come from: which layers,
        # and for each layer, which node and which
        # tensor output of each node.

        # List of layer instances.
        self.inbound_layers = inbound_layers
        # List of integers, 1:1 mapping with inbound_layers.
        self.node_indices = node_indices
        # List of integers, 1:1 mapping with inbound_layers.
        self.tensor_indices = tensor_indices

        # Following 2 properties:
        # tensor inputs and outputs of outbound_layer.

        # List of tensors. 1:1 mapping with inbound_layers.
        self.input_tensors = input_tensors
        # List of tensors, created by outbound_layer.call().
        self.output_tensors = output_tensors

        # Following 2 properties: input and output masks.
        # List of tensors, 1:1 mapping with input_tensor.
        self.input_masks = input_masks
        # List of tensors, created by outbound_layer.compute_mask().
        self.output_masks = output_masks

        # Following 2 properties: input and output shapes.
        # List of shape tuples, shapes of input_tensors.
        self.input_shapes = input_shapes
        # List of shape tuples, shapes of output_tensors.
        self.output_shapes = output_shapes

        # Optional keyword arguments to layer's `call`.
        self.arguments = arguments

        # Add nodes to all layers involved.
        for layer in inbound_layers:
            if layer is not None:
                layer._outbound_nodes.append(self)
        outbound_layer._inbound_nodes.append(self)

    def get_config(self):
        inbound_names = []
        for layer in self.inbound_layers:
            if layer:
                inbound_names.append(layer.name)
            else:
                inbound_names.append(None)
        if self.outbound_layer:
            outbound_layer = self.outbound_layer.name
        else:
            outbound_layer = None
        return {'outbound_layer': outbound_layer,
                'inbound_layers': inbound_names,
                'node_indices': self.node_indices,
                'tensor_indices': self.tensor_indices}
The Node code is likewise all assignments plus reading out a few attributes. The key piece to look at is the _add_inbound_node method that Layer calls.
def _add_inbound_node(self, input_tensors, output_tensors,
                      input_masks, output_masks,
                      input_shapes, output_shapes, arguments=None):
    input_tensors = to_list(input_tensors)
    output_tensors = to_list(output_tensors)
    input_masks = to_list(input_masks)
    output_masks = to_list(output_masks)
    input_shapes = to_list(input_shapes)
    output_shapes = to_list(output_shapes)

    # Collect input tensor(s) coordinates.
    inbound_layers = []
    node_indices = []
    tensor_indices = []
    for x in input_tensors:
        if hasattr(x, '_keras_history'):
            inbound_layer, node_index, tensor_index = x._keras_history
            inbound_layers.append(inbound_layer)
            node_indices.append(node_index)
            tensor_indices.append(tensor_index)
        else:
            inbound_layers.append(None)
            node_indices.append(None)
            tensor_indices.append(None)

    # Create node, add it to inbound nodes.
    Node(
        self,
        inbound_layers=inbound_layers,
        node_indices=node_indices,
        tensor_indices=tensor_indices,
        input_tensors=input_tensors,
        output_tensors=output_tensors,
        input_masks=input_masks,
        output_masks=output_masks,
        input_shapes=input_shapes,
        output_shapes=output_shapes,
        arguments=arguments
    )

    # Update tensor history, _keras_shape and _uses_learning_phase.
    for i in range(len(output_tensors)):
        output_tensors[i]._keras_shape = output_shapes[i]
        uses_lp = any(
            [getattr(x, '_uses_learning_phase', False)
             for x in input_tensors])
        uses_lp = getattr(self, 'uses_learning_phase', False) or uses_lp
        output_tensors[i]._uses_learning_phase = getattr(
            output_tensors[i], '_uses_learning_phase', False) or uses_lp
        output_tensors[i]._keras_history = (self,
                                            len(self._inbound_nodes) - 1,
                                            i)
First, the input and output tensors, masks and shapes are all normalized to lists. Then input_tensors is traversed to collect the parameters needed to construct a Node: each input tensor's _keras_history records which layer it came from, and which node and tensor index within that layer. With that, the Node can be created, wiring the layers together, and finally the output tensors are updated with their shape, learning-phase flag and history.
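That stamped metadata is what makes the functional API work; you can read it straight off an output tensor:

from keras.layers import Input, Dense

y = Dense(64)(Input(shape=(100,)))

layer, node_index, tensor_index = y._keras_history
print(layer.name, node_index, tensor_index)  # e.g. dense_1 0 0
print(y._keras_shape)                        # (None, 64)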
That is all of Layer. In the next post we will analyze the Input layer that we use so often when defining models.