CNN with Estimator official tutorial: https://tensorflow.google.cn/tutorials/estimators/cnn
Custom Estimator guide: https://www.tensorflow.org/guide/custom_estimators
Building a neural-network model with Estimator generally involves defining five functions:
- define_flags(): creates the required command-line parameters. It uses the custom flags_core module to define, in one call, the basic parameters a model run needs (model_dir, train_epochs, etc.). If you want extra custom parameters (e.g. dropout_rate), use the absl.flags module.
- input_fn(): converts the data into a Dataset; run_loop uses it when invoking the model's three behaviors. (If your dataset is in CSV format, you also need a feature_column() function to help networks() build its input_layer; see the referenced article on data loading.)
- networks(input, label, params, ...): builds the layer structure of the network; its output is the logits produced by the model.
- model_fn(features, labels, mode, params): implements the model's three concrete behaviors, train, eval and predict, by further processing the logits returned by networks(). [Note: the four parameter names are fixed; you cannot invent new ones.]
- run_loop(flags): instantiates the model and invokes its train, eval and predict behaviors. (A minimal end-to-end sketch of all five pieces follows this list.)
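Here is a minimal end-to-end sketch of how the pieces fit together (illustrative only; the my_* names and toy data are not part of the MNIST code below):

import tensorflow as tf

def my_input_fn():
    # input_fn may return a tf.data.Dataset whose elements are (features, labels) pairs.
    ds = tf.data.Dataset.from_tensor_slices(({'x': [[1.0], [2.0]]}, [0, 1]))
    return ds.batch(2)

def my_model_fn(features, labels, mode, params):  # the four parameter names are fixed
    logits = tf.layers.dense(features['x'], units=2)  # networks() would be called here
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions={'class': tf.argmax(logits, 1)})
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    if mode == tf.estimator.ModeKeys.TRAIN:
        train_op = tf.train.AdamOptimizer().minimize(loss, tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
    return tf.estimator.EstimatorSpec(mode, loss=loss)  # EVAL

def my_run_loop():
    model = tf.estimator.Estimator(model_fn=my_model_fn, params={})
    model.train(input_fn=my_input_fn, steps=10)   # stops early when the Dataset is exhausted
    print(model.evaluate(input_fn=my_input_fn))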
Required packages:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
import numpy as np
import absl
from absl import app as absl_app
from absl import flags
from official.utils.flags import core as flags_core
from official.utils.flags._conventions import help_wrap
from official.utils.logs import hooks_helper
from official.utils.misc import model_helpers
from official.mnist import dataset
from tensorflow.contrib.estimator import stop_if_no_decrease_hook
import os
def define_flags():
    flags_core.define_base()
    flags_core.define_image()
    flags_core.define_performance()
    flags.adopt_module_key_flags(flags_core)
    flags.DEFINE_float(name='dropout_rate', default=0.4, help=help_wrap("dropout rate"))
    flags.DEFINE_float(name='learning_rate', default=0.001, help=help_wrap("learning rate"))
    # Set default values for some of the flags
    flags_core.set_defaults(data_dir='D:/pycharm_proj/tensorflow/dataset/mnist_data',
                            model_dir='D:/pycharm_proj/tensorflow/model/mnist_model_estimator_l2_earlyStop',
                            batch_size=100,
                            train_epochs=40,
                            data_format="channels_last",
                            dropout_rate=0.4)
Notes:
absl.flags module usage: https://blog.csdn.net/qq_33757398/article/details/82491411
flags_core module source: https://github.com/tensorflow/models/tree/master/official/utils/flags
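For reference, a minimal standalone absl.flags program (the flag name here is just an example):

from absl import app, flags

flags.DEFINE_float('dropout_rate', 0.4, 'Fraction of units to drop.')
FLAGS = flags.FLAGS

def main(_):
    # Overridable from the command line: python this_script.py --dropout_rate=0.5
    print('dropout_rate =', FLAGS.dropout_rate)

if __name__ == '__main__':
    app.run(main)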
def train_input_fn():
    # Although this returns a Dataset, each element is already a zipped
    # (features, labels) pair, so it can be unpacked directly into features and
    # labels -- exactly the form model.train() expects from input_fn.
    # Note: the closure reads `params`, so these functions are actually defined
    # inside run_mnist() below.
    ds = dataset.train(params.data_dir)
    ds = ds.cache().shuffle(buffer_size=50000).batch(params.batch_size)
    ds = ds.repeat(params.epochs_between_evals)
    return ds

def eval_input_fn():
    return dataset.test(params.data_dir).batch(
        params.batch_size).make_one_shot_iterator().get_next()
Notes:
dataset.train source: https://github.com/tensorflow/models/blob/master/official/mnist/dataset.py
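For comparison, if you do not want to depend on official.mnist.dataset, an input_fn with the same contract can be sketched from in-memory numpy arrays (a sketch only; shapes assume flattened MNIST):

import numpy as np
import tensorflow as tf

def numpy_train_input_fn(images, labels, batch_size=100):
    # images: float32 array [N, 784] scaled to [0, 1]; labels: int array [N].
    # Returns a Dataset of (features, label) pairs, the same contract as dataset.train().
    ds = tf.data.Dataset.from_tensor_slices((images.astype(np.float32),
                                             labels.astype(np.int32)))
    return ds.cache().shuffle(buffer_size=50000).batch(batch_size)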
def networks(features, params, alpha=0, reuse=False, is_train=False):
    with tf.variable_scope('ConvNet', reuse=reuse):
        # tf.logging.info('building model')
        # Convolution/pooling layers: the "C" in CNN
        # reshape
        input_layer = tf.reshape(features, [-1, 28, 28, 1])  # the data may already have this shape; the explicit reshape declares the dimensions so later layers can infer them
        # conv
        conv1 = tf.layers.conv2d(inputs=input_layer, filters=32, kernel_size=5, padding='same',
                                 activation=tf.nn.relu)  # strides defaults to [1, 1] (move one cell at a time), and with 'same' padding the output map keeps the input's height/width
        # pool
        pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
        # conv
        conv2 = tf.layers.conv2d(inputs=pool1, filters=64, kernel_size=5, padding='same', activation=tf.nn.relu)
        # pool
        pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
        # Fully connected layers: the "NN" in CNN
        '''
        Fully connected (FC) layers act as the "classifier" of a convolutional network. If the convolution,
        pooling and activation layers map the raw data into a hidden feature space, the FC layers map the
        learned "distributed feature representation" onto the label space. In practice an FC layer can be
        implemented by convolution: an FC layer following another FC layer is a convolution with a 1x1 kernel,
        while an FC layer following a conv layer is a global convolution with an h x w kernel, where h and w
        are the height and width of the previous layer's output.
        An FC layer is just a matrix multiplication, i.e. a feature-space transform that aggregates the useful
        information extracted so far. With nonlinear activations in between, stacked FC layers can in theory
        approximate any nonlinear mapping; the obvious drawback is that they cannot preserve spatial structure.
        One role of FC is dimensionality change, especially mapping high-dimensional features to low-dimensional
        ones while keeping the useful information. Another is expressing latent semantics (an embedding),
        mapping raw features onto hidden nodes; for the last FC layer this is the explicit expression of the
        classification. An FC over the channels at one spatial position is equivalent to a 1x1 convolution
        (see the sketch after this code block); an N-node FC can be approximated by convolving with N templates
        followed by global average pooling (GAP).
        '''
        # flatten
        # pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])  # manual flatten requires computing the flattened size yourself
        pool2_flat = tf.contrib.layers.flatten(pool2)  # flatten() keeps the first (batch) dimension and unrolls each remaining sub-tensor into a row vector, returning a 2-D tensor of shape (batch_size, ...); commonly used right before the first dense layer of a CNN
        # dense
        dense1 = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
        # L2-normalize and rescale before dropout (the L2-constrained softmax trick)
        if alpha != 0:
            # Per the referenced paper, each feature vector is normalized to length alpha;
            # axis=1 normalizes per sample rather than over the whole batch.
            dense1 = alpha * tf.divide(dense1, tf.norm(dense1, ord='euclidean', axis=1, keepdims=True))
        # Dropout: shouldn't it apply across the whole network? Why a single layer here?
        # Answer: only the NN part of a CNN is an ordinary neural network; the conv and pool layers above are the "C".
        # In this network the NN has a single hidden layer, dense1 (the dense layer producing the logits below is
        # the output layer, which should not get dropout), so one dropout layer per hidden dense layer is exactly right.
        dropout = tf.layers.dropout(inputs=dense1, rate=params['dropout_rate'], training=is_train)
        # logits
        logits = tf.layers.dense(inputs=dropout, units=10)  # no activation on the logits layer: the raw scores decide the class (softmax is applied later, where probabilities are needed)
    return logits
Notes:
How to build each CNN layer: https://tensorflow.google.cn/tutorials/estimators/cnn#building_the_cnn_mnist_classifier
Understanding how CNNs work: https://cs231n.github.io/convolutional-networks/#conv
Implementing the L2-constrained softmax loss in TensorFlow: https://blog.csdn.net/CoderPai/article/details/78931377
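As a concrete illustration of the docstring's claim above that a fully connected layer over the channels at a single spatial position is equivalent to a 1x1 convolution (a sketch; shapes are illustrative):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 7, 7, 64])  # e.g. the shape of pool2 above
# tf.layers.dense applied to a 4-D tensor acts on the last (channel) axis at every position...
dense_per_pos = tf.layers.dense(x, units=10)      # shape [None, 7, 7, 10]
# ...which computes the same family of functions as a 1x1 convolution:
conv_1x1 = tf.layers.conv2d(x, filters=10, kernel_size=1)  # shape [None, 7, 7, 10]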
def cnn_model_fn(features, labels, mode, params):  # the parameter names features/labels/mode/params are fixed; no other names allowed
    '''
    Where things live in TF 1.x:
      activations are in tf.nn
      loss functions are in tf.losses
      layer constructors are in tf.layers
      accuracy/metric functions are in tf.metrics
      optimizers are in tf.train
      other utilities are directly under tf
    :param features:
    :param labels:
    :param mode:
    :param params:
    :return:
    '''
    # logits
    alpha = 30
    # Because Dropout behaves differently at training and prediction time, we
    # need to create 2 distinct computation graphs that still share the same weights.
    logits_l2 = networks(features, params=params, reuse=False, alpha=alpha, is_train=True)
    # At test time we don't need to normalize or scale; it's redundant as per the paper: https://arxiv.org/abs/1703.09507
    logits = networks(features, params=params, reuse=True, alpha=0, is_train=False)  # used at eval and predict time
    # prediction
    prediction = {'Pclass': tf.argmax(input=logits, axis=1, name='classes'),
                  'prob': tf.nn.softmax(logits=logits, name='softmax_tensor')}
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=prediction)  # the whole prediction dict is passed here; the eval branch below only uses prediction['Pclass']
    '''
    EstimatorSpec takes the following parameters:
    def __new__(cls,
    mode,                  # required in all three modes
    predictions=None,      # required for PREDICT
    loss=None,             # required for TRAIN and EVAL
    train_op=None,         # required for TRAIN
    eval_metric_ops=None,  # required for EVAL
export_outputs=None,
training_chief_hooks=None,
training_hooks=None,
scaffold=None,
evaluation_hooks=None,
prediction_hooks=None):
"""Creates a validated `EstimatorSpec` instance.
Depending on the value of `mode`, different arguments are required. Namely
* For `mode == ModeKeys.TRAIN`: required fields are `loss` and `train_op`.
* For `mode == ModeKeys.EVAL`: required field is `loss`.
* For `mode == ModeKeys.PREDICT`: required fields are `predictions`.
model_fn can populate all arguments independent of mode. In this case, some
arguments will be ignored by an `Estimator`. E.g. `train_op` will be
ignored in eval and infer modes. Example:
```python
def my_model_fn(features, labels, mode):
predictions = ...
loss = ...
train_op = ...
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=predictions,
loss=loss,
train_op=train_op)
```
Alternatively, model_fn can just populate the arguments appropriate to the
given mode. Example:
```python
def my_model_fn(features, labels, mode):
if (mode == tf.estimator.ModeKeys.TRAIN or
mode == tf.estimator.ModeKeys.EVAL):
loss = ...
else:
loss = None
if mode == tf.estimator.ModeKeys.TRAIN:
train_op = ...
else:
train_op = None
if mode == tf.estimator.ModeKeys.PREDICT:
predictions = ...
else:
predictions = None
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=predictions,
loss=loss,
train_op=train_op)
```
Args:
mode: A `ModeKeys`. Specifies if this is training, evaluation or
prediction.
predictions: Predictions `Tensor` or dict of `Tensor`.
loss: Training loss `Tensor`. Must be either scalar, or with shape `[1]`.
train_op: Op for the training step.
eval_metric_ops: Dict of metric results keyed by name.
The values of the dict can be one of the following:
(1) instance of `Metric` class.
(2) Results of calling a metric function, namely a
`(metric_tensor, update_op)` tuple. `metric_tensor` should be
evaluated without any impact on state (typically is a pure computation
results based on variables.). For example, it should not trigger the
`update_op` or requires any input fetching.
export_outputs: Describes the output signatures to be exported to
`SavedModel` and used during serving.
A dict `{name: output}` where:
* name: An arbitrary name for this output.
* output: an `ExportOutput` object such as `ClassificationOutput`,
`RegressionOutput`, or `PredictOutput`.
Single-headed models only need to specify one entry in this dictionary.
Multi-headed models should specify one entry for each head, one of
which must be named using
signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY.
If no entry is provided, a default `PredictOutput` mapping to
`predictions` will be created.
training_chief_hooks: Iterable of `tf.train.SessionRunHook` objects to
run on the chief worker during training.
training_hooks: Iterable of `tf.train.SessionRunHook` objects to run
on all workers during training.
scaffold: A `tf.train.Scaffold` object that can be used to set
initialization, saver, and more to be used in training.
evaluation_hooks: Iterable of `tf.train.SessionRunHook` objects to
run during evaluation.
prediction_hooks: Iterable of `tf.train.SessionRunHook` objects to
run during predictions.
Returns:
A validated `EstimatorSpec` object.
Raises:
ValueError: If validation fails.
TypeError: If any of the arguments is not the expected type.
"""
'''
    # tf.summary.scalar('cross_entropy_test', loss)
    if mode == tf.estimator.ModeKeys.TRAIN:
        loss = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits_l2),
                              name='train_loss')  # 'sparse' means the labels are integer class indices rather than one-hot vectors
        opt = tf.train.AdamOptimizer(learning_rate=params['learning_rate'])  # use the flag defined above instead of hard-coding 0.001
        opt_op = opt.minimize(loss=loss, global_step=tf.train.get_global_step())
        # If model.train() is given logging hooks, the tf.identity calls below must declare
        # which tensors those hooks fetch. The hooks say what to print to the console; the
        # identities create graph nodes that give those names concrete sources (otherwise the
        # hook's tensor names would not match the graph and training errors out).
        # This only prints to the console; to see the values in TensorBoard you also need tf.summary.scalar().
        # accuracys = tf.metrics.accuracy(
        #     labels=labels, predictions=prediction['Pclass'])
        # tf.identity(params['learning_rate'], 'learning_rate')
        # tf.identity(loss, 'cross_entropy')  # this loss is computed from the L2-scaled logits, i.e. the true training loss
        # tf.identity(accuracys[1], name='train_accuracy')  # the accuracy uses the un-scaled logits, i.e. real predictions on the training batch
        # tf.summary.scalar('train_accuracy', accuracys[1])  # this call draws the extra accuracy chart in TensorBoard; the loss charts are written automatically
        # tf.summary.scalar('cross_entropy', loss)  # this writes only the training loss
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=opt_op)
    # eval_metric_ops
    eval_metric_ops = {'accuracy': tf.metrics.accuracy(labels=labels, predictions=prediction['Pclass'])}
    loss = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits_l2),
                          name='eval_loss')  # 'sparse' again means integer class labels
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)  # eval needs both accuracy and loss; the former comes from eval_metric_ops, so no predictions argument is needed
    # Both train and eval pass in a loss, and EstimatorSpec writes them to events.out.tfevents.xxx automatically
    # (the train loss under model_dir, the eval loss under model_dir/eval). TensorBoard can overlay the two
    # curves in one chart, colored differently by folder name.
    # Eval additionally passes eval_metric_ops, which is also hooked automatically and recorded in the events
    # file, hence the chart named "accuracy" in TensorBoard.
    # Train also records global_step (the chart is named global_step); see the implementation below:
'''
def _train_with_estimator_spec(self, estimator_spec, worker_hooks, hooks,global_step_tensor, saving_listeners):
"""Train a model with the given Estimator Spec."""
if self._warm_start_settings:
logging.info('Warm-starting with WarmStartSettings: %s' %
(self._warm_start_settings,))
warm_starting_util.warm_start(*self._warm_start_settings)
# Check if the user created a loss summary, and add one if they didn't.
# We assume here that the summary is called 'loss'. If it is not, we will
# make another one with the name 'loss' to ensure it shows up in the right
# graph in TensorBoard.
if not any([x.op.name == 'loss'
for x in ops.get_collection(ops.GraphKeys.SUMMARIES)]):
summary.scalar('loss', estimator_spec.loss)
ops.add_to_collection(ops.GraphKeys.LOSSES, estimator_spec.loss)
worker_hooks.extend(hooks)
worker_hooks.append(
training.NanTensorHook(estimator_spec.loss)
)
if self._config.log_step_count_steps is not None:
worker_hooks.append(
training.LoggingTensorHook(
{
'loss': estimator_spec.loss,
'step': global_step_tensor
},
every_n_iter=self._config.log_step_count_steps)
)
worker_hooks.extend(estimator_spec.training_hooks)
if not (estimator_spec.scaffold.saver or
ops.get_collection(ops.GraphKeys.SAVERS)):
ops.add_to_collection(
ops.GraphKeys.SAVERS,
training.Saver(
sharded=True,
max_to_keep=self._config.keep_checkpoint_max,
keep_checkpoint_every_n_hours=(
self._config.keep_checkpoint_every_n_hours),
defer_build=True,
save_relative_paths=True))
chief_hooks = []
all_hooks = worker_hooks + list(estimator_spec.training_chief_hooks)
saver_hooks = [
h for h in all_hooks if isinstance(h, training.CheckpointSaverHook)]
if (self._config.save_checkpoints_secs or
self._config.save_checkpoints_steps):
if not saver_hooks:
chief_hooks = [
training.CheckpointSaverHook(
self._model_dir,
save_secs=self._config.save_checkpoints_secs,
save_steps=self._config.save_checkpoints_steps,
scaffold=estimator_spec.scaffold)
]
saver_hooks = [chief_hooks[0]]
if saving_listeners:
if not saver_hooks:
raise ValueError(
'There should be a CheckpointSaverHook to use saving_listeners. '
'Please set one of the RunConfig.save_checkpoints_steps or '
'RunConfig.save_checkpoints_secs.')
else:
# It is expected to have one CheckpointSaverHook. If multiple, we pick
# up the first one to add listener.
saver_hooks[0]._listeners.extend(saving_listeners) # pylint: disable=protected-access
with training.MonitoredTrainingSession(
master=self._config.master,
is_chief=self._config.is_chief,
checkpoint_dir=self._model_dir,
scaffold=estimator_spec.scaffold,
hooks=worker_hooks,
chief_only_hooks=(
tuple(chief_hooks) + tuple(estimator_spec.training_chief_hooks)),
save_checkpoint_secs=0, # Saving is handled by a hook.
save_summaries_steps=self._config.save_summary_steps,
config=self._session_config,
log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
loss = None
while not mon_sess.should_stop():
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
return loss
'''
    # The tf.identity calls and the tf.summary.scalar calls defined in model_fn's train branch write extra
    # entries to the event file explicitly (as opposed to the implicit writes above); each chart is named
    # after the first argument of tf.summary.scalar.
Notes:
Estimator is a high-level API: it records loss, global_step and accuracy automatically so you can inspect them in TensorBoard (this model's TensorBoard screenshot appeared here in the original post).
For tuning hyperparameters via TensorBoard, see: https://www.jianshu.com/p/d059ffea9ec0
If you want summaries of variables of your own choosing (Estimator only records loss, step and accuracy automatically) so that they show up in TensorBoard, first instantiate hooks in run_loop() (to pass in when calling the model's methods), then in model_fn explicitly declare where each hooked variable comes from (tf.identity(...)) and write it to the event log (tf.summary.scalar(...)), as in the sketch below.
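A minimal sketch of that pattern (the network inside this model_fn is a placeholder, not the CNN above; the tensor name 'train_accuracy' is illustrative):

import tensorflow as tf

def model_fn_with_custom_logging(features, labels, mode, params):
    logits = tf.layers.dense(features, units=10)  # placeholder network
    predicted = tf.argmax(logits, axis=1)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    accuracy = tf.metrics.accuracy(labels=labels, predictions=predicted)
    tf.identity(accuracy[1], name='train_accuracy')    # names the tensor so a hook can fetch it
    tf.summary.scalar('train_accuracy', accuracy[1])   # writes it to the event file for TensorBoard
    train_op = tf.train.AdamOptimizer().minimize(loss, tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

# In run_loop(): fetch the named tensor every 100 steps and print it to the console.
logging_hook = tf.train.LoggingTensorHook(tensors={'train_accuracy': 'train_accuracy'},
                                          every_n_iter=100)
# model.train(input_fn=train_input_fn, hooks=[logging_hook])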
def run_mnist(params):
    model_helpers.apply_clean(params)  # clear out old files under model_dir
    # Instantiate the Estimator
    paramsdic = params.flag_values_dict()
    model = tf.estimator.Estimator(model_fn=cnn_model_fn, model_dir=params.model_dir,
                                   params=paramsdic)  # the Estimator constructor forwards params to model_fn
    # Why not params=params? Because the flags object is an absl.flags._flagvalues.FlagValues instance;
    # flag_values_dict() converts its attributes into a dict that can be passed to model_fn.
    #   before conversion: params.dropout_rate    (attribute access)
    #   after conversion:  params['dropout_rate'] (dict lookup)
    # Instantiate hooks (they log runtime info to the console; what gets logged is given by the
    # tensor_to_log dict). The TensorBoard charts are independent of these hooks.
    tensor_to_log = {'prob': 'softmax_tensor'}  # log 'prob', whose value comes from the tensor named softmax_tensor
    train_hooks = hooks_helper.get_train_hooks(name_list=params.hooks, model_dir=params.model_dir)  # , tensors_to_log=tensor_to_log
    os.makedirs(model.eval_dir())  # the early-stopping hook reads eval events from eval_dir, which must exist up front (this raises if it already exists; os.makedirs(..., exist_ok=True) avoids that on Python 3)
    train_hooks_for_early_stopping = stop_if_no_decrease_hook(model, eval_dir=model.eval_dir(), metric_name='accuracy',
                                                              max_steps_without_decrease=1000, min_steps=100)
    # If you monitor the training loss instead, the metric must be named 'loss' (not 'eval_loss'), because
    # 'loss' is the name the train loop records automatically. Note also that for an accuracy metric the
    # matching variant is tf.contrib.estimator.stop_if_no_increase_hook; stopping when accuracy stops
    # *decreasing* is backwards.
    # input_fn functions
    def train_input_fn():
        # Although this returns a Dataset, each element is already a zipped (features, labels) pair, so it
        # can be unpacked directly into features and labels -- the form model.train() expects from input_fn.
        ds = dataset.train(params.data_dir)
        ds = ds.cache().shuffle(buffer_size=50000).batch(params.batch_size)
        ds = ds.repeat(params.epochs_between_evals)
        return ds

    def eval_input_fn():
        return dataset.test(params.data_dir).batch(
            params.batch_size).make_one_shot_iterator().get_next()
        # get_next() yields one (features, labels) batch per call.
        # Why does eval's input_fn return an iterator's get_next() while train's returns the whole Dataset?
        # Because Estimator accepts both: input_fn may return either a tf.data.Dataset or a
        # (features, labels) tensor tuple; get_next() produces the tuple form.
    # train and eval
    for i in range(params.train_epochs // params.epochs_between_evals):
        # tf.estimator.train_and_evaluate(model, train_spec=tf.estimator.TrainSpec(train_input_fn, hooks=[train_hooks_for_early_stopping]),
        #                                 eval_spec=tf.estimator.EvalSpec(eval_input_fn))
        model.train(input_fn=train_input_fn, hooks=[train_hooks_for_early_stopping])  # if you also pass hooks=train_hooks here, uncomment the tf.identity calls in model_fn's train branch
        if train_hooks_for_early_stopping.stopFlag == True:
            # stopFlag is not part of the stock contrib hook; it comes from the modified
            # early-stopping hook described in the blog post linked below.
            break
        eval_results = model.evaluate(input_fn=eval_input_fn)
        print('\nEvaluation results:\n\t%s\n' % eval_results)
        if model_helpers.past_stop_threshold(params.stop_threshold,
                                             eval_results['accuracy']):
            break
Notes:
How to implement early stopping in Estimator with hooks: https://blog.csdn.net/zongza/article/details/85017351
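For reference, with the stock contrib hook (no stopFlag modification), early stopping is usually wired through tf.estimator.train_and_evaluate; a sketch assuming the model and input_fns above:

early_stop = tf.contrib.estimator.stop_if_no_increase_hook(
    model, metric_name='accuracy', max_steps_without_increase=1000, min_steps=100)
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, hooks=[early_stop])
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)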
The main function:
def main(_):
    # print(flags.FLAGS)
    # print(type(flags.FLAGS))
    run_mnist(flags.FLAGS)

if __name__ == '__main__':
    tf.logging.set_verbosity(tf.logging.INFO)
    define_flags()
    absl_app.run(main)