TensorFlow 2.0 Study Notes (11): Estimator

Estimator

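  • Estimator is a high-level TensorFlow API.
  • A project built on Estimator goes through the following steps:
    • Create one or more input functions that put the input data into the expected format. Each input function must return two values: a dictionary mapping feature names to feature tensors, and the label values (or a label tensor). Returning a tf.data.Dataset that yields such pairs also works, and the Dataset API is the recommended approach.
    • Define the feature columns: a feature column is an object that describes the format of the model's input data and so fixes the interface between the data and the model. Each tf.feature_column identifies a feature name and specifies its input type, shape, and similar information. For numeric features, use tf.feature_column.numeric_column().
    • Instantiate an Estimator: TensorFlow provides several premade Estimator classifiers, including:
      • tf.estimator.DNNClassifier, for deep models that perform multi-class classification.
      • tf.estimator.DNNLinearCombinedClassifier, for wide and deep models.
      • tf.estimator.LinearClassifier, for classifiers based on linear models.
    • Train, evaluate, and predict.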
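
The input function contract from the first step can be sketched on toy data. This is a minimal illustration, not the code used later in this post: `make_input_fn`, `toy_features`, and `toy_labels` are hypothetical names introduced here, and the sketch assumes TensorFlow 2.x with eager execution so that the dataset can be iterated directly.

```python
import numpy as np
import tensorflow as tf


def make_input_fn(features, labels, batch_size=2):
    """Build a zero-argument input function (hypothetical helper)."""
    def input_fn():
        # Each element is a (features dict, label) pair, as described above.
        dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
        return dataset.batch(batch_size)
    return input_fn


# Hypothetical toy data: two numeric features, three examples.
toy_features = {
    "a": np.array([1.0, 2.0, 3.0], dtype=np.float32),
    "b": np.array([10.0, 20.0, 30.0], dtype=np.float32),
}
toy_labels = np.array([0, 1, 0], dtype=np.int64)

dataset = make_input_fn(toy_features, toy_labels)()
first_features, first_labels = next(iter(dataset))

print(sorted(first_features.keys()))       # feature names
print(first_features["a"].numpy().tolist())  # first batch of feature "a"
print(first_labels.numpy().tolist())       # first batch of labels
```

The first batch holds the first two examples: the features arrive as a dictionary keyed by feature name, and the labels arrive as a separate tensor.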

Estimator in Practice

  • Imports
from __future__ import absolute_import, division, print_function, unicode_literals


import tensorflow as tf

import pandas as pd
  • Load and inspect the data
# Column names: four features followed by the label
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']
# Download the datasets and load them as DataFrames
train_path = tf.keras.utils.get_file(
    "iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

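# header=0: treat the first row of the file as the header row; data starts on the next row.
# names: the column names to use. Passing names together with header=0 replaces the
# file's own header row; if the file has no header row, pass header=None explicitly.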
train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
# Look at the first five rows of the training data
train.head()
   SepalLength  SepalWidth  PetalLength  PetalWidth  Species
0          6.4         2.8          5.6         2.2        2
1          5.0         2.3          3.3         1.0        1
2          4.9         2.5          4.5         1.7        2
3          4.9         3.1          1.5         0.1        0
4          5.7         3.8          1.7         0.3        0
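
The effect of combining `names` with `header=0` can be checked on a small in-memory file. This is a sketch under assumptions: the CSV text below is hypothetical and only imitates a file whose first row is a metadata header rather than data.

```python
import io

import pandas as pd

# Hypothetical CSV: the first row is a header-like metadata row, not a sample.
raw = (
    "2,4,setosa,versicolor,virginica\n"
    "6.4,2.8,5.6,2.2,2\n"
    "5.0,2.3,3.3,1.0,1\n"
)
names = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

# header=0 consumes the first row as a header and names overrides it.
replaced = pd.read_csv(io.StringIO(raw), names=names, header=0)

# header=None keeps the first row, so it ends up in the data.
kept = pd.read_csv(io.StringIO(raw), names=names, header=None)

print(len(replaced), len(kept))
print(replaced['Species'].tolist())
```

With `header=0` the metadata row is dropped and only the two samples remain; with `header=None` the same row is read as a third, malformed sample.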
# Split the label column off from the training and test data
train_y = train.pop('Species')
test_y = test.pop('Species')

# The label column has now been removed from the features
train.head()
   SepalLength  SepalWidth  PetalLength  PetalWidth
0          6.4         2.8          5.6         2.2
1          5.0         2.3          3.3         1.0
2          4.9         2.5          4.5         1.7
3          4.9         3.1          1.5         0.1
4          5.7         3.8          1.7         0.3
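  • Format the input data
# Wrap the data in a tf.data.Dataset, which is memory-efficient and easy to read in parallel.
# This step makes sure the model is fed input in the format it expects.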
def input_fn(features, labels, training=True, batch_size=256):
    """An input function for training or evaluating"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle and repeat the data when in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()
    
    return dataset.batch(batch_size)
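
How `repeat()` and `batch()` interact can be seen on a tiny dataset. The sketch below is an illustration only: it uses `tf.data.Dataset.range(5)` as hypothetical stand-in data and leaves out `shuffle()` so that the result is deterministic, and it assumes TensorFlow 2.x with eager execution.

```python
import tensorflow as tf

# Hypothetical stand-in data: the integers 0..4.
base = tf.data.Dataset.range(5)

# Evaluation-style pipeline: one pass, so the last batch is short.
single_pass = [batch.numpy().tolist() for batch in base.batch(2)]

# Training-style pipeline: repeat() makes the stream endless, so batches
# are always full and can straddle the boundary between two passes.
# take(4) is only here to stop the otherwise infinite iteration.
repeated = [batch.numpy().tolist() for batch in base.repeat().batch(2).take(4)]

print(single_pass)
print(repeated)
```

Because the repeated stream never ends, training needs an explicit stopping condition, which is the role of the `steps` argument passed to `train` later on.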

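  • Define the feature columns
# Feature columns describe how the model should use the input.
# Each one specifies how the model should interpret a particular feature.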
my_feature_columns = []
for key in train.keys():
    print(key)
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
print(my_feature_columns)
SepalLength
SepalWidth
PetalLength
PetalWidth
[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]
  • Build the Estimator
# Build a deep neural network with two hidden layers of 30 and 10 units
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers with 30 and 10 units respectively.
    hidden_units=[30, 10],
    # The model must choose among three classes.
    n_classes=3)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: C:\Users\Smile\AppData\Local\Temp\tmphgc259yl
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\Smile\\AppData\\Local\\Temp\\tmphgc259yl', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
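
To get a feel for the size of this network, the number of trainable parameters can be estimated by hand. The sketch below assumes plain fully connected layers with one bias per unit and nothing else (no batch normalization); `dense_param_count` is a hypothetical helper, not part of TensorFlow.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # n_in * n_out weights plus one bias per output unit.
        total += n_in * n_out + n_out
    return total


# 4 input features -> 30 hidden -> 10 hidden -> 3 output classes
layer_sizes = [4, 30, 10, 3]
print(dense_param_count(layer_sizes))  # (4*30+30) + (30*10+10) + (10*3+3) = 493
```

Under these assumptions the model has 493 parameters, which is tiny by deep learning standards and consistent with how quickly each training step runs.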
  • Train the model
# Train the model
classifier.train(
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=5000)
WARNING:tensorflow:From E:\Anaconda\anaconda\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From E:\Anaconda\anaconda\lib\site-packages\tensorflow_core\python\training\training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Layer dnn is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

WARNING:tensorflow:From E:\Anaconda\anaconda\lib\site-packages\tensorflow_core\python\keras\optimizer_v2\adagrad.py:103: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\Smile\AppData\Local\Temp\tmphgc259yl\model.ckpt.
INFO:tensorflow:loss = 2.7596803, step = 0
INFO:tensorflow:global_step/sec: 417.249
INFO:tensorflow:loss = 1.5864722, step = 100 (0.240 sec)
INFO:tensorflow:global_step/sec: 682.968
INFO:tensorflow:loss = 1.3686752, step = 200 (0.159 sec)
INFO:tensorflow:global_step/sec: 691.149
INFO:tensorflow:loss = 1.2175257, step = 300 (0.132 sec)
INFO:tensorflow:global_step/sec: 695.377
INFO:tensorflow:loss = 1.1447594, step = 400 (0.144 sec)
INFO:tensorflow:global_step/sec: 694.894
INFO:tensorflow:loss = 1.1203682, step = 500 (0.144 sec)
INFO:tensorflow:global_step/sec: 649.194
INFO:tensorflow:loss = 1.073328, step = 600 (0.157 sec)
INFO:tensorflow:global_step/sec: 737.162
INFO:tensorflow:loss = 1.0393159, step = 700 (0.133 sec)
INFO:tensorflow:global_step/sec: 688.04
INFO:tensorflow:loss = 1.0136361, step = 800 (0.145 sec)
INFO:tensorflow:global_step/sec: 647.965
INFO:tensorflow:loss = 0.99417055, step = 900 (0.154 sec)
INFO:tensorflow:global_step/sec: 712.095
INFO:tensorflow:loss = 0.9741398, step = 1000 (0.140 sec)
INFO:tensorflow:global_step/sec: 702.392
INFO:tensorflow:loss = 0.94853187, step = 1100 (0.142 sec)
INFO:tensorflow:global_step/sec: 689.783
INFO:tensorflow:loss = 0.94446087, step = 1200 (0.145 sec)
INFO:tensorflow:global_step/sec: 644.138
INFO:tensorflow:loss = 0.9349041, step = 1300 (0.157 sec)
INFO:tensorflow:global_step/sec: 757.395
INFO:tensorflow:loss = 0.9281113, step = 1400 (0.146 sec)
INFO:tensorflow:global_step/sec: 677.107
INFO:tensorflow:loss = 0.9123041, step = 1500 (0.146 sec)
INFO:tensorflow:global_step/sec: 684.452
INFO:tensorflow:loss = 0.90185827, step = 1600 (0.132 sec)
INFO:tensorflow:global_step/sec: 688.624
INFO:tensorflow:loss = 0.89590585, step = 1700 (0.145 sec)
INFO:tensorflow:global_step/sec: 633.714
INFO:tensorflow:loss = 0.8893813, step = 1800 (0.160 sec)
INFO:tensorflow:global_step/sec: 736.433
INFO:tensorflow:loss = 0.8834985, step = 1900 (0.133 sec)
INFO:tensorflow:global_step/sec: 681.711
INFO:tensorflow:loss = 0.8767073, step = 2000 (0.159 sec)
INFO:tensorflow:global_step/sec: 671.442
INFO:tensorflow:loss = 0.86436415, step = 2100 (0.151 sec)
INFO:tensorflow:global_step/sec: 687.153
INFO:tensorflow:loss = 0.86384547, step = 2200 (0.146 sec)
INFO:tensorflow:global_step/sec: 671.386
INFO:tensorflow:loss = 0.859602, step = 2300 (0.146 sec)
INFO:tensorflow:global_step/sec: 694.817
INFO:tensorflow:loss = 0.843158, step = 2400 (0.146 sec)
INFO:tensorflow:global_step/sec: 689.812
INFO:tensorflow:loss = 0.8570354, step = 2500 (0.147 sec)
INFO:tensorflow:global_step/sec: 666.513
INFO:tensorflow:loss = 0.82521486, step = 2600 (0.147 sec)
INFO:tensorflow:global_step/sec: 634.746
INFO:tensorflow:loss = 0.82707626, step = 2700 (0.148 sec)
INFO:tensorflow:global_step/sec: 668.07
INFO:tensorflow:loss = 0.8173684, step = 2800 (0.149 sec)
INFO:tensorflow:global_step/sec: 736.048
INFO:tensorflow:loss = 0.8228967, step = 2900 (0.133 sec)
INFO:tensorflow:global_step/sec: 671.978
INFO:tensorflow:loss = 0.8122652, step = 3000 (0.161 sec)
INFO:tensorflow:global_step/sec: 676.142
INFO:tensorflow:loss = 0.8174802, step = 3100 (0.148 sec)
INFO:tensorflow:global_step/sec: 626.573
INFO:tensorflow:loss = 0.8014016, step = 3200 (0.149 sec)
INFO:tensorflow:global_step/sec: 671.279
INFO:tensorflow:loss = 0.8030711, step = 3300 (0.150 sec)
INFO:tensorflow:global_step/sec: 659.774
INFO:tensorflow:loss = 0.8061998, step = 3400 (0.151 sec)
INFO:tensorflow:global_step/sec: 659.655
INFO:tensorflow:loss = 0.782421, step = 3500 (0.152 sec)
INFO:tensorflow:global_step/sec: 661.854
INFO:tensorflow:loss = 0.79174757, step = 3600 (0.151 sec)
INFO:tensorflow:global_step/sec: 660.913
INFO:tensorflow:loss = 0.7846266, step = 3700 (0.153 sec)
INFO:tensorflow:global_step/sec: 723.83
INFO:tensorflow:loss = 0.78182733, step = 3800 (0.152 sec)
INFO:tensorflow:global_step/sec: 603.637
INFO:tensorflow:loss = 0.77771455, step = 3900 (0.147 sec)
INFO:tensorflow:global_step/sec: 636.475
INFO:tensorflow:loss = 0.7850309, step = 4000 (0.159 sec)
INFO:tensorflow:global_step/sec: 659.655
INFO:tensorflow:loss = 0.7761879, step = 4100 (0.153 sec)
INFO:tensorflow:global_step/sec: 710.711
INFO:tensorflow:loss = 0.7695364, step = 4200 (0.151 sec)
INFO:tensorflow:global_step/sec: 679.563
INFO:tensorflow:loss = 0.76040304, step = 4300 (0.150 sec)
INFO:tensorflow:global_step/sec: 610.901
INFO:tensorflow:loss = 0.75900936, step = 4400 (0.150 sec)
INFO:tensorflow:global_step/sec: 665.128
INFO:tensorflow:loss = 0.76111466, step = 4500 (0.151 sec)
INFO:tensorflow:global_step/sec: 667.254
INFO:tensorflow:loss = 0.76714766, step = 4600 (0.151 sec)
INFO:tensorflow:global_step/sec: 661.249
INFO:tensorflow:loss = 0.7604923, step = 4700 (0.147 sec)
INFO:tensorflow:global_step/sec: 626.097
INFO:tensorflow:loss = 0.74273324, step = 4800 (0.166 sec)
INFO:tensorflow:global_step/sec: 648.068
INFO:tensorflow:loss = 0.74195194, step = 4900 (0.148 sec)
INFO:tensorflow:Saving checkpoints for 5000 into C:\Users\Smile\AppData\Local\Temp\tmphgc259yl\model.ckpt.
INFO:tensorflow:Loss for final step: 0.7463323.

  • Evaluate the model
# Evaluate the model on the test set
eval_result = classifier.evaluate(
    input_fn=lambda: input_fn(test, test_y, training=False))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Layer dnn is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-02-27T11:20:14Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\Smile\AppData\Local\Temp\tmphgc259yl\model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.25205s
INFO:tensorflow:Finished evaluation at 2020-02-27-11:20:14
INFO:tensorflow:Saving dict for global step 5000: accuracy = 0.7, average_loss = 0.7753303, global_step = 5000, loss = 0.7753303
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 5000: C:\Users\Smile\AppData\Local\Temp\tmphgc259yl\model.ckpt-5000

Test set accuracy: 0.700
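
The accuracy metric is simply the fraction of examples whose predicted class matches the true class. A minimal pure-Python sketch follows; the label lists are hypothetical and are not the actual test set, and `accuracy` is an illustrative helper rather than the function `evaluate` uses internally.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels: every class-2 example is mistaken for class 1.
true_labels = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
predicted_labels = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

result = accuracy(true_labels, predicted_labels)
print('Test set accuracy: {:0.3f}'.format(result))
```

In this made-up case 7 of 10 predictions are right, so the formatted value is 0.700. A single confused class is one way such a score can arise, though the evaluation log alone does not say which examples were misclassified.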
  • Make predictions
# Generate predictions from the model
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

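# A separate name keeps the training/evaluation input_fn defined earlier from being shadowed.
def predict_input_fn(features, batch_size=256):
    """An input function for prediction."""
    # Convert the inputs to a Dataset without labels.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

predictions = classifier.predict(
    input_fn=lambda: predict_input_fn(predict_x))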
for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print('Prediction is "{}" ({:.1f}%), expected "{}"'.format(
        SPECIES[class_id], 100 * probability, expec))
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\Smile\AppData\Local\Temp\tmphgc259yl\model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Prediction is "Setosa" (70.9%), expected "Setosa"
Prediction is "Versicolor" (40.8%), expected "Versicolor"
Prediction is "Versicolor" (39.3%), expected "Virginica"
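
The relationship between the `probabilities` and `class_ids` entries used in the loop can be sketched in plain Python: for a multi-class classifier, the probabilities are typically a softmax over the output logits, and the predicted class is the index of the largest probability. The logits below are hypothetical values chosen for illustration, not outputs of the trained model.

```python
import math

SPECIES = ['Setosa', 'Versicolor', 'Virginica']


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one example.
logits = [2.0, 1.0, 0.1]
probabilities = softmax(logits)

# The predicted class is the index with the highest probability.
class_id = max(range(len(probabilities)), key=probabilities.__getitem__)

print('Prediction is "{}" ({:.1f}%)'.format(
    SPECIES[class_id], 100 * probabilities[class_id]))
```

A winning probability near 40%, as in two of the predictions above, means the model is only slightly favoring one class over the others.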
