this tutorial uses tensorflow's eager execution to (1)build a model, (2)train the model on example data, and (3)use the model to make predictions on unknown data
there are many tensorflow APIs available, but we recommend starting with these high-level tensorflow concepts:
(1)enable an eager execution development environment
(2)import data with the Datasets API
(3)build models and layers with tensorflow's Keras API
this tutorial shows these APIs and is structured like many other tensorflow programs:
(1)import and parse the data sets
(2)select the type of model
(3)train the model
(4)evaluate the model's effectiveness
(5)use the trained model to make predictions
this tutorial uses eager execution, which is available in tensorflow 1.8
eager execution makes tensorflow evaluate operations immediately, returning concrete values instead of creating a computational graph that is executed later
once eager execution is enabled, it cannot be disabled within the same program
###
from __future__ import absolute_import, division, print_function
import os
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tf.enable_eager_execution() # must be called at program start up
print("TensorFlow version: {}".format(tf.VERSION))
print("Eager execution: {}".format(tf.executing_eagerly()))
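to make the difference between immediate and deferred evaluation concrete, here is a minimal plain-python analogy that does not use tensorflow at all. the `DeferredMul` class is a hypothetical stand-in for a graph node, not a tensorflow API:

```python
# plain-python analogy only; DeferredMul is a hypothetical stand-in for a graph node


class DeferredMul(object):
    """Records a multiplication to run later, like a node in a graph."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def run(self):
        # the actual computation happens only when run() is called
        return self.a * self.b


# eager style: the value is computed immediately
eager_result = 3 * 4

# deferred style: first describe the computation, then execute it
node = DeferredMul(3, 4)
deferred_result = node.run()

print("eager: {}".format(eager_result))
print("deferred: {}".format(deferred_result))
```

both styles give the same number; the difference is only when the work is done, which is why eager code is easier to inspect and debug step by step.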
a sophisticated machine learning program could classify flowers based on photographs. our ambitions are more modest--we're going to classify iris flowers based on the length and width measurements of their sepals and petals
the iris genus comprises about 300 species, but our program will classify only the following three: iris setosa, iris versicolor, and iris virginica
machine learning typically relies on numeric values, so each species name is mapped to a label number:
0:iris setosa
1:iris versicolor
2:iris virginica
download the training dataset file using the tf.keras.utils.get_file function, which returns the file path of the downloaded file. the code below instead points directly at a copy that is already stored locally
###
train_dataset_fp = "/home/mao/dataset/Iris/iris_training.csv"
each line or row in the file is passed to the parse_csv function which grabs the first four feature fields and combines them into a single tensor. then, the last field is parsed as the label
###
def parse_csv(line):
    example_defaults = [[0.], [0.], [0.], [0.], [0]]
    parsed_line = tf.decode_csv(line, example_defaults)
    # first 4 fields are features, combine into single tensor
    features = tf.reshape(parsed_line[:-1], shape=(4,))
    # last field is the label
    label = tf.reshape(parsed_line[-1], shape=())
    return features, label
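as a rough illustration of what parse_csv does to one row, here is a plain-python sketch without tensorflow. the helper name `parse_csv_line` and the sample row are made up for the example:

```python
def parse_csv_line(line):
    """Split one CSV row into four float features and one integer label."""
    fields = line.strip().split(",")
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got {}".format(len(fields)))
    # first 4 fields are the features
    features = [float(value) for value in fields[:-1]]
    # last field is the label
    label = int(fields[-1])
    return features, label


# made-up sample row: four measurements followed by a label number
sample_features, sample_label = parse_csv_line("6.4,2.8,5.6,2.2,2")
print(sample_features, sample_label)
```

the tensorflow version does the same split, but returns tensors and takes the field types from example_defaults.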
tensorflow's Dataset API handles many common cases for feeding data into a model. this is a high-level API for reading data and transforming it into a form used for training
this program uses tf.data.TextLineDataset to load a CSV-formatted text file and parses each line with our parse_csv function. a tf.data.Dataset represents an input pipeline as a collection of elements and a series of transformations that act on those elements. transformation methods are chained together or called sequentially--just make sure to keep a reference to the returned Dataset object
training works best if the examples are in random order. use tf.data.Dataset.shuffle to randomize entries, setting buffer_size to a value larger than the number of examples
###
train_dataset = tf.data.TextLineDataset(train_dataset_fp)
train_dataset = train_dataset.skip(1)  # skip the first line of the file
train_dataset = train_dataset.map(parse_csv)  # parse each remaining line
train_dataset = train_dataset.shuffle(buffer_size=1000)  # randomize the order
train_dataset = train_dataset.batch(32)
# view a single example entry from a batch
# next(...) works on both python 2 and 3, unlike the python-2-only .next()
features, label = next(iter(train_dataset))
print("example features: ", features[0])
print("example label: ", label[0])
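the skip, map, shuffle, and batch steps can be mimicked on an in-memory list to see what each one contributes. this is only a plain-python sketch: `make_batches`, `parse_row`, and the toy data are made up, and a shuffle buffer larger than the dataset is modeled as a full shuffle:

```python
import random


def make_batches(lines, parse_fn, batch_size, seed=None):
    """Mimic skip(1) -> map -> shuffle -> batch on an in-memory list of lines."""
    rows = lines[1:]                            # skip(1): drop the first line
    examples = [parse_fn(row) for row in rows]  # map: parse every remaining line
    random.Random(seed).shuffle(examples)       # shuffle: full shuffle of all examples
    # batch: consecutive slices of at most batch_size examples
    return [examples[i:i + batch_size]
            for i in range(0, len(examples), batch_size)]


def parse_row(line):
    # hypothetical parser: four float features, then an integer label
    fields = line.split(",")
    return [float(v) for v in fields[:4]], int(fields[4])


# made-up data: one header line followed by five rows
toy_lines = ["header"] + ["{0}.0,{0}.1,{0}.2,{0}.3,{1}".format(i, i % 3)
                          for i in range(5)]
toy_batches = make_batches(toy_lines, parse_row, batch_size=2, seed=0)
print([len(batch) for batch in toy_batches])  # [2, 2, 1]
```

the last batch is smaller because five examples do not divide evenly into batches of two; the same happens with batch(32) when the dataset size is not a multiple of 32.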
a model is the relationship between features and the label
a good machine learning approach determines the model for you. if you feed enough representative examples into the right machine learning model type, the program will figure out the relationships for you
there are several categories of neural networks and this program uses a dense, or fully-connected neural network: the neurons in one layer receive input connections from every neuron in the previous layer
the tf.keras.Sequential model is a linear stack of layers. its constructor takes a list of layer instances, in this case two Dense layers with 10 nodes each and an output layer with 3 nodes representing our label predictions. the first layer's input_shape parameter corresponds to the number of features in the dataset, and is required
###
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(3)
])
the activation function determines the output that a single neuron passes to the next layer. there are many available activations, but ReLU is common for hidden layers
the ideal number of hidden layers and neurons depends on the problem and dataset. like many aspects of machine learning, picking the best shape of the neural network requires a mixture of knowledge and experimentation. as a rule of thumb, increasing the number of hidden layers and neurons typically creates a more powerful model, which requires more data to train effectively
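to show what one fully-connected layer with a ReLU activation computes, here is a small plain-python sketch. the weights, biases, and the `dense` and `relu` helpers are made up for illustration, and the per-neuron weight layout is a choice of this sketch rather than how Keras stores its kernel:

```python
def relu(values):
    """Element-wise ReLU: negative inputs become 0."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully-connected layer: every output neuron sees every input value.

    weights[j] holds the input weights of output neuron j.
    """
    return [sum(w * x for w, x in zip(neuron_weights, inputs)) + b
            for neuron_weights, b in zip(weights, biases)]


# made-up example: 4 input features feeding 2 hidden neurons
example_features = [5.1, 3.3, 1.7, 0.5]
example_weights = [[0.1, 0.2, 0.3, 0.4],
                   [-1.0, 0.0, 0.0, 0.0]]
example_biases = [0.0, 1.0]

pre_activation = dense(example_features, example_weights, example_biases)
hidden = relu(pre_activation)
print(hidden)
```

the second neuron's weighted sum is negative, so ReLU clamps it to 0 and only the first neuron passes a signal on to the next layer.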
training is the stage of machine learning when the model is gradually optimized, or the model learns the dataset. the goal is to learn enough about the structure of the training dataset to make predictions about unseen data. if the model learns too much about the training dataset (overfitting), then its predictions only work for the data it has seen and will not generalize
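the iris classification problem is an example of supervised machine learning: the model is trained from examples that contain labels. in unsupervised machine learning, the examples don't contain labels. instead, the model typically finds patterns among the features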
both training and evaluation stages need to calculate the model's loss. this measures how far off a model's predictions are from the desired label, in other words, how badly the model is performing. we want to minimize, or optimize, this value
###
def loss(model, x, y):
    y_ = model(x)
    return tf.losses.sparse_softmax_cross_entropy(labels=y, logits=y_)


def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets)
    return tape.gradient(loss_value, model.variables)
the grad function uses the loss function and the tf.GradientTape to record operations that compute the gradients used to optimize our model
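to make the loss less of a black box, here is a plain-python sketch of softmax cross-entropy for a single example with an integer label. the helper names and logits are made up; the tensorflow function additionally reduces the per-example losses over the batch to a single scalar:

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_softmax_cross_entropy(label, logits):
    """Loss for one example: negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


# made-up logits for the 3 classes; class 0 has by far the highest score
confident_right = sparse_softmax_cross_entropy(0, [4.0, 0.5, 0.1])
confident_wrong = sparse_softmax_cross_entropy(2, [4.0, 0.5, 0.1])
print(confident_right, confident_wrong)
```

the loss is small when the true class gets most of the probability and large when the model is confident about the wrong class, which is exactly the signal the gradients are computed from.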
an optimizer applies the computed gradients to the model's variables to minimize the loss function. the gradients point in the direction of steepest ascent--so we'll travel the opposite way and move down the hill. by iteratively calculating the loss and gradient for each batch, we'll adjust the model during training. gradually, the model will find the best combination of weights and bias to minimize loss. and the lower the loss, the better the model's predictions
(figure omitted; source: stanford class CS231n)
the learning_rate sets the step size to take for each iteration down the hill. this is a hyperparameter that you'll commonly adjust to achieve better results
###
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
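the effect of learning_rate is easy to see on a one-variable problem. the sketch below is plain python with a made-up quadratic loss; `gradient_descent` and `quadratic_grad` are hypothetical helpers, not part of tensorflow:

```python
def gradient_descent(grad_fn, start, learning_rate, steps):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)  # move opposite to the gradient
    return w


# made-up loss: (w - 3)^2, whose gradient is 2 * (w - 3) and whose minimum is at w = 3
def quadratic_grad(w):
    return 2.0 * (w - 3.0)


w_small_lr = gradient_descent(quadratic_grad, start=0.0, learning_rate=0.01, steps=100)
w_large_lr = gradient_descent(quadratic_grad, start=0.0, learning_rate=0.1, steps=100)
print(w_small_lr, w_large_lr)
```

after the same number of steps, the larger learning rate ends up much closer to the minimum at 3 on this particular loss. on real models a rate that is too large can overshoot, which is why it is treated as a hyperparameter to tune.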
the following code block sets up these training steps:
(1)iterate each epoch. an epoch is one pass through the dataset
(2)within an epoch, iterate over each example in the training Dataset grabbing its features(x) and label(y)
(3)using the example's features, make a prediction and compare it with the label. measure the inaccuracy of the prediction and use that to calculate the model's loss and gradients
(4)use an optimizer to update the model's variables
(5)keep track of some stats for visualization
(6)repeat for each epoch
counter-intuitively, training a model longer does not guarantee a better model. num_epochs is a hyperparameter that you can tune
###
# keep results for plotting
train_loss_results = []
train_accuracy_results = []
num_epochs = 201
for epoch in range(num_epochs):
    epoch_loss_avg = tfe.metrics.Mean()
    epoch_accuracy = tfe.metrics.Accuracy()
    # training loop - using batches of 32
    for x, y in train_dataset:
        # optimize the model
        grads = grad(model, x, y)
        optimizer.apply_gradients(zip(grads, model.variables),
                                  global_step=tf.train.get_or_create_global_step())
        # track progress
        epoch_loss_avg(loss(model, x, y))
        # compare predicted label to actual label
        epoch_accuracy(tf.argmax(model(x), axis=1, output_type=tf.int32), y)
    # end epoch
    train_loss_results.append(epoch_loss_avg.result())
    train_accuracy_results.append(epoch_accuracy.result())
    if epoch % 50 == 0:
        print("Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3f}".format(epoch,
                                                                    epoch_loss_avg.result(),
                                                                    epoch_accuracy.result()))
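the per-epoch bookkeeping in the loop above boils down to an accumulator that is fed once per batch and read once per epoch. a plain-python sketch of that pattern follows; `RunningMean` is a hypothetical stand-in for the metric object and the batch losses are made up:

```python
class RunningMean(object):
    """Accumulates values and reports their mean (hypothetical stand-in)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def __call__(self, value):
        # called once per batch with that batch's loss
        self.total += value
        self.count += 1

    def result(self):
        # mean of everything seen so far; 0.0 if nothing was recorded
        return self.total / self.count if self.count else 0.0


# made-up batch losses for two epochs of three batches each
batch_losses_per_epoch = [[1.5, 1.0, 0.5], [0.75, 0.5, 0.25]]

epoch_means = []
for batch_losses in batch_losses_per_epoch:
    epoch_loss = RunningMean()  # a fresh accumulator for every epoch
    for batch_loss in batch_losses:
        epoch_loss(batch_loss)
    epoch_means.append(epoch_loss.result())

print(epoch_means)  # [1.0, 0.5]
```

creating a new accumulator at the start of each epoch is what keeps one epoch's average from leaking into the next.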
create basic charts using the matplotlib module
###
fig, axes = plt.subplots(2, sharex=True, figsize=(12, 8))
fig.suptitle('Training Metrics')
axes[0].set_ylabel("Loss", fontsize=14)
axes[0].plot(train_loss_results)
axes[1].set_ylabel("Accuracy", fontsize=14)
axes[1].set_xlabel("Epoch", fontsize=14)
axes[1].plot(train_accuracy_results)
plt.show()
evaluating means determining how effectively the model makes predictions
to fairly assess a model's effectiveness, the examples used to evaluate a model must be different from the examples used to train the model
###
test_dataset_fp = "/home/mao/dataset/Iris/iris_test.csv"
test_dataset = tf.data.TextLineDataset(test_dataset_fp)
test_dataset = test_dataset.skip(1)
test_dataset = test_dataset.map(parse_csv)
test_dataset = test_dataset.shuffle(1000)
test_dataset = test_dataset.batch(32)
unlike the training stage, the model only evaluates a single epoch of the test data
###
test_accuracy = tfe.metrics.Accuracy()
for (x, y) in test_dataset:
    prediction = tf.argmax(model(x), axis=1, output_type=tf.int32)
    test_accuracy(prediction, y)
print("Test set accuracy: {:.3%}".format(test_accuracy.result()))
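the accuracy computed above is just the fraction of examples whose highest-scoring class matches the label. a plain-python sketch with made-up logits and labels (the `argmax` and `accuracy` helpers are hypothetical) shows the arithmetic:

```python
def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(batch_logits, labels):
    """Fraction of examples whose predicted class equals the label."""
    correct = sum(1 for logits, label in zip(batch_logits, labels)
                  if argmax(logits) == label)
    return correct / float(len(labels))


# made-up model outputs for 4 examples and 3 classes
toy_logits = [[2.0, 0.1, 0.3],
              [0.2, 1.5, 0.1],
              [0.1, 0.2, 3.0],
              [1.0, 0.9, 0.8]]
toy_labels = [0, 1, 2, 1]

toy_accuracy = accuracy(toy_logits, toy_labels)
print("toy accuracy: {:.3%}".format(toy_accuracy))
```

three of the four toy predictions match their labels, so the sketch reports 75%; the last example is predicted as class 0 although its label is 1.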
recall, the label numbers are mapped to a named representation as:
0:iris setosa
1:iris versicolor
2:iris virginica
###
class_ids = ["Iris setosa", "Iris versicolor", "Iris virginica"]
predict_dataset = tf.convert_to_tensor([
    [5.1, 3.3, 1.7, 0.5],
    [5.9, 3.0, 4.2, 1.5],
    [6.9, 3.1, 5.4, 2.1]
])
predictions = model(predict_dataset)
for i, logits in enumerate(predictions):
    class_idx = tf.argmax(logits).numpy()
    name = class_ids[class_idx]
    print("Example {} prediction: {}".format(i, name))