This tutorial is about training a logistic regression model for binary classification with TensorFlow.
In Linear Regression using TensorFlow, we described how to predict a continuous-valued parameter by modeling the system linearly. What if the objective is to decide between two choices? The answer is simple: we are dealing with a classification problem. In this tutorial, the objective is to determine whether an input image is the digit "0" or the digit "1" using logistic regression. In other words, whether it is the digit "1" or not! The complete source code is available in the associated GitHub repository.
The dataset we work with in this tutorial is the MNIST dataset. The full dataset consists of 55,000 training images and 10,000 test images. The images are 28x28x1, each representing a handwritten digit from 0 to 9. We create a feature vector of size 784 from each image. For our setting, we only use the images of 0 and 1.
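To make the feature-vector shape concrete, here is a minimal numpy sketch (the random array below simply stands in for a real MNIST image):

import numpy as np

# A stand-in for one 28x28 grayscale MNIST image.
image = np.random.rand(28, 28)

# Flattening yields the 784-dimensional feature vector the model consumes.
feature_vector = image.reshape(-1)
print(feature_vector.shape)  # (784,)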
In linear regression, the effort is to predict a continuous-valued outcome using the linear function $y = W^{T}x$. In logistic regression, on the other hand, we aim to predict a binary label $y \in \{0,1\}$, for which we use a different prediction process than in linear regression. In logistic regression, the predicted output is the probability that the input sample belongs to the target class, which in our case is the digit "1". In a binary classification problem, obviously if
$ P(x \ in \ {target \ _class })$ = M,那么$ P(x \ in \ {non \ _target \ _class })= 1 - M $ 。
Therefore, the hypothesis can be formulated as follows:
$$P(y=1|x)=h_{W}(x)=\frac{1}{1+\exp(-W^{T}x)}=\mathrm{Sigmoid}(W^{T}x) \quad (1)$$
$$P(y=0|x)=1-P(y=1|x)=1-h_{W}(x) \quad (2)$$
In the equations above, the Sigmoid function maps the predicted output into probability space, where the values lie in the range $[0,1]$. The main objective is to find a model such that when the input sample is a "1" the output probability becomes high, and otherwise becomes small. The important goal, then, is to design an appropriate cost function that minimizes the loss when the output is the desired one, and vice versa.
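To make this mapping concrete, here is a minimal numpy sketch of the hypothesis in equation (1); the weights and the input below are random and purely illustrative:

import numpy as np

def sigmoid(z):
    # Maps any real value into the (0, 1) probability range.
    return 1.0 / (1.0 + np.exp(-z))

W = np.random.randn(784)  # illustrative weight vector
x = np.random.rand(784)   # illustrative input feature vector

p_target = sigmoid(np.dot(W, x))  # P(y=1|x), equation (1)
p_non_target = 1.0 - p_target     # P(y=0|x), equation (2)
print(p_target, p_non_target)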
The cost function for a set of data samples $(x^{(i)}, y^{(i)})$ can be defined as follows:
$$Loss(W) = \sum_{i} y^{(i)}\log\frac{1}{h_{W}(x^{(i)})} + (1-y^{(i)})\log\frac{1}{1-h_{W}(x^{(i)})}$$
As can be seen from the equation above, the loss function consists of two terms, and for each sample only one of them is non-zero, given the binary labels.
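A tiny numerical check of this property (a sketch with made-up probabilities, not values from any trained model):

import numpy as np

def sample_loss(y, h):
    # Per-sample loss, written exactly as in the equation above;
    # h is the predicted probability h_W(x) of the target class "1".
    return y * np.log(1.0 / h) + (1 - y) * np.log(1.0 / (1.0 - h))

print(sample_loss(1, 0.9))  # ~0.105: with y=1 only the first term is active
print(sample_loss(0, 0.9))  # ~2.303: with y=0 only the second term is active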
So far, we have defined the formulation and optimization function of logistic regression. In the next part, we show how to do this in code using mini-batch optimization.
First, we process the dataset and extract only the "0" and "1" digits. The code for logistic regression is heavily inspired by our Train a Convolutional Neural Network as a Classifier post, and we refer to that post for a better understanding of the implementation details. In this tutorial, we only explain how we process the dataset and how to implement logistic regression; the rest should be clear from the aforementioned CNN classifier post.
In this part, we explain how to extract the desired samples from the dataset and implement logistic regression using Softmax.
First, we need to extract the "0" and "1" digits from the MNIST dataset:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", reshape=True, one_hot=False)
########################
### Data Processing ####
########################
# Organize the data and feed it to associated dictionaries.
data={}
data['train/image'] = mnist.train.images
data['train/label'] = mnist.train.labels
data['test/image'] = mnist.test.images
data['test/label'] = mnist.test.labels
# Get only the samples with zero and one label for training.
index_list_train = []
for sample_index in range(data['train/label'].shape[0]):
    label = data['train/label'][sample_index]
    if label == 1 or label == 0:
        index_list_train.append(sample_index)
# Reform the train data structure.
data['train/image'] = mnist.train.images[index_list_train]
data['train/label'] = mnist.train.labels[index_list_train]
# Get only the samples with zero and one label for test set.
index_list_test = []
for sample_index in range(data['test/label'].shape[0]):
    label = data['test/label'][sample_index]
    if label == 1 or label == 0:
        index_list_test.append(sample_index)
# Reform the test data structure.
data['test/image'] = mnist.test.images[index_list_test]
data['test/label'] = mnist.test.labels[index_list_test]
The code looks verbose, but it is in fact very simple: everything we want happens in the two loops that collect the indices of the samples with the desired labels.
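As a side note, the two index-collecting loops can be condensed into single numpy calls; here is a minimal sketch assuming the same `data` dictionary as above:

import numpy as np

# Boolean masking replaces the explicit loops over sample indices.
index_list_train = np.where((data['train/label'] == 0) | (data['train/label'] == 1))[0]
index_list_test = np.where((data['test/label'] == 0) | (data['test/label'] == 1))[0]

Next, we have to dig into the logistic regression architecture.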
The logistic regression structure simply feed-forwards the input features through fully connected layers, where the last layer has only two classes. The fully connected architecture can be defined as follows:
###############################################
########### Defining place holders ############
###############################################
image_place = tf.placeholder(tf.float32, shape=([None, num_features]), name='image')
label_place = tf.placeholder(tf.int32, shape=([None,]), name='gt')
label_one_hot = tf.one_hot(label_place, depth=FLAGS.num_classes, axis=-1)
dropout_param = tf.placeholder(tf.float32)
##################################################
########### Model + Loss + Accuracy ##############
##################################################
# A simple fully connected layer with two classes followed by a Softmax is equivalent to Logistic Regression.
# 'activation_fn=None' keeps the layer linear, so its outputs are raw logits.
logits = tf.contrib.layers.fully_connected(inputs=image_place, num_outputs=FLAGS.num_classes, activation_fn=None, scope='fc')
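For intuition, the fully connected layer above is nothing more than the affine map $W^{T}x + b$ applied per class. Here is a hedged sketch of the same logits written with raw TensorFlow variables (the variable names are my own, not from the tutorial code):

# Equivalent logits written out by hand: one weight column and one bias per class.
weights = tf.Variable(tf.zeros([num_features, FLAGS.num_classes]), name='weights')
biases = tf.Variable(tf.zeros([FLAGS.num_classes]), name='biases')
logits_manual = tf.matmul(image_place, weights) + biases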
The first few lines of the snippet above define placeholders for putting the required values on the graph; please refer to this post for further details. The desired loss function can easily be implemented with TensorFlow using the following script:
# Define loss
with tf.name_scope('loss'):
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=label_one_hot))

# Accuracy
with tf.name_scope('accuracy'):
    # Evaluate the model
    correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(label_one_hot, 1))
    # Accuracy calculation
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
The tf.nn.softmax_cross_entropy_with_logits function does the job. It optimizes the previously defined cost function with a subtle difference: it produces two outputs rather than one, so that even when a sample is the digit "0", the probability of the corresponding class will be high. In other words, tf.nn.softmax_cross_entropy_with_logits predicts a probability for each class and inherently makes the decision by itself.
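If you prefer the textbook single-output formulation of logistic regression, TensorFlow also provides tf.nn.sigmoid_cross_entropy_with_logits. A minimal sketch of that alternative (not the code used in this tutorial), with one logit per sample:

# One logit per sample instead of two: P(y=1|x) = sigmoid(logit).
single_logit = tf.contrib.layers.fully_connected(inputs=image_place, num_outputs=1, activation_fn=None, scope='fc_binary')
binary_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=tf.squeeze(single_logit, axis=1),
        labels=tf.cast(label_place, tf.float32)))

Both formulations minimize the same cross-entropy objective in the binary case; the two-class softmax simply carries one redundant degree of freedom.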
In this tutorial, we described logistic regression and showed how to implement it in code. Instead of making a decision based on the output probability of a single target class, we extended the problem to a two-class setting in which we predict a probability for each class. In a future post, we will extend this problem to the multi-class case and show that it can be handled with a similar approach.
Complete Code
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import tempfile
import urllib
import pandas as pd
import os
from tensorflow.examples.tutorials.mnist import input_data
######################################
######### Necessary Flags ############
######################################
tf.app.flags.DEFINE_string(
    'train_path', os.path.dirname(os.path.abspath(__file__)) + '/train_logs',
    'Directory where event logs are written to.')
tf.app.flags.DEFINE_string(
    'checkpoint_path',
    os.path.dirname(os.path.abspath(__file__)) + '/checkpoints',
    'Directory where checkpoints are written to.')
tf.app.flags.DEFINE_integer('max_num_checkpoint', 10,
                            'Maximum number of checkpoints that TensorFlow will keep.')
tf.app.flags.DEFINE_integer('num_classes', 2,
                            'Number of classes.')
tf.app.flags.DEFINE_integer('batch_size', int(np.power(2, 9)),
                            'Batch size for training.')
tf.app.flags.DEFINE_integer('num_epochs', 10,
                            'Number of epochs for training.')
##########################################
######## Learning rate flags #############
##########################################
tf.app.flags.DEFINE_float('initial_learning_rate', 0.001, 'Initial learning rate.')
tf.app.flags.DEFINE_float(
    'learning_rate_decay_factor', 0.95, 'Learning rate decay factor.')
tf.app.flags.DEFINE_float(
    'num_epochs_per_decay', 1, 'Number of epochs passed before decaying the learning rate.')
#########################################
########## status flags #################
#########################################
tf.app.flags.DEFINE_boolean('is_training', False,
                            'Training/Testing.')
tf.app.flags.DEFINE_boolean('fine_tuning', False,
                            'Whether fine-tuning is desired or not.')
tf.app.flags.DEFINE_boolean('online_test', True,
                            'Whether online testing is desired or not.')
tf.app.flags.DEFINE_boolean('allow_soft_placement', True,
                            'Automatically put the variables on CPU if there is no GPU support.')
tf.app.flags.DEFINE_boolean('log_device_placement', False,
                            'Demonstrate which variables are on what device.')
# Store all elements in the FLAGS structure!
FLAGS = tf.app.flags.FLAGS
################################################
################# handling errors!##############
################################################
if not os.path.isabs(FLAGS.train_path):
    raise ValueError('You must assign absolute path for --train_path')

if not os.path.isabs(FLAGS.checkpoint_path):
    raise ValueError('You must assign absolute path for --checkpoint_path')
# Download and get MNIST dataset (available in tensorflow.contrib.learn.python.learn.datasets.mnist)
# It checks whether MNIST is already downloaded, downloads it if necessary, and then extracts it.
# 'reshape=True' flattens each 28x28 image into a feature vector of size 784.
mnist = input_data.read_data_sets("./MNIST_data/", reshape=True, one_hot=False)
########################
### Data Processing ####
########################
# Organize the data and feed it to associated dictionaries.
data={}
data['train/image'] = mnist.train.images
data['train/label'] = mnist.train.labels
data['test/image'] = mnist.test.images
data['test/label'] = mnist.test.labels
def extract_samples_Fn(data):
    index_list = []
    for sample_index in range(data.shape[0]):
        label = data[sample_index]
        if label == 1 or label == 0:
            index_list.append(sample_index)
    return index_list
# Get only the samples with zero and one label for training.
index_list_train = extract_samples_Fn(data['train/label'])
# Get only the samples with zero and one label for test set.
index_list_test = extract_samples_Fn(data['test/label'])
# Reform the train data structure.
data['train/image'] = mnist.train.images[index_list_train]
data['train/label'] = mnist.train.labels[index_list_train]
# Reform the test data structure.
data['test/image'] = mnist.test.images[index_list_test]
data['test/label'] = mnist.test.labels[index_list_test]
# Dimensionality of the train set
dimensionality_train = data['train/image'].shape
# Dimensions
num_train_samples = dimensionality_train[0]
num_features = dimensionality_train[1]
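# Note: with 'reshape=True' each image was flattened, so 'num_features' is 784 (28*28).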
#######################################
########## Defining Graph ############
#######################################
graph = tf.Graph()
with graph.as_default():
    ###################################
    ########### Parameters ############
    ###################################
    # Global step
    global_step = tf.Variable(0, name="global_step", trainable=False)

    # Learning rate policy
    decay_steps = int(num_train_samples / FLAGS.batch_size *
                      FLAGS.num_epochs_per_decay)
    learning_rate = tf.train.exponential_decay(FLAGS.initial_learning_rate,
                                               global_step,
                                               decay_steps,
                                               FLAGS.learning_rate_decay_factor,
                                               staircase=True,
                                               name='exponential_decay_learning_rate')
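    # With 'staircase=True', tf.train.exponential_decay computes
    #   learning_rate = initial_learning_rate * decay_factor ^ floor(global_step / decay_steps),
    # so with the flags above the learning rate is multiplied by 0.95 once per epoch.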
    ###############################################
    ########### Defining place holders ############
    ###############################################
    image_place = tf.placeholder(tf.float32, shape=([None, num_features]), name='image')
    label_place = tf.placeholder(tf.int32, shape=([None,]), name='gt')
    label_one_hot = tf.one_hot(label_place, depth=FLAGS.num_classes, axis=-1)
    dropout_param = tf.placeholder(tf.float32)

    ##################################################
    ########### Model + Loss + Accuracy ##############
    ##################################################
    # A simple fully connected layer with two classes followed by a softmax is equivalent to Logistic Regression.
    # 'activation_fn=None' keeps the layer linear, so its outputs are raw logits.
    logits = tf.contrib.layers.fully_connected(inputs=image_place, num_outputs=FLAGS.num_classes, activation_fn=None, scope='fc')
    # Define loss
    with tf.name_scope('loss'):
        loss_tensor = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=label_one_hot))

    # Accuracy
    # Evaluate the model
    prediction_correct = tf.equal(tf.argmax(logits, 1), tf.argmax(label_one_hot, 1))
    # Accuracy calculation
    accuracy = tf.reduce_mean(tf.cast(prediction_correct, tf.float32))

    #############################################
    ########### training operation ##############
    #############################################
    # Define the optimizer with its default values.
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)

    # 'train_op' is an operation that is run for a gradient update on the parameters.
    # Each execution of 'train_op' is a training step.
    # By passing 'global_step' to the optimizer, each time 'train_op' is run,
    # TensorFlow updates 'global_step' and increments it by one.
    with tf.name_scope('train_op'):
        gradients_and_variables = optimizer.compute_gradients(loss_tensor)
        train_op = optimizer.apply_gradients(gradients_and_variables, global_step=global_step)
    ############################################
    ############ Run the Session ###############
    ############################################
    session_conf = tf.ConfigProto(
        allow_soft_placement=FLAGS.allow_soft_placement,
        log_device_placement=FLAGS.log_device_placement)
    sess = tf.Session(graph=graph, config=session_conf)

    with sess.as_default():
        # The saver op.
        saver = tf.train.Saver()

        # Initialize all variables
        sess.run(tf.global_variables_initializer())

        # The prefix for checkpoint files
        checkpoint_prefix = 'model'

        # If the fine-tuning flag is 'True', the model will be restored.
        if FLAGS.fine_tuning:
            saver.restore(sess, os.path.join(FLAGS.checkpoint_path, checkpoint_prefix))
            print("Model restored for fine-tuning...")

        ###################################################################
        ########## Run the training and loop over the batches #############
        ###################################################################
        test_accuracy = 0
        for epoch in range(FLAGS.num_epochs):
            total_batch_training = int(data['train/image'].shape[0] / FLAGS.batch_size)

            # Go through the batches.
            for batch_num in range(total_batch_training):
                #################################################
                ########## Get the training batches #############
                #################################################
                start_idx = batch_num * FLAGS.batch_size
                end_idx = (batch_num + 1) * FLAGS.batch_size

                # Fit training using batch data
                train_batch_data, train_batch_label = data['train/image'][start_idx:end_idx], data['train/label'][
                    start_idx:end_idx]

                ########################################
                ########## Run the session #############
                ########################################
                # Run the optimization op (backprop) and calculate the batch loss.
                # When the 'global_step' tensor is evaluated, it has been incremented by one.
                batch_loss, _, training_step = sess.run(
                    [loss_tensor, train_op, global_step],
                    feed_dict={image_place: train_batch_data,
                               label_place: train_batch_label,
                               dropout_param: 0.5})
            #################################################
            ########## Plot the progressive bar #############
            #################################################
            print("Epoch " + str(epoch + 1) + ", Training Loss= " +
                  "{:.5f}".format(batch_loss))
        ###########################################################
        ############ Saving the model checkpoint ##################
        ###########################################################
        # The model will be saved when the training is done.

        # Create the path for saving the checkpoints.
        if not os.path.exists(FLAGS.checkpoint_path):
            os.makedirs(FLAGS.checkpoint_path)

        # Save the model.
        save_path = saver.save(sess, os.path.join(FLAGS.checkpoint_path, checkpoint_prefix))
        print("Model saved in file: %s" % save_path)

        ############################################################################
        ########## Run the session for evaluation on the test data ################
        ############################################################################
        # Restoring the saved weights.
        saver.restore(sess, os.path.join(FLAGS.checkpoint_path, checkpoint_prefix))
        print("Model restored...")

        # Evaluation of the model
        test_accuracy = 100 * sess.run(accuracy, feed_dict={
            image_place: data['test/image'],
            label_place: data['test/label'],
            dropout_param: 1.})

        print("Final Test Accuracy is %% %.2f" % test_accuracy)