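Dataset link. This dataset contains pictures of cats and dogs. It is a subset that Google extracted from Kaggle's Dogs vs. Cats image collection. This time we are not given a ready-made dataset. We only have cat images and dog images stored in two separate folders.
Our first step is to use PIL to give all the images a uniform size. Here we resize every image to 150x150 pixels.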
import os
from PIL import Image

base_dir = './dataset/cats_and_dogs_filtered'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')
train_cats_dir = os.path.join(train_dir, 'cats/')
train_dogs_dir = os.path.join(train_dir, 'dogs/')
validation_cats_dir = os.path.join(validation_dir, 'cats/')
validation_dogs_dir = os.path.join(validation_dir, 'dogs/')
train_cat_fnames = os.listdir(train_cats_dir)
train_dog_fnames = os.listdir(train_dogs_dir)
validation_cat_fnames = os.listdir(validation_cats_dir)
validation_dog_fnames = os.listdir(validation_dogs_dir)
# Output folders for the resized images; exist_ok=True lets the script be rerun without errors
os.makedirs(train_dir + '/resize_cats', exist_ok=True)
os.makedirs(train_dir + '/resize_dogs', exist_ok=True)
os.makedirs(validation_dir + '/resize_cats', exist_ok=True)
os.makedirs(validation_dir + '/resize_dogs', exist_ok=True)
# Resize every image to 150x150 and save it into the matching resize_* folder.
# Image.LANCZOS is the filter formerly called Image.ANTIALIAS, which Pillow 10 removed.
# convert('RGB') guarantees 3 channels, so every resized image holds 150*150*3 bytes.
for i in train_cat_fnames:
    original_img = Image.open(train_cats_dir + i).convert('RGB')
    clipping_img = original_img.resize((150, 150), Image.LANCZOS)
    clipping_img.save(train_dir + '/resize_cats/' + i)
for i in train_dog_fnames:
    original_img = Image.open(train_dogs_dir + i).convert('RGB')
    clipping_img = original_img.resize((150, 150), Image.LANCZOS)
    clipping_img.save(train_dir + '/resize_dogs/' + i)
for i in validation_cat_fnames:
    original_img = Image.open(validation_cats_dir + i).convert('RGB')
    clipping_img = original_img.resize((150, 150), Image.LANCZOS)
    clipping_img.save(validation_dir + '/resize_cats/' + i)
for i in validation_dog_fnames:
    original_img = Image.open(validation_dogs_dir + i).convert('RGB')
    clipping_img = original_img.resize((150, 150), Image.LANCZOS)
    clipping_img.save(validation_dir + '/resize_dogs/' + i)
Next, we turn the cat and dog images into a TFRecords file.
import os
import tensorflow as tf
from PIL import Image

cwd = './dataset/cats_and_dogs_filtered/train/'
classes = ('resize_cats', 'resize_dogs')  # index 0 = cat, index 1 = dog
writer = tf.python_io.TFRecordWriter('cats_and_dogs_train_onehot.tfrecords')
for index, name in enumerate(classes):
    class_path = cwd + name + '/'
    for img_name in os.listdir(class_path):
        img_path = class_path + img_name
        # Force 3 channels so every record holds exactly 150*150*3 bytes
        img = Image.open(img_path).convert('RGB')
        img_raw = img.tobytes()  # raw pixel bytes, decoded later with tf.decode_raw
        example = tf.train.Example(features=tf.train.Features(feature={
            # One-hot label: [1, 0] for cats, [0, 1] for dogs
            'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1, 0] if index == 0 else [0, 1])),
            'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
        }))
        writer.write(example.SerializeToString())
writer.close()
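`img.tobytes()` stores the pixels as one flat, row-major run of uint8 values. The reading side later recovers the image with `tf.decode_raw` followed by a reshape to `[150, 150, 3]`. The NumPy sketch below mimics that round trip on a hypothetical random image (NumPy stands in for PIL and TensorFlow here). It shows why each record must hold exactly 150 × 150 × 3 = 67,500 bytes, which only holds if every image is RGB.

```python
import numpy as np

# Hypothetical 150x150 RGB image in height x width x channel order, uint8 like PIL's RGB mode
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(150, 150, 3), dtype=np.uint8)

# Analogue of PIL's img.tobytes(): one flat buffer of raw pixel bytes
img_raw = img.tobytes()
print(len(img_raw))  # 67500 = 150 * 150 * 3

# Analogue of tf.decode_raw(img_raw, tf.uint8) followed by tf.reshape(..., [150, 150, 3])
decoded = np.frombuffer(img_raw, dtype=np.uint8).reshape(150, 150, 3)
print(np.array_equal(decoded, img))  # True
```

A grayscale image would give only 22,500 bytes, and the reshape on the reading side would fail.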
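Cats vs. dogs is a binary classification problem. We can use one-hot codes as labels, or we can simply use [0] and [1] to stand for cat and dog.
With one-hot labels such as [1, 0] and [0, 1], the output layer uses softmax as its activation function and cross-entropy as the loss. The advantage is that accuracy is easy to compute: compare the argmax of the prediction with the argmax of the label.
With scalar labels [0] and [1], the output layer uses a sigmoid activation and log loss (binary cross-entropy) as the loss. The advantage is that AUC is easy to compute and the classification threshold is easy to tune.
I chose one-hot labels here, simply because I'm lazy (setting thresholds and checking AUC is too much hassle).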
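The two options above can be compared side by side with a small NumPy sketch. The logits and labels are made up for illustration. The clip to `[1e-10, 1.0]` mirrors the `_loss` function used later in the training code. The threshold loop shows the extra knob a sigmoid output gives you: the same probabilities yield different predictions depending on where the cut is placed.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Option 1: one-hot labels ([1, 0] = cat, [0, 1] = dog), 2 output units + softmax
logits = np.array([[2.0, 0.5], [0.2, 1.7], [1.0, 1.2]])  # hypothetical network outputs
y_onehot = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
p = softmax(logits)
ce = -np.mean(np.sum(y_onehot * np.log(np.clip(p, 1e-10, 1.0)), axis=1))
acc = np.mean(np.argmax(p, axis=1) == np.argmax(y_onehot, axis=1))
print('cross-entropy:', ce, 'accuracy:', acc)

# Option 2: scalar labels (0 = cat, 1 = dog), 1 output unit + sigmoid
z = np.array([-1.5, 1.5, 0.2])  # hypothetical single-logit outputs
y = np.array([0, 1, 0], dtype=float)
q = np.clip(sigmoid(z), 1e-10, 1 - 1e-10)
logloss = -np.mean(y * np.log(q) + (1 - y) * np.log(1 - q))
print('log loss:', logloss)

# With a sigmoid output the decision threshold is a free parameter
for threshold in (0.3, 0.5, 0.7):
    preds = (q >= threshold).astype(int)
    print(threshold, preds)
```

The third example (a cat whose dog probability is about 0.55) is misclassified at thresholds 0.3 and 0.5 but correct at 0.7. This is the kind of tuning the one-hot/softmax setup skips.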
If you don't want one-hot labels, change this part of the code above:
'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1, 0] if index == 0 else [0, 1]))
# change to
'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[index]))
That way, cat is 0 and dog is 1.
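Both encodings carry the same information. The NumPy sketch below converts between them using a hypothetical batch of four labels. The argmax direction is the same operation the `accuracy()` function below applies to the one-hot labels.

```python
import numpy as np

# Scalar labels as written by value=[index]: 0 = cat, 1 = dog
labels = np.array([0, 1, 1, 0])

# Scalar -> one-hot, matching value=[1, 0] for cats and [0, 1] for dogs
onehot = np.eye(2, dtype=np.int64)[labels]
print(onehot.tolist())  # [[1, 0], [0, 1], [0, 1], [1, 0]]

# One-hot -> scalar, via argmax over the class axis
back = np.argmax(onehot, axis=1)
print(back.tolist())  # [0, 1, 1, 0]
```

If you do switch to `value=[index]`, the reading code's `tf.FixedLenFeature([2], tf.int64)` for `label` would also have to change, for example to shape `[]` for a single scalar.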
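The validation set is built the same way. Change train to validation in cwd, and also give the output file a different name (for example cats_and_dogs_validation_onehot.tfrecords); otherwise the training file is overwritten.
To read the TFRecords, we simply reuse the code from the official site shown earlier. It is quite practical. Even if you can't memorize it, keep it somewhere easy to find so you can reuse it whenever you need it.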
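Goal:
We start by building a simple network with only one convolutional layer, one fully connected layer, and one classification output layer (softmax). We also save the training results so they can be ported to other networks.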
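The sizes used in the code below (such as the `75*75*6` in the reshape) follow from simple shape arithmetic. This plain-Python sketch traces it for the layers described above. It assumes the same settings as the code: 3x3 kernels with 6 filters, stride 1, `'SAME'` padding, 2x2 pooling with stride 2, and 10 hidden units.

```python
import math

# Input: 150 x 150 RGB image (after the PIL resize step)
h, w, c = 150, 150, 3

# conv2d: 3x3 kernel, 6 filters, stride 1, padding='SAME' -> spatial size unchanged
conv_out = 6
conv_params = 3 * 3 * c * conv_out + conv_out  # weights + biases

# max_pool_2x2: 2x2 window, stride 2, padding='SAME' -> ceil(size / 2)
h, w = math.ceil(h / 2), math.ceil(w / 2)
flat = h * w * conv_out  # length of each row of h_pool1_flat

# Fully connected layer (10 units) and softmax output layer (2 units)
fc1_params = flat * 10 + 10
fc2_params = 10 * 2 + 2

print(h, w, flat)                                # 75 75 33750
print(conv_params, fc1_params, fc2_params)       # 168 337510 22
print(conv_params + fc1_params + fc2_params)     # 337700
```

Almost all of the roughly 338k parameters sit in the single fully connected layer, not in the convolution.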
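Optional exercises (I'm skipping them here, but feel free to add them if you're interested):
Dynamic learning rate: requires a global_step (a non-trainable variable) and tf.train.exponential_decay()
TensorBoard visualization: requires tf.summary.scalar(), tf.summary.histogram(), tf.summary.merge_all(), tf.summary.FileWriter(), etc.
Visualizing how the network processes the images: requires matplotlib
Different training methods (SGD, Adam, etc.)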
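For the dynamic learning rate exercise, the documented formula for `tf.train.exponential_decay` is `learning_rate * decay_rate ^ (global_step / decay_steps)`, with integer division in the exponent when `staircase=True`. The plain-Python sketch below reimplements only that formula, so it is not the TensorFlow function itself. `decay_steps=100` and `decay_rate=0.96` are arbitrary example values; the starting rate is the article's `start_learning_rate`.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    # Same formula as documented for tf.train.exponential_decay
    if staircase:
        exponent = global_step // decay_steps  # rate drops in discrete steps
    else:
        exponent = global_step / decay_steps   # rate decays smoothly
    return learning_rate * decay_rate ** exponent

# Hypothetical schedule: start at 0.005, multiply by 0.96 every 100 steps
for step in (0, 50, 100, 250, 1000):
    print(step,
          exponential_decay(0.005, step, 100, 0.96),
          exponential_decay(0.005, step, 100, 0.96, staircase=True))
```

In the TensorFlow version, `global_step` is a non-trainable variable that `optimizer.minimize(loss, global_step=global_step)` increments on each training step. The decayed rate is then passed to the optimizer instead of a constant.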
Here is the network used for training.
import tensorflow as tf

def _parse_function(record):
    """Parses one serialized tf.train.Example into an image and its label.
    Args:
        record: A scalar string tensor holding one serialized Example.
    Returns:
        A tuple `(img_raw, labels)`:
            img_raw: a uint8 tensor of shape [150, 150, 3].
            labels: an int64 one-hot tensor of shape [2].
    """
    features = {
        "label": tf.FixedLenFeature([2], tf.int64),   # one-hot label: [1, 0] = cat, [0, 1] = dog
        "img_raw": tf.FixedLenFeature([], tf.string)  # raw pixel bytes written by PIL's tobytes()
    }
    parsed_features = tf.parse_single_example(record, features)
    img_raw = parsed_features['img_raw']
    img_raw = tf.decode_raw(img_raw, tf.uint8)
    img_raw = tf.reshape(img_raw, [150, 150, 3])
    labels = parsed_features['label']
    return img_raw, labels
def my_input_fn(input_filenames, num_epochs=None, shuffle=True):
    # Create a dataset from the TFRecords file and parse each record into (image, label)
    ds = tf.data.TFRecordDataset(input_filenames)
    ds = ds.map(_parse_function)
    if shuffle:
        ds = ds.shuffle(10000)
    # Every image and label has a fixed shape, so a plain batch of 25 suffices (no padding needed)
    ds = ds.batch(25)
    ds = ds.repeat(num_epochs)  # num_epochs=None repeats indefinitely
    # Return the tensors for the next batch of data
    features, labels = ds.make_one_shot_iterator().get_next()
    return features, labels
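def weight_variable(shape):
    # Weights drawn from a truncated normal distribution (stddev 0.1)
    initial = tf.truncated_normal(shape, stddev=.1)
    return tf.Variable(initial)

def biases_variable(shape):
    # Biases initialized to a small positive constant
    initial = tf.constant(.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, w):
    # Stride 1 with 'SAME' padding keeps height and width unchanged
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2 halves height and width
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def _loss(ys, pred):
    # Cross-entropy on the softmax output; clipping avoids log(0)
    cross_entropy = tf.reduce_mean(
        -tf.reduce_sum(ys * tf.log(tf.clip_by_value(pred, 1e-10, 1.0)), axis=1))
    return cross_entropy

def train_step(learning_rate, loss):
    # RMSProp op that updates all trainable variables to minimize the loss
    optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)
    return optimizer

def accuracy(pred, ys):
    # Fraction of examples whose predicted class matches the one-hot label
    _bool = tf.equal(tf.argmax(pred, 1), tf.argmax(ys, 1))
    acc = tf.reduce_mean(tf.cast(_bool, tf.float32))
    return acc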
# Input pipeline: pixel values scaled to [0, 1], labels cast to float
train_path = my_input_fn('cats_and_dogs_train_onehot.tfrecords')
xs = train_path[0]
xs = tf.cast(xs, tf.float32)
x_input = xs / 255
ys = train_path[1]
y_input = tf.cast(ys, tf.float32)

# Convolution layer: 3x3 kernel, 3 -> 6 channels, then 2x2 max pooling (150x150 -> 75x75)
w_conv1 = weight_variable([3, 3, 3, 6])
b_conv1 = biases_variable([6])
h_conv1 = tf.nn.relu(conv2d(x_input, w_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Fully connected layer with 10 units
h_pool1_flat = tf.reshape(h_pool1, [-1, 75*75*6])
w_fc1 = weight_variable([75*75*6, 10])
b_fc1 = biases_variable([10])
h_fc1 = tf.nn.relu(tf.matmul(h_pool1_flat, w_fc1) + b_fc1)

# Softmax output layer with 2 classes (cat, dog)
w_fc2 = weight_variable([10, 2])
b_fc2 = biases_variable([2])
pred = tf.nn.softmax(tf.matmul(h_fc1, w_fc2) + b_fc2)
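start_learning_rate = .005
loss = _loss(y_input, pred)
train = train_step(start_learning_rate, loss)
# Build the accuracy op once; creating it inside the loop would keep adding nodes to the graph
acc = accuracy(pred, y_input)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
saver = tf.train.Saver()
for i in range(1000):
    sess.run(train)
    if i % 50 == 0:
        # Each sess.run pulls a new batch, so this is the accuracy on a fresh training batch
        print('accuracy:', sess.run(acc))
# Save the trained weights once, after training; Saver.save needs the folder to exist
tf.gfile.MakeDirs('my_net')
save_path = saver.save(sess, 'my_net/simple_cnn1.ckpt')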
Once the network is trained, we build a new network, restore the saved weights into it, and see how well it performs.
import tensorflow as tf
import numpy as np

def _parse_function(record):
    """Parses one serialized tf.train.Example into an image and its label.
    Args:
        record: A scalar string tensor holding one serialized Example.
    Returns:
        A tuple `(img_raw, labels)`:
            img_raw: a uint8 tensor of shape [150, 150, 3].
            labels: an int64 one-hot tensor of shape [2].
    """
    features = {
        "label": tf.FixedLenFeature([2], tf.int64),   # one-hot label: [1, 0] = cat, [0, 1] = dog
        "img_raw": tf.FixedLenFeature([], tf.string)  # raw pixel bytes written by PIL's tobytes()
    }
    parsed_features = tf.parse_single_example(record, features)
    img_raw = parsed_features['img_raw']
    img_raw = tf.decode_raw(img_raw, tf.uint8)
    img_raw = tf.reshape(img_raw, [150, 150, 3])
    labels = parsed_features['label']
    return img_raw, labels
def my_input_fn(input_filenames, num_epochs=None, shuffle=False):
    # Create a dataset from the TFRecords file and parse each record into (image, label)
    ds = tf.data.TFRecordDataset(input_filenames)
    ds = ds.map(_parse_function)
    if shuffle:
        ds = ds.shuffle(10000)
    # Every image and label has a fixed shape, so a plain batch of 25 suffices (no padding needed)
    ds = ds.batch(25)
    ds = ds.repeat(num_epochs)  # num_epochs=None repeats indefinitely
    # Return the tensors for the next batch of data
    features, labels = ds.make_one_shot_iterator().get_next()
    return features, labels
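def weight_variable(shape):
    # Initial values are irrelevant here: they are overwritten by saver.restore
    initial = tf.truncated_normal(shape, stddev=.1)
    return tf.Variable(initial)

def biases_variable(shape):
    initial = tf.constant(.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, w):
    # Stride 1 with 'SAME' padding keeps height and width unchanged
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2 halves height and width
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def accuracy(pred, ys):
    # Fraction of examples whose predicted class matches the one-hot label
    _bool = tf.equal(tf.argmax(pred, 1), tf.argmax(ys, 1))
    acc = tf.reduce_mean(tf.cast(_bool, tf.float32))
    return acc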
# To evaluate on the validation set, pass the validation TFRecords file here instead
train_path = my_input_fn('cats_and_dogs_train_onehot.tfrecords')
xs = train_path[0]
xs = tf.cast(xs, tf.float32)
x_input = xs / 255
ys = train_path[1]
y_input = tf.cast(ys, tf.float32)

# Same layers, created in the same order as in the training script,
# so the default variable names match those stored in the checkpoint
w_conv1 = weight_variable([3, 3, 3, 6])
b_conv1 = biases_variable([6])
h_conv1 = tf.nn.relu(conv2d(x_input, w_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
h_pool1_flat = tf.reshape(h_pool1, [-1, 75*75*6])
w_fc1 = weight_variable([75*75*6, 10])
b_fc1 = biases_variable([10])
h_fc1 = tf.nn.relu(tf.matmul(h_pool1_flat, w_fc1) + b_fc1)
w_fc2 = weight_variable([10, 2])
b_fc2 = biases_variable([2])
pred = tf.nn.softmax(tf.matmul(h_fc1, w_fc2) + b_fc2)
sess = tf.Session()
saver = tf.train.Saver()
saver.restore(sess, 'my_net/simple_cnn1.ckpt')
# Build the accuracy op once, then evaluate it on 80 consecutive batches of 25 images
acc = accuracy(pred, y_input)
lst = []
for i in range(80):
    lst.append(sess.run(acc))
print(lst)
print(np.mean(lst))
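The script reports the mean of 80 per-batch accuracies. Because every batch has the same size (25), that mean equals the accuracy over all 80 × 25 = 2,000 evaluated examples. The NumPy sketch below checks this with hypothetical right/wrong outcomes. One caveat applies: since `repeat` comes after `batch`, a file whose size is not a multiple of 25 ends each pass with a smaller batch. A plain mean would then over-weight that batch's examples, and with `num_epochs=None` a file holding fewer than 2,000 examples gets some of them counted more than once.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-example outcomes (1 = correct, 0 = wrong) for 80 batches of 25 examples
correct = rng.integers(0, 2, size=(80, 25))

batch_acc = correct.mean(axis=1)  # what each sess.run(acc) returns for one batch
print(np.mean(batch_acc))         # mean of the per-batch accuracies, as in np.mean(lst)
print(correct.mean())             # accuracy over all 2000 examples at once
```

Both printed values are identical here only because all batches are full.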
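The accuracy looks very good: I got 99.3%. Keep in mind, though, that this is measured on the training set.
Now let's change the file passed to my_input_fn to the validation set's TFRecords file.
The accuracy drops to 63%.
This shows that our CNN overfits severely and generalizes very poorly.
How do we fix this? Starting with the next article, we'll look at overfitting and generalization in CNNs.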