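A convolutional neural network (CNN) is a class of feedforward neural networks that use convolution operations and have a deep structure; it is one of the representative algorithms of deep learning. CNNs are capable of representation learning and can perform shift-invariant classification of their input according to their hierarchical structure, which is why they are also called "shift-invariant artificial neural networks".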
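To make the shift-invariance claim concrete, here is a minimal sketch in plain Python (no TensorFlow required). A 1-D signal stands in for an image, and the helper names `conv1d_valid` and `global_max_pool` are hypothetical, chosen only for this illustration. Under these assumptions, a convolution followed by global max pooling gives the same response when the pattern is shifted, provided the pattern stays fully inside the input.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` (no padding) and return the dot product at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def global_max_pool(feature_map):
    """Collapse a feature map to its single strongest response."""
    return max(feature_map)


kernel = [1, 2, 1]                    # illustrative filter, not a learned one
signal = [0, 0, 1, 2, 1, 0, 0, 0]     # pattern near the left edge
shifted = [0, 0, 0, 1, 2, 1, 0, 0]    # same pattern moved one step to the right

response = global_max_pool(conv1d_valid(signal, kernel))
response_shifted = global_max_pool(conv1d_valid(shifted, kernel))
print(response, response_shifted)
```

The feature map itself moves along with the input; it is the pooling step that discards the position and makes the final response independent of the shift.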
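Overfitting means making a hypothesis overly strict in order to obtain a consistent hypothesis, that is, one that agrees with every training example. Avoiding overfitting is a core task in classifier design. Classifier performance is usually evaluated by increasing the amount of data and by using a separate test sample set.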
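Given a hypothesis space H and a hypothesis h in H, h is said to overfit the training data if there exists another hypothesis h' in H such that h has a lower error rate than h' on the training examples, but h' has a lower error rate than h over the entire distribution of instances.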
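The same definition can be written symbolically. The notation is introduced here for convenience and is not part of the original text: $\mathrm{err}_{S}(h)$ is the error rate of $h$ on the training examples $S$, and $\mathrm{err}_{\mathcal{D}}(h)$ is its error rate over the whole instance distribution $\mathcal{D}$.

```latex
h \in H \text{ overfits } S
\iff
\exists\, h' \in H :\;
\mathrm{err}_{S}(h) < \mathrm{err}_{S}(h')
\;\wedge\;
\mathrm{err}_{\mathcal{D}}(h') < \mathrm{err}_{\mathcal{D}}(h)
```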
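In other words, a hypothesis overfits when it fits the training data better than other hypotheses yet fails to fit data outside the training set well. The main causes are noise in the training data or too little training data.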
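A toy example may help tie the definition to the two causes just named (noise and too little data). Everything in this sketch is made up for illustration: the instance space is the integers 0 to 9, treated as equally likely, the clean concept is "x >= 5", and the training set has only five examples, two of them with flipped labels. The function names are hypothetical.

```python
# Clean concept on the instance space x = 0..9: the label is 1 when x >= 5.
def true_label(x):
    return 1 if x >= 5 else 0

# Small training set; the labels at x=4 and x=8 are deliberately flipped (noise).
train = [(1, 0), (3, 0), (4, 1), (6, 1), (8, 0)]

def h_memorizer(x):
    """1-nearest-neighbour: reproduces every training label, noisy ones included."""
    _, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y

def h_simple(x):
    """A simpler alternative hypothesis: a single threshold at 5."""
    return 1 if x >= 5 else 0

def error_rate(hypothesis, examples):
    """Fraction of examples on which the hypothesis disagrees with the label."""
    wrong = sum(1 for x, y in examples if hypothesis(x) != y)
    return wrong / len(examples)

# Stand-in for "the entire distribution of instances": all ten points, clean labels.
everything = [(x, true_label(x)) for x in range(10)]

train_err_h = error_rate(h_memorizer, train)
train_err_alt = error_rate(h_simple, train)
true_err_h = error_rate(h_memorizer, everything)
true_err_alt = error_rate(h_simple, everything)

# The two conditions of the definition, checked directly.
overfits = train_err_h < train_err_alt and true_err_alt < true_err_h
print(train_err_h, train_err_alt, true_err_h, true_err_alt, overfits)
```

The memorizing hypothesis wins on the training examples but loses on the full instance space, so it satisfies both conditions of the definition.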
Environment setup:
Anaconda
conda create -n tensorflow python=3.7
activate tensorflow
pip install tensorflow==1.14.0
pip install keras==2.2.5
The TensorFlow and Keras versions must match each other (here, TensorFlow 1.14.0 with Keras 2.2.5).
import tensorflow as tf
tf.__version__
Link: https://pan.baidu.com/s/1JTyY259L58JfVLB98Iw7GQ
Extraction code: eaf4
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
from IPython.display import Image
import os
# Root directory path
root_dir = os.getcwd()
# Directory that holds the image dataset
data_path = os.path.join(root_dir, 'data')
import os, shutil
# Path to the original dataset
original_dataset_dir = os.path.join(data_path, 'train')
# Destination for the smaller dataset
base_dir = os.path.join(data_path, 'cats_and_dogs_small')
if not os.path.exists(base_dir):
    os.mkdir(base_dir)
# Directory for training images
train_dir = os.path.join(base_dir, 'train')
if not os.path.exists(train_dir):
    os.mkdir(train_dir)
# Directory for validation images
validation_dir = os.path.join(base_dir, 'validation')
if not os.path.exists(validation_dir):
    os.mkdir(validation_dir)
# Directory for test images
test_dir = os.path.join(base_dir, 'test')
if not os.path.exists(test_dir):
    os.mkdir(test_dir)
# Directory for cat training images
train_cats_dir = os.path.join(train_dir, 'cats')
if not os.path.exists(train_cats_dir):
    os.mkdir(train_cats_dir)
# Directory for dog training images
train_dogs_dir = os.path.join(train_dir, 'dogs')
if not os.path.exists(train_dogs_dir):
    os.mkdir(train_dogs_dir)
# Directory for cat validation images
validation_cats_dir = os.path.join(validation_dir, 'cats')
if not os.path.exists(validation_cats_dir):
    os.mkdir(validation_cats_dir)
# Directory for dog validation images
validation_dogs_dir = os.path.join(validation_dir, 'dogs')
if not os.path.exists(validation_dogs_dir):
    os.mkdir(validation_dogs_dir)
# Directory for cat test images
test_cats_dir = os.path.join(test_dir, 'cats')
if not os.path.exists(test_cats_dir):
    os.mkdir(test_cats_dir)
# Directory for dog test images
test_dogs_dir = os.path.join(test_dir, 'dogs')
if not os.path.exists(test_dogs_dir):
    os.mkdir(test_dogs_dir)
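The directory setup above repeats the same check-then-create pattern for every directory. As a possible shortening, the sketch below builds the same train/validation/test by cats/dogs layout with `os.makedirs(..., exist_ok=True)`, which creates missing parent directories and does nothing when the directory already exists. The function name `make_split_dirs` is hypothetical, and the demo runs against a temporary directory so it can be tried without touching real data.

```python
import os
import tempfile


def make_split_dirs(base_dir, splits=('train', 'validation', 'test'),
                    classes=('cats', 'dogs')):
    """Create base_dir/<split>/<class> for every combination and return the paths."""
    paths = {}
    for split in splits:
        for cls in classes:
            path = os.path.join(base_dir, split, cls)
            # exist_ok=True makes the call safe to repeat
            os.makedirs(path, exist_ok=True)
            paths[(split, cls)] = path
    return paths


# Demo in a throwaway location
demo_root = tempfile.mkdtemp()
demo_base = os.path.join(demo_root, 'cats_and_dogs_small')
dirs = make_split_dirs(demo_base)
print(sorted(os.listdir(demo_base)))
```

With the real dataset, `make_split_dirs(base_dir)` would take the place of the individual blocks, and `dirs[('train', 'cats')]` would play the role of `train_cats_dir`.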
# Copy the first 600 cat images to train_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(600)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    if not os.path.exists(dst):
        shutil.copyfile(src, dst)
print("Copy first 600 cat images to train_cats_dir complete!")
Copy first 600 cat images to train_cats_dir complete!
# Copy 400 further cat images (indices 1000-1399) to validation_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000, 1400)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_cats_dir, fname)
    if not os.path.exists(dst):
        shutil.copyfile(src, dst)
print('Copy next 400 cat images to validation_cats_dir complete!')
Copy next 400 cat images to validation_cats_dir complete!
# Copy 400 cat images (indices 1500-1899) to test_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1500, 1900)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_cats_dir, fname)
    if not os.path.exists(dst):
        shutil.copyfile(src, dst)
print("Copy next 400 cat images to test_cats_dir complete!")
Copy next 400 cat images to test_cats_dir complete!
# Copy the first 600 dog images to train_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(600)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_dogs_dir, fname)
    if not os.path.exists(dst):
        shutil.copyfile(src, dst)
print("Copy first 600 dog images to train_dogs_dir complete!")
Copy first 600 dog images to train_dogs_dir complete!
# Copy 400 further dog images (indices 1000-1399) to validation_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000, 1400)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_dogs_dir, fname)
    if not os.path.exists(dst):
        shutil.copyfile(src, dst)
print('Copy next 400 dog images to validation_dogs_dir complete!')
Copy next 400 dog images to validation_dogs_dir complete!
# Copy 400 dog images (indices 1500-1899) to test_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1500, 1900)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_dogs_dir, fname)
    if not os.path.exists(dst):
        shutil.copyfile(src, dst)
print('Copy next 400 dog images to test_dogs_dir complete!')
Copy next 400 dog images to test_dogs_dir complete!
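The six copy loops above differ only in the filename prefix, the index range and the destination directory. One way to factor that out is sketched below; `copy_range` is a hypothetical helper, not something the original code defines, and the demo uses empty placeholder files in temporary directories instead of real photos.

```python
import os
import shutil
import tempfile


def copy_range(src_dir, dst_dir, prefix, start, stop):
    """Copy '<prefix>.<i>.jpg' for i in [start, stop) from src_dir to dst_dir.

    Files already present in dst_dir are skipped. Returns the number copied.
    """
    copied = 0
    for i in range(start, stop):
        fname = '{}.{}.jpg'.format(prefix, i)
        src = os.path.join(src_dir, fname)
        dst = os.path.join(dst_dir, fname)
        if not os.path.exists(dst):
            shutil.copyfile(src, dst)
            copied += 1
    return copied


# Self-contained demo: five empty stand-in files named like the dataset images.
demo_src = tempfile.mkdtemp()
demo_dst = tempfile.mkdtemp()
for i in range(5):
    open(os.path.join(demo_src, 'cat.{}.jpg'.format(i)), 'w').close()

first_run = copy_range(demo_src, demo_dst, 'cat', 0, 3)
second_run = copy_range(demo_src, demo_dst, 'cat', 0, 3)  # nothing left to copy
print(first_run, second_run, sorted(os.listdir(demo_dst)))
```

Applied to the real data, the first loop would become `copy_range(original_dataset_dir, train_cats_dir, 'cat', 0, 600)`, and the other five follow the same shape with their own prefix, range and destination.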
# Sanity check: count how many images are in each split (train/validation/test)
print('total training cat images:', len(os.listdir(train_cats_dir)))
print('total training dog images:', len(os.listdir(train_dogs_dir)))
print('total validation cat images:', len(os.listdir(validation_cats_dir)))
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))
print('total test cat images:', len(os.listdir(test_cats_dir)))
print('total test dog images:', len(os.listdir(test_dogs_dir)))
total training cat images: 600
total training dog images: 600
total validation cat images: 400
total validation dog images: 400
total test cat images: 400
total test dog images: 400
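That gives 1200 training images, 800 validation images and 800 test images. Each class has the same number of samples in every split, so this is a balanced binary classification problem, which means classification accuracy is an appropriate metric.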
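Why balance matters for accuracy can be seen from a trivial baseline: a classifier that always predicts the most frequent class. The sketch below computes that baseline from class counts. The function name is hypothetical, and the imbalanced counts are made up purely for contrast.

```python
def majority_baseline_accuracy(class_counts):
    """Accuracy of always predicting the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


balanced = {'cats': 600, 'dogs': 600}      # training split used in this article
imbalanced = {'cats': 1140, 'dogs': 60}    # made-up counts, for contrast only

print(majority_baseline_accuracy(balanced))    # 0.5
print(majority_baseline_accuracy(imbalanced))  # 0.95
```

On the balanced split the do-nothing baseline only reaches 50%, so any accuracy clearly above that reflects something the model has learned. On the imbalanced counts the same baseline already scores 95%, which is why accuracy alone would be misleading there.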