Dump Keras.ImageDataGenerator! Start using TensorFlow tf.data (Part 2)
This article is the follow-up to Part 1. Here, I compare the actual training times of tf.data and Keras.ImageDataGenerator using the mobilenet model.
In Part 1, I showed that loading images using tf.data is approximately 5 times faster than Keras.ImageDataGenerator. The dataset considered was Kaggle dogs_and_cats (217 MB), with 10,000 images distributed among 2 classes.
In this Part 2, I have considered a bigger dataset that is commonly used for image classification problems. The dataset chosen is COCO2017 (18 GB), with 117,266 images distributed among 80 classes. Various versions of the COCO dataset are freely available to try and test at this link. The reason for choosing the bigger 18 GB dataset is to obtain more meaningful comparison results. For a practical image classification problem, datasets can be even bigger, ranging from 100 GB (gigabytes) to a few TB (terabytes). In our case, 18 GB of data is enough to understand the comparison, since using a dataset in the TB range would significantly increase training times and computational resources.
Training time results (in advance)
The results below were obtained on a workstation with 16 GB RAM and a 2.80 GHz Core i7, using the GPU version of TensorFlow 2.x. The dataset considered is COCO2017 (18 GB), with 117,266 images distributed among 80 classes.
When using Keras.ImageDataGenerator, each epoch took approximately 58 minutes of training time on the COCO2017 dataset.

When using tf.data with cache=True, the program crashes. The reason for this crash is that the dataset considered (18 GB) is bigger than the RAM of the workstation. With cache=True, the program starts storing the decoded images in RAM for quick access, and once this exceeds the available RAM (16 GB in our case) the program crashes. I tested the same option on the smaller Kaggle dogs_and_cats dataset and it worked fine. This shows that the cache=True option should not be used when the dataset is bigger than the RAM.

When using tf.data with cache=False, each epoch takes approximately 23 minutes, which is about 2.5 times faster than Keras.ImageDataGenerator (58 / 23 ≈ 2.5).

When using cache='some_path.tfcache', tf.data writes a dump of the dataset/images to a file in your computer's directory during the first epoch. This is why the first epoch is slower and takes around 42 minutes. In successive epochs, the images do not have to be stored again; the dump created during the first epoch is reused, which ultimately speeds up training. Each successive epoch took only 14 minutes. Creating the dump is a one-time process. For hyperparameter tuning, this means around 14 minutes per epoch compared to 58 minutes with Keras, which is approximately 4.14 times faster (58 / 14 ≈ 4.14).

Note: the dump created when using cache='some_path.tfcache' is approximately 90 GB in size, which is actually far greater than the original size (18 GB) of the dataset. I couldn't find the exact reason for this, as there is no clear documentation from TensorFlow about it. I hope this is a glitch that will be sorted out in the future.

For smaller datasets like Kaggle dogs_and_cats, which is only 217 MB, there will not be a noticeable difference in speed or training times between tf.data and Keras, as the RAM is big enough to hold all the images at once.
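To make the three configurations concrete, here is a minimal sketch (my own illustration, not the author's benchmark code) of how the cache setting changes a tf.data input pipeline; the file name and parameters are placeholders:

import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE

def build_pipeline(labeled_ds, cache, batch_size=32, shuffle_buffer_size=1000):
    """labeled_ds yields (image, label) pairs; `cache` is True, False, or a file path."""
    if cache is True:
        # Keep decoded images in RAM: fast, but crashes if the dataset exceeds RAM.
        labeled_ds = labeled_ds.cache()
    elif isinstance(cache, str):
        # Write a one-time on-disk dump (e.g. 'data.tfcache'); slower first epoch,
        # much faster successive epochs.
        labeled_ds = labeled_ds.cache(cache)
    # cache=False: no caching, every image is re-read and re-decoded each epoch.
    return (labeled_ds
            .shuffle(shuffle_buffer_size)
            .repeat()
            .batch(batch_size)
            .prefetch(AUTOTUNE))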
Training model code
Here, the initial code for creating and training the model is shown. The pretrained mobilenet model (MobileNetV2) is used to perform transfer learning, with the base layers of the pretrained model frozen.
def train_model(data_generator):
    """
    Create and train a model to perform transfer learning using a pretrained model.
    The base layers of the pretrained model are frozen.
    Classification layers are stacked on top of the pretrained model.
    """
    img_shape = (img_dims[0], img_dims[1], 3)
    base_model = tf.keras.applications.MobileNetV2(
        input_shape=img_shape, include_top=False, weights='imagenet')
    # Freeze the base layers of the pretrained model
    base_model.trainable = False
    model = tf.keras.Sequential([base_model,
                                 tf.keras.layers.Flatten(),
                                 tf.keras.layers.Dense(
                                     256, activation='relu'),
                                 tf.keras.layers.Dense(num_classes)])
    # Define parameters for model compilation
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
        metrics=['accuracy'])
    since = time.time()
    # model.fit accepts both Keras generators and tf.data datasets in TF 2.x
    # (fit_generator is deprecated).
    history = model.fit(
        data_generator['train'],
        steps_per_epoch=num_images_train // batch_size,
        epochs=epochs,
    )
    time_elapsed = time.time() - since
    print(f'''\nTraining time: '''
          f'''{datetime.timedelta(seconds=int(time_elapsed))}''')
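One detail worth noting: the final Dense(num_classes) layer has no activation and the loss is built with from_logits=True, so the network outputs raw logits. If the trained model is later used for prediction, softmax has to be applied explicitly. A minimal sketch of a hypothetical helper (my own addition; train_model above does not return the model):

import tensorflow as tf

def predict_classes(model, image_batch):
    """
    Return predicted class indices. The model outputs raw logits because the
    last Dense layer has no activation and the loss uses from_logits=True.
    """
    logits = model.predict(image_batch)                # shape (batch, num_classes)
    probabilities = tf.nn.softmax(logits, axis=-1)     # logits -> class probabilities
    return tf.argmax(probabilities, axis=-1).numpy()   # most likely class per image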
All the code together
Here I have combined the code from both Part 1 and Part 2. The comments and docstrings in the functions will help in understanding the code.
import datetime
import os
import time
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
def load_data_using_keras(folders):
    """
    Load the images in batches using Keras.
    Shuffle images (for training set only) using Keras.
    Returns:
        Data generator to be used while training the model.
    Note: Keras might need the 'pillow' library to be installed. Use:
        # pip install pillow
    """
    image_generator = {}
    data_generator = {}
    for x in folders:
        image_generator[x] = ImageDataGenerator(rescale=1./255)
        shuffle_images = True if x == 'train' else False
        data_generator[x] = image_generator[x].flow_from_directory(
            batch_size=batch_size,
            directory=os.path.join(dir_path, x),
            shuffle=shuffle_images,
            target_size=(img_dims[0], img_dims[1]),
            class_mode='categorical')
    return data_generator
def load_data_using_tfdata(folders):
    """
    Load the images in batches using TensorFlow (tf.data).
    Cache can be used to speed up the process.
    Faster method in comparison to image loading using Keras.
    Returns:
        Data generator to be used while training the model.
    """
    def parse_image(file_path):
        # convert the path to a list of path components
        parts = tf.strings.split(file_path, os.path.sep)
        class_names = np.array(os.listdir(dir_path + '/train'))
        # The second to last is the class-directory
        label = parts[-2] == class_names
        # load the raw data from the file as a string
        img = tf.io.read_file(file_path)
        # convert the compressed string to a 3D uint8 tensor
        img = tf.image.decode_jpeg(img, channels=3)
        # Use `convert_image_dtype` to convert to floats in the [0,1] range
        img = tf.image.convert_image_dtype(img, tf.float32)
        # resize the image to the desired size.
        img = tf.image.resize(img, [img_dims[0], img_dims[1]])
        return img, label

    def prepare_for_training(ds, cache=True, shuffle_buffer_size=1000):
        # If a small dataset, only load it once, and keep it in memory.
        # Use `.cache(filename)` to cache preprocessing work for datasets
        # that don't fit in memory.
        if cache:
            if isinstance(cache, str):
                ds = ds.cache(cache)
            else:
                ds = ds.cache()
        ds = ds.shuffle(buffer_size=shuffle_buffer_size)
        # Repeat forever
        ds = ds.repeat()
        ds = ds.batch(batch_size)
        # `prefetch` lets the dataset fetch batches in the background
        # while the model is training.
        ds = ds.prefetch(buffer_size=AUTOTUNE)
        return ds

    data_generator = {}
    for x in folders:
        dir_extend = dir_path + '/' + x
        list_ds = tf.data.Dataset.list_files(str(dir_extend + '/*/*'))
        AUTOTUNE = tf.data.experimental.AUTOTUNE
        # Set `num_parallel_calls` so that multiple images are
        # processed in parallel
        labeled_ds = list_ds.map(
            parse_image, num_parallel_calls=AUTOTUNE)
        # cache = True, False, './file_name'
        # If the dataset doesn't fit in memory use a cache file,
        # e.g. cache='./data.tfcache'
        data_generator[x] = prepare_for_training(
            labeled_ds, cache='cocodata.tfcache')
    return data_generator
def timeit(ds, steps=1000):
    """
    Check performance/speed for loading images using Keras or tf.data.
    """
    start = time.time()
    it = iter(ds)
    for i in range(steps):
        next(it)
        print(' >> ', i, f'/{steps}', end='\r')
    duration = time.time() - start
    print(f'''{steps} batches: '''
          f'''{datetime.timedelta(seconds=int(duration))}''')
    print(f'{round(batch_size*steps/duration)} Images/s')
def train_model(data_generator):
    """
    Create and train a model to perform transfer learning using a pretrained model.
    The base layers of the pretrained model are frozen.
    Classification layers are stacked on top of the pretrained model.
    """
    img_shape = (img_dims[0], img_dims[1], 3)
    base_model = tf.keras.applications.MobileNetV2(
        input_shape=img_shape, include_top=False, weights='imagenet')
    # Freeze the base layers of the pretrained model
    base_model.trainable = False
    model = tf.keras.Sequential([base_model,
                                 tf.keras.layers.Flatten(),
                                 tf.keras.layers.Dense(
                                     256, activation='relu'),
                                 tf.keras.layers.Dense(num_classes)])
    # Define parameters for model compilation
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
        metrics=['accuracy'])
    since = time.time()
    # model.fit accepts both Keras generators and tf.data datasets in TF 2.x
    # (fit_generator is deprecated).
    history = model.fit(
        data_generator['train'],
        steps_per_epoch=num_images_train // batch_size,
        epochs=epochs,
    )
    time_elapsed = time.time() - since
    print(f'''\nTraining time: '''
          f'''{datetime.timedelta(seconds=int(time_elapsed))}''')
if __name__ == '__main__':
    # Need to change this w.r.t. data
    dir_path = '../../data/train2017_hard'
    num_classes = 80
    folders = ['train']
    num_images_train = 117266
    load_data_using = 'tfdata'
    batch_size = 32
    img_dims = [256, 256]
    epochs = 5
    learning_rate = 0.0001
    if load_data_using == 'keras':
        data_generator = load_data_using_keras(folders)
    elif load_data_using == 'tfdata':
        data_generator = load_data_using_tfdata(folders)
    timeit(data_generator['train'])
    train_model(data_generator)
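Since the timings above assume the GPU build of TensorFlow 2.x, it is worth confirming that a GPU is actually visible before starting a long run. A quick check (my own addition, not part of the original article):

import tensorflow as tf

# Lists the GPUs TensorFlow can see; an empty list means training will run on the
# CPU and the epoch times reported above will not be reproducible.
print(tf.config.list_physical_devices('GPU'))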
Conclusion
This article shows that tf.data is 2.5 (when using cache=False) to 4.14 (when using cache='some_path.tfcache') times faster than Keras.ImageDataGenerator for a practical image classification problem. I think it is worth trying tf.data.
Translated from: https://towardsdatascience.com/dump-keras-imagedatagenerator-start-using-tensorflow-tf-data-part-2-fba7cda81203