Machine Learning: Classifying Cats and Dogs
It’s been 3 weeks since I started my 6-week ML project, and I’m very happy with how much I have been able to do and learn from it. It is always worth doing a project, whether it is small or big: try it and watch it come together, because that is how we learn how things work and explore what is going on.
2 Weeks at a Glance
Week 1:
- Images and labels.
- Creating a validation set.
- Pre-processing the images.
- Turning data into batches.
(Previous article on Week 1: Link1, Link2)
Week 2:
- Building our model.
- Creating the necessary callbacks.
(Previous article on Week 2: Link)
Week 3
After building the model and creating the callbacks, it was time to train the model, not on the full dataset but on a subset of it. Training on a subset first means that if something goes wrong, we spot it quickly from the subset's results instead of after a long run on all the data, which saves valuable time. So it is better to train the model on a subset, make predictions, evaluate them, and only then carry on training with the full dataset.
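As a minimal sketch of the subset idea, assuming the image filenames and labels are held in two parallel Python lists (the helper make_subset and the NUM_IMAGES value are hypothetical, not the project's own code), a reproducible random subset can be drawn while keeping every filename paired with its label:

```python
import random

# Hypothetical number of examples used for the quick experiment
NUM_IMAGES = 1000

def make_subset(filenames, labels, num_images=NUM_IMAGES, seed=42):
    """Return a random, reproducible subset of (filenames, labels).

    The same indices are kept from both lists, so each filename stays
    paired with its label.
    """
    if len(filenames) != len(labels):
        raise ValueError("filenames and labels must have the same length")
    num_images = min(num_images, len(filenames))
    rng = random.Random(seed)  # fixed seed -> same subset on every run
    indices = sorted(rng.sample(range(len(filenames)), num_images))
    return [filenames[i] for i in indices], [labels[i] for i in indices]

# Usage with toy data: even-numbered files are cats, odd-numbered are dogs
files = [f"img_{i}.jpg" for i in range(10)]
labs = ["cat" if i % 2 == 0 else "dog" for i in range(10)]
sub_files, sub_labels = make_subset(files, labs, num_images=4)
print(sub_files, sub_labels)
```

Sampling indices once and applying them to both lists is what keeps the pairs aligned; shuffling the two lists separately would silently mislabel the images.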
Training our model
First, before starting training, we set the number of epochs to a value of our choice. This is done by initialising a variable: NUM_EPOCHS = 100.
As before, we use a function to train our model. This function does the following:
- Creates a model using create_our_model() (this function was designed in week 2).
- Sets up a TensorBoard callback using create_callback() (this function was designed in week 2).
- Calls the fit() function.
- Returns the model.
We write two training functions, one for the cat data and one for the dog data.
CAT
```python
# Let's build the model for our cat
# Build a function to train and return a trained model
def train_cat_model():
    """Trains a given model and returns the trained version."""
    # Create a model
    model = create_our_model()
    # Create a new TensorBoard session every time we train a model
    tensorboard = create_callback()
    # Fit the model to the data, passing it the callbacks we created
    # (early_stopping is assumed to be the early-stopping callback from week 2)
    model.fit(x=train_cat_data,
              epochs=NUM_EPOCHS,
              validation_data=val_cat_data,
              validation_freq=1,
              callbacks=[tensorboard, early_stopping])
    # Return the fitted model
    return model
```
DOG
```python
# Let's build the same for our dog
# Build a function to train and return a trained model
def train_dog_model():
    """Trains a given model and returns the trained version."""
    # Create a model
    model = create_our_model()
    # Create a new TensorBoard session every time we train a model
    tensorboard = create_callback()
    # Fit the model to the data, passing it the callbacks we created
    # (early_stopping is assumed to be the early-stopping callback from week 2)
    model.fit(x=train_dog_data,
              epochs=NUM_EPOCHS,
              validation_data=val_dog_data,
              validation_freq=1,
              callbacks=[tensorboard, early_stopping])
    # Return the fitted model
    return model
```
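Since train_cat_model() and train_dog_model() differ only in the datasets they pass to fit(), one parameterised function could serve both. This is a sketch, not the project's code: train_model and the FakeModel stand-in are hypothetical names, and the model and callback factories are passed in so the function can be exercised without TensorFlow.

```python
def train_model(create_model, create_callbacks, train_data, val_data, epochs):
    """Build a fresh model, fit it on train_data and return the fitted model.

    create_model and create_callbacks are zero-argument factories, so the same
    function works for the cat data, the dog data, or a stand-in model.
    """
    model = create_model()
    callbacks = create_callbacks()
    model.fit(x=train_data,
              epochs=epochs,
              validation_data=val_data,
              validation_freq=1,
              callbacks=callbacks)
    return model

# In the notebook the call might look like (not run here):
# model_cat = train_model(create_our_model,
#                         lambda: [create_callback(), early_stopping],
#                         train_cat_data, val_cat_data, NUM_EPOCHS)

class FakeModel:
    """Stand-in that records the keyword arguments passed to fit()."""
    def __init__(self):
        self.fit_kwargs = None

    def fit(self, **kwargs):
        self.fit_kwargs = kwargs
        return self

fake = train_model(FakeModel, lambda: ["tensorboard", "early_stopping"],
                   train_data="train_cat_data", val_data="val_cat_data", epochs=100)
print(fake.fit_kwargs["x"], fake.fit_kwargs["epochs"])
```

Creating the callbacks through a factory keeps the original behaviour of starting a new TensorBoard session for every training run.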
Now we have created the training functions for cat and dog. Let's fit our data to the respective models and have a look at the results.
CAT
```python
model_cat = train_cat_model()
```

```
Epoch 1/100
 1/25 [>.............................] - ETA: 0s - loss: 1.0319 - accuracy: 0.1875
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/summary_ops_v2.py:1277: stop (from tensorflow.python.eager.profiler) is deprecated and will be removed after 2020-07-01.
Instructions for updating:
use `tf.profiler.experimental.stop` instead.
25/25 [==============================] - 412s 16s/step - loss: 0.2104 - accuracy: 0.9013 - val_loss: 0.0239 - val_accuracy: 0.9900
Epoch 2/100
25/25 [==============================] - 5s 210ms/step - loss: 0.0044 - accuracy: 1.0000 - val_loss: 0.0148 - val_accuracy: 0.9950
Epoch 3/100
25/25 [==============================] - 6s 231ms/step - loss: 0.0019 - accuracy: 1.0000 - val_loss: 0.0135 - val_accuracy: 0.9950
Epoch 4/100
25/25 [==============================] - 6s 237ms/step - loss: 0.0014 - accuracy: 1.0000 - val_loss: 0.0128 - val_accuracy: 0.9950
Epoch 5/100
25/25 [==============================] - 5s 209ms/step - loss: 0.0011 - accuracy: 1.0000 - val_loss: 0.0123 - val_accuracy: 0.9950
```
DOG
```python
model_dog = train_dog_model()
```

```
Epoch 1/100
25/25 [==============================] - 526s 21s/step - loss: 0.1677 - accuracy: 0.9688 - val_loss: 0.0543 - val_accuracy: 0.9950
Epoch 2/100
25/25 [==============================] - 5s 214ms/step - loss: 0.0262 - accuracy: 0.9987 - val_loss: 0.0338 - val_accuracy: 0.9950
Epoch 3/100
25/25 [==============================] - 5s 214ms/step - loss: 0.0133 - accuracy: 1.0000 - val_loss: 0.0299 - val_accuracy: 0.9950
Epoch 4/100
25/25 [==============================] - 5s 211ms/step - loss: 0.0091 - accuracy: 1.0000 - val_loss: 0.0275 - val_accuracy: 0.9950
```
From the results above, we can see why the early stopping callback matters: once the validation accuracy stopped improving (it stayed at 0.9950) for 3 consecutive epochs, the callback was triggered and training stopped, since further epochs were not bringing any improvement. The cat model therefore stopped after 5 of the 100 epochs and the dog model after 4.
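The stopping points in the logs are consistent with a rule that monitors val_accuracy with a patience of 3. That is an assumption, since the callback was configured in week 2 and its settings are not shown here. The pure-Python sketch below replays such a rule on the logged values; it is modelled on the behaviour of tf.keras.callbacks.EarlyStopping in "max" mode, not copied from it.

```python
def early_stopping_epoch(history, patience=3, min_delta=0.0):
    """Return the 1-based epoch after which patience-based early stopping halts.

    history holds the monitored values, where higher is better (e.g. val_accuracy).
    A value counts as an improvement only if it beats the best so far by more
    than min_delta; training stops once `patience` epochs pass without one.
    Returns len(history) if the rule never triggers.
    """
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(history, start=1):
        if value - min_delta > best:
            best = value  # new best value: reset the counter
            wait = 0
        else:
            wait += 1  # one more epoch without improvement
            if wait >= patience:
                return epoch
    return len(history)

# val_accuracy values copied from the training logs above
cat_val_acc = [0.9900, 0.9950, 0.9950, 0.9950, 0.9950]
dog_val_acc = [0.9950, 0.9950, 0.9950, 0.9950]
print(early_stopping_epoch(cat_val_acc))  # 5
print(early_stopping_epoch(dog_val_acc))  # 4
```

Note that val_loss was still falling slightly for the cat model, so a callback monitoring val_loss would not have stopped at epoch 5.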
Now both models have been trained. Let's save and load them for the prediction and evaluation steps.
Save and Load Our Model
After training, we can save a model and load it back whenever we need it later. The functions save_model() and load_model() handle saving and loading.
```python
import datetime
import os

import tensorflow as tf
import tensorflow_hub as hub

# Create a function to save a model
def save_model(model, suffix=""):
    """Saves a given model in a models directory and appends a suffix (string)."""
    # Create a model directory pathname with the current time (%S = seconds)
    modeldir = os.path.join("drive/My Drive/CatVsDog/models",
                            datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
    model_path = modeldir + "-" + suffix + ".h5"  # Save format of model (HDF5)
    print(f"Saving model to: {model_path}...")
    model.save(model_path)
    return model_path

# Create a function to load the saved model
def load_model(model_path):
    """Loads a saved model from a specified path."""
    print(f"Loading saved model from {model_path}....")
    # Register KerasLayer so the TensorFlow Hub layer can be rebuilt on load
    model = tf.keras.models.load_model(model_path,
                                       custom_objects={"KerasLayer": hub.KerasLayer})
    return model
```
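The file-naming scheme inside save_model() can be checked on its own, without TensorFlow. This is a minimal sketch: make_model_path is a hypothetical helper, and the suffix "cat-subset" and the fixed date are illustrative values. The time format uses %S (seconds); lowercase %s is not a standard strftime code and is platform-dependent.

```python
import datetime
import os

def make_model_path(base_dir, suffix, now=None):
    """Build a path like <base_dir>/<YYYYmmdd-HHMMSS>-<suffix>.h5."""
    if now is None:
        now = datetime.datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")  # %S = seconds (00-59)
    return os.path.join(base_dir, stamp) + "-" + suffix + ".h5"

path = make_model_path("drive/My Drive/CatVsDog/models", "cat-subset",
                       now=datetime.datetime(2020, 8, 1, 14, 5, 9))
print(path)  # drive/My Drive/CatVsDog/models/20200801-140509-cat-subset.h5
```

Passing `now` explicitly makes the path deterministic for testing; save_model() simply uses the current time.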
Making and Evaluating Predictions Using Trained Model
We can make predictions with the reloaded models using .predict().
```python
# Start with cat (load_cat_model is assumed to be the cat model reloaded with load_model())
predictions_cat = load_cat_model.predict(val_cat_data, verbose=1)
predictions_cat[:10]
```

```
7/7 [==============================] - 76s 11s/step
array([[9.9873966e-01, 1.3146200e-05],
       [9.9852967e-01, 7.7154495e-05],
       [9.9644947e-01, 8.2693979e-05],
       [9.9523073e-01, 3.1922915e-04],
       [9.2441016e-01, 1.4963249e-02],
       [9.8091561e-01, 5.0850637e-04],
       [9.8195803e-01, 2.6257571e-06],
       [9.9035978e-01, 2.1479432e-04],
       [9.9588352e-01, 7.6251227e-04],
       [9.9996185e-01, 3.8020313e-05]], dtype=float32)
```

```python
# Now do the same with dog (load_dog_model is assumed to be the reloaded dog model)
predictions_dog = load_dog_model.predict(val_dog_data, verbose=1)
predictions_dog[:10]
```

```
7/7 [==============================] - 78s 11s/step
array([[1.2041514e-02, 9.6527416e-01],
       [2.4055562e-06, 9.9221051e-01],
       [4.7578651e-04, 9.9225640e-01],
       [6.0469288e-06, 9.9614292e-01],
       [2.9330796e-03, 9.8701084e-01],
       [5.5391331e-05, 9.9003279e-01],
       [3.0013523e-04, 9.9073374e-01],
       [3.8970767e-03, 9.7210705e-01],
       [5.4554670e-04, 9.7027451e-01],
       [9.8549217e-06, 9.9355388e-01]], dtype=float32)
```
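The predictions above are printed but not yet scored. A minimal sketch of one way to evaluate them, assuming the true labels are available as integer class indices. The helper accuracy_from_probabilities and the ground-truth indices below are hypothetical; only the probability rows come from the output above.

```python
import numpy as np

def accuracy_from_probabilities(probabilities, true_indices):
    """Fraction of rows whose highest-probability column equals the true class index."""
    predicted = np.argmax(probabilities, axis=1)  # predicted class index per row
    return float(np.mean(predicted == np.asarray(true_indices)))

# First three rows of predictions_cat printed above
probs = np.array([[9.9873966e-01, 1.3146200e-05],
                  [9.9852967e-01, 7.7154495e-05],
                  [9.9644947e-01, 8.2693979e-05]])

# Hypothetical ground truth: assume column 0 is the correct class for these rows
print(accuracy_from_probabilities(probs, [0, 0, 0]))  # 1.0
```

In the notebook, the true indices could come from np.argmax over the one-hot validation labels, provided the prediction order matches the (unshuffled) validation order.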
Now we can take an image from the dataset and show its results, such as the predicted label and the true label. To plot these results, we need a few things:
Prediction labels: the functions get_pred_label_cat() and get_pred_label_dog() turn an array of prediction probabilities into a label by picking the class with the highest probability.
```python
import numpy as np

# Cat
def get_pred_label_cat(prediction_probabilities):
    """Turns an array of prediction probabilities into a label."""
    return cat_label[np.argmax(prediction_probabilities)]

# Dog
def get_pred_label_dog(prediction_probabilities):
    """Turns an array of prediction probabilities into a label."""
    return dog_label[np.argmax(prediction_probabilities)]
```
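Both functions apply np.argmax and index into a label array, so they can be seen as one function with the label array as a parameter. In this sketch the array ["cat", "dog"] and the name get_pred_label are hypothetical stand-ins for cat_label and dog_label from week 1; the inputs are the first rows of the predictions printed above.

```python
import numpy as np

# Hypothetical label array; the real cat_label / dog_label come from week 1
class_names = np.array(["cat", "dog"])

def get_pred_label(prediction_probabilities, labels=class_names):
    """Return the label whose predicted probability is highest."""
    return labels[np.argmax(prediction_probabilities)]

print(get_pred_label([9.9873966e-01, 1.3146200e-05]))  # cat
print(get_pred_label([1.2041514e-02, 9.6527416e-01]))  # dog
```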
Next, we need to unbatch our validation data so that each image and its true label can be matched with its prediction. This is done by the unbatchify_cat() and unbatchify_dog() functions.
```python
import numpy as np

# Create a function to unbatch a batched dataset
# 1. Cat
def unbatchify_cat(data):
    """Takes a batched dataset of (image, label) Tensors and returns separate lists of images and labels."""
    images = []
    labels = []
    # Loop through unbatched data
    for image, label in data.unbatch().as_numpy_iterator():
        images.append(image)
        labels.append(cat_label[np.argmax(label)])
    return images, labels

# 2. Dog
def unbatchify_dog(data):
    """Takes a batched dataset of (image, label) Tensors and returns separate lists of images and labels."""
    images = []
    labels = []
    # Loop through unbatched data
    for image, label in data.unbatch().as_numpy_iterator():
        images.append(image)
        labels.append(dog_label[np.argmax(label)])
    return images, labels
```
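unbatchify_cat() and unbatchify_dog() are identical apart from the label array. A framework-free sketch of one shared version follows. It assumes each batch arrives as a pair of NumPy arrays (images, one-hot labels) rather than as a tf.data.Dataset, so a plain nested loop replaces data.unbatch(). The name unbatchify and the toy batches are hypothetical.

```python
import numpy as np

def unbatchify(batches, class_names):
    """Split batches of (images, one-hot labels) into two flat lists.

    batches: iterable of (images, labels) pairs, where images has shape
    (batch, ...) and labels has shape (batch, num_classes).
    """
    images, labels = [], []
    for image_batch, label_batch in batches:
        for image, label in zip(image_batch, label_batch):
            images.append(image)
            # One-hot vector -> index of the 1 -> class name
            labels.append(str(class_names[np.argmax(label)]))
    return images, labels

# Two toy batches of 2x2 "images" with one-hot labels
class_names = np.array(["cat", "dog"])
batches = [
    (np.zeros((3, 2, 2)), np.array([[1, 0], [0, 1], [1, 0]])),
    (np.ones((1, 2, 2)), np.array([[0, 1]])),
]
imgs, labs = unbatchify(batches, class_names)
print(len(imgs), labs)  # 4 ['cat', 'dog', 'cat', 'dog']
```

The last batch is smaller than the others, as it usually is in practice; iterating with zip handles that without special cases.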
Now we are ready to visualise everything, as we have:
- Prediction labels.
- Validation labels.
- Validation images.
The functions plot_pred_cat() and plot_pred_dog() can then be used. Each of them:
- Takes an array of prediction probabilities, an array of truth labels, an array of images and an integer index n.
- Converts the prediction probabilities to a predicted label.
- Plots the predicted label, its predicted probability, the truth label and the target image on a single plot.
```python
import matplotlib.pyplot as plt
import numpy as np

def plot_pred_cat(prediction_probabilities, labels, images, n=1):
    """View the truth, prediction and image for the cat sample at index n."""
    pred_prob, true_label, image = prediction_probabilities[n], labels[n], images[n]
    pred_label = get_pred_label_cat(pred_prob)
    # Plot image and remove ticks
    plt.imshow(image)
    plt.xticks([])
    plt.yticks([])
    # Change the colour of the title depending on whether the prediction is right or wrong
    if pred_label == true_label:
        color = "green"
    else:
        color = "red"
    # Title: predicted label, probability of prediction and truth label
    plt.title("{} {:2.0f}% {}".format(pred_label,
                                      np.max(pred_prob) * 100,
                                      true_label), color=color)

# val_cat_images and val_cat_labels are assumed to come from unbatchify_cat(val_cat_data)
plot_pred_cat(prediction_probabilities=predictions_cat,
              labels=val_cat_labels,
              images=val_cat_images)
```
```python
import matplotlib.pyplot as plt
import numpy as np

def plot_pred_dog(prediction_probabilities, labels, images, n=1):
    """View the truth, prediction and image for the dog sample at index n."""
    pred_prob, true_label, image = prediction_probabilities[n], labels[n], images[n]
    pred_label = get_pred_label_dog(pred_prob)
    # Plot image and remove ticks
    plt.imshow(image)
    plt.xticks([])
    plt.yticks([])
    # Change the colour of the title depending on whether the prediction is right or wrong
    if pred_label == true_label:
        color = "green"
    else:
        color = "red"
    # Title: predicted label, probability of prediction and truth label
    plt.title("{} {:2.0f}% {}".format(pred_label,
                                      np.max(pred_prob) * 100,
                                      true_label), color=color)

# val_dog_images and val_dog_labels are assumed to come from unbatchify_dog(val_dog_data)
plot_pred_dog(prediction_probabilities=predictions_dog,
              labels=val_dog_labels,
              images=val_dog_images)
```
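plot_pred_cat() and plot_pred_dog() share the same title and colour logic. A small sketch that pulls this logic into one pure function (prediction_title is a hypothetical name) makes it easy to check without drawing anything. It builds the title from the predicted label, its probability and the truth label, as the plotting comment describes.

```python
def prediction_title(pred_label, pred_prob, true_label):
    """Return (title, colour) for a prediction plot.

    The title shows the predicted label, its probability as a percentage and
    the truth label; the colour is green for a correct prediction, else red.
    """
    title = "{} {:2.0f}% {}".format(pred_label, max(pred_prob) * 100, true_label)
    color = "green" if pred_label == true_label else "red"
    return title, color

print(prediction_title("cat", [0.9987, 0.0001], "cat"))  # ('cat 100% cat', 'green')
print(prediction_title("cat", [0.62, 0.38], "dog"))      # ('cat 62% dog', 'red')
```

Inside the plotting functions, the result could be used as `title, color = prediction_title(pred_label, pred_prob, true_label)` followed by `plt.title(title, color=color)`.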
We have now trained our models on a subset of the data, saved and loaded them, made predictions and visualised the results on an image.
My Github Repo: Link
What’s Next
The next step is to train on our full dataset. Training on the subset produced correct predictions, so we are ready to move on to training the model on the full dataset. That is the plan for the 4th week.
I am excited to have come this far, and there are only a few more steps to the mountain’s summit. I am confident that I can finish the work in time and see my Cat Vs Dog project come to life very soon!
Translated from: https://medium.com/analytics-vidhya/what-about-a-6-week-machine-learning-project-4e328365f165