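This article is mainly a record of my own study and may contain some errors; I ask for the reader's understanding.
As artificial intelligence continues to advance, machine learning is becoming increasingly important, and many people have started learning it. This article walks through training a TensorFlow classification model on five classes of medical images.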
import matplotlib.pyplot as plt
import numpy as np
import PIL
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
This tutorial uses a dataset of about 5,000 medical images. The dataset contains five subdirectories, one per class:
import pathlib
data_dir = r'D:\virtual_desk\others\5类医学图像'
data_dir = pathlib.Path(data_dir)
image_count = len(list(data_dir.glob('*/*.jpeg')))
print(image_count)
# List the abdominal CT images, then open the first one.
腹部CT = list(data_dir.glob('腹部CT/*'))
PIL.Image.open(str(腹部CT[0]))
脑部CT = list(data_dir.glob('脑部CT/*'))
PIL.Image.open(str(脑部CT[0]))
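As a quick sanity check on the one-subdirectory-per-class layout described above, the following sketch counts the images in each class folder. The helper name `count_images_per_class` and its `suffix` parameter are assumptions introduced for illustration; they are not part of the original code.

```python
import pathlib


def count_images_per_class(data_dir, suffix=".jpeg"):
    """Return {class_name: image_count} for each subdirectory of data_dir."""
    data_dir = pathlib.Path(data_dir)
    counts = {}
    for class_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        # Count only regular files with the expected suffix (case-insensitive).
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() == suffix
        )
    return counts
```

Calling `count_images_per_class(data_dir)` should return five entries, and a strongly uneven count between classes would be worth knowing about before training.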
batch_size = 32
img_height = 90
img_width = 90
train_ds = tf.keras.utils.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="training",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
val_ds = tf.keras.utils.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="validation",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
class_names = train_ds.class_names
print(class_names)
Here are the first nine images from the training dataset:
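import matplotlib.pyplot as plt
# Use the SimHei font so the Chinese class names render in the plot titles.
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.figure(figsize=(10, 10))
# Take one batch and show its first nine images with their class labels.
for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")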
# Inspect the shapes of the first batch only.
for image_batch, labels_batch in train_ds:
    print(image_batch.shape)
    print(labels_batch.shape)
    break
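Result:
image_batch is a tensor of shape (32, 90, 90, 3). This is a batch of 32 images, each of shape 90x90x3 (the last dimension is the RGB color channels). labels_batch is a tensor of shape (32,), holding the labels that correspond to those 32 images.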
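To make the batch shapes concrete, here is a small pure-Python sketch that lists the shapes one pass over a dataset would produce. The function `batch_shapes` is hypothetical, and it assumes the final partial batch is kept rather than dropped.

```python
def batch_shapes(num_images, batch_size=32, height=90, width=90, channels=3):
    """List the (image_batch, labels_batch) shapes for one pass over a dataset."""
    shapes = []
    remaining = num_images
    while remaining > 0:
        n = min(batch_size, remaining)
        # Images are (batch, height, width, channels); labels are (batch,).
        shapes.append(((n, height, width, channels), (n,)))
        remaining -= n
    return shapes


shapes = batch_shapes(70)  # 70 is an arbitrary example size
print(len(shapes))   # 3
print(shapes[0])     # ((32, 90, 90, 3), (32,))
print(shapes[-1])    # ((6, 90, 90, 3), (6,))
```

Under that assumption, only the last batch can be smaller than `batch_size`, so code that hard-codes 32 as the first dimension may fail on it.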
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
normalization_layer = layers.Rescaling(1./255)
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixel values are now in `[0,1]`.
print(np.min(first_image), np.max(first_image))
num_classes = len(class_names)
model = Sequential([
layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
layers.Conv2D(16, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(32, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(64, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dense(num_classes)
])
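The Sequential model consists of three convolution blocks (tf.keras.layers.Conv2D), each followed by a max pooling layer (tf.keras.layers.MaxPooling2D). On top of them sits a fully connected layer (tf.keras.layers.Dense) with 128 units, activated by the ReLU activation function ('relu'). The final Dense layer outputs one logit per class.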
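The arithmetic behind this architecture can be worked through by hand. The sketch below assumes the layer defaults used in the model code (3x3 kernels, 'same' padding, 2x2 max pooling with stride 2) and five output classes; the helper `conv_block` is a hypothetical name used only here.

```python
def conv_block(height, width, in_channels, filters, kernel=3, pool=2):
    """Output shape and parameter count for Conv2D(padding='same') + MaxPooling2D()."""
    params = kernel * kernel * in_channels * filters + filters  # weights + biases
    # 'same' padding keeps height and width; 2x2 pooling halves them, rounding down.
    return height // pool, width // pool, filters, params


h, w, c = 90, 90, 3
total = 0
for filters in (16, 32, 64):
    h, w, c, params = conv_block(h, w, c, filters)
    total += params
    print((h, w, c), params)
# (45, 45, 16) 448
# (22, 22, 32) 4640
# (11, 11, 64) 18496

flat = h * w * c                    # size of the Flatten output
dense_params = flat * 128 + 128     # Dense(128)
output_params = 128 * 5 + 5         # Dense(num_classes), with five classes
total += dense_params + output_params
print(flat, dense_params, output_params, total)
# 7744 991360 645 1015589
```

If those assumptions hold, these figures should agree with what `model.summary()` prints. Almost all of the parameters sit in the first Dense layer, not in the convolutions.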
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
model.summary()
epochs=10
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
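Result:
The training and validation results show that the model reaches an accuracy of up to 99.9% on both the training set and the validation set. There is no sign of overfitting, so techniques such as data augmentation or dropout are not needed here.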
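One simple way to check for overfitting is to compare training accuracy with validation accuracy epoch by epoch. The following is only a heuristic sketch: the function names and the 0.05 tolerance are assumptions, and the numbers in `example` are made up for illustration, not results from this article.

```python
def generalization_gap(history_dict):
    """Per-epoch training accuracy minus validation accuracy."""
    return [
        train - val
        for train, val in zip(history_dict["accuracy"], history_dict["val_accuracy"])
    ]


def looks_overfit(history_dict, tolerance=0.05):
    """Heuristic: flag overfitting if the final-epoch gap exceeds tolerance."""
    return generalization_gap(history_dict)[-1] > tolerance


# Made-up values with the same keys as history.history.
example = {"accuracy": [0.90, 0.97, 0.99], "val_accuracy": [0.91, 0.96, 0.99]}
print(looks_overfit(example))  # False
```

With a trained model, the same check would be `looks_overfit(history.history)`. A large positive gap would suggest revisiting data augmentation or dropout.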
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
腹部CT_path = r"D:\virtual_desk\others\5类医学图像\腹部CT\000000.jpeg"
img = tf.keras.utils.load_img(
腹部CT_path, target_size=(img_height, img_width)
)
img_array = tf.keras.utils.img_to_array(img)
img_array = tf.expand_dims(img_array, 0) # Create a batch
predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])
print(
"This image most likely belongs to {} with a {:.2f} percent confidence."
.format(class_names[np.argmax(score)], 100 * np.max(score))
)
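Because the model was compiled with from_logits=True, its raw outputs are logits, which is why the prediction code above passes them through softmax before reporting a confidence. A minimal pure-Python sketch of that conversion follows; the logit values are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for five classes, for illustration only.
logits = [2.0, 0.5, -1.0, 0.0, 0.1]
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, round(100 * probs[best], 2))  # predicted class index and confidence in percent
```

Softmax preserves the ordering of the logits, so the predicted class is the same with or without it; only the confidence value depends on the conversion.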
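Working through this image classification task covers the following:
1. Loading a dataset efficiently from disk.
2. Inspecting and understanding the data.
3. Building an input pipeline.
4. Building the model.
5. Training the model.
6. Testing the model.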