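The previous Keras basics article covered multi-class classification and only briefly mentioned the case where one sample belongs to several classes at once. In the CelebA dataset, for example, a single face photo can be oval-shaped, wearing glasses, and so on. In that case the label is multi-hot encoded, so more than one entry is 1, e.g. [0,1,1,0], where 1 means the attribute is present and 0 means it is absent. Such a problem can of course be decomposed into several binary or multi-class problems: train one model for face shape, one for glasses, one for hair color, and then predict with all of them together. The drawback is cost; having a single model perform multi-label classification is much simpler. Typical examples are the family of models built on the CelebA dataset, movie genre classification, and so on.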
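Let's look at the data first.
The dataset has two kinds of attributes: color and clothing type. Each garment's label is therefore a vector of length 6 in which two entries are 1 and the rest are 0. For example, if the multi-hot encoding order is (blue, red, black, dress, jeans, shirt), the label for black jeans is [0,0,1,0,1,0].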
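As a minimal sketch of the encoding described above (the helper name `multi_hot` is made up for illustration; the training script later in this post uses scikit-learn's MultiLabelBinarizer instead, which orders the classes alphabetically rather than in the order assumed here):

```python
# Class order assumed in the text: (blue, red, black, dress, jeans, shirt)
CLASSES = ["blue", "red", "black", "dress", "jeans", "shirt"]

def multi_hot(tags, classes=CLASSES):
    """Return a 0/1 vector with a 1 for every class name present in tags."""
    present = set(tags)
    return [1 if name in present else 0 for name in classes]

black_jeans = multi_hot(["black", "jeans"])
print(black_jeans)  # [0, 0, 1, 0, 1, 0]
```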
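Adrian Rosebrock actually uses a traditional CNN structure whose last layers are fully connected, so it has many parameters; the trained model comes to 100M. Lightweight networks today usually replace the fully connected layers with pooling to cut the parameter count. The SimpleNet class listed further down is my modified structure: it adds two more convolutional layers and a GlobalAveragePooling2D layer, the trained model is only 14M, and its accuracy is about the same as the original network, or even better.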
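To get a rough feel for where the savings come from, the sketch below counts only the Dense-layer weights of the classifier head in each design (BatchNormalization parameters are ignored). It assumes the 96x96x3 input and the 6 classes used by the training script later in this post; `dense_params` is a helper written for this illustration, not a Keras function.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# SmallerVGGNet: 96x96 input, pooled by 3, 2, 2 -> 8x8 feature map with 128 channels
side = 96 // 3 // 2 // 2
flat_features = side * side * 128
fc_head = dense_params(flat_features, 1024) + dense_params(1024, 6)

# SimpleNet: global average pooling leaves one value per channel (256 channels)
gap_head = dense_params(256, 6)

print(flat_features)  # 8192
print(fc_head)        # 8395782
print(gap_head)       # 1542
```

The Flatten + Dense(1024) head alone holds roughly 8.4 million weights, while the pooled head needs about 1.5 thousand. The extra 256-filter convolution block in SimpleNet adds some parameters back, so the overall saving is smaller than this gap suggests.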
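With the network structure settled, we need to choose a loss function suited to the problem, and this is where multi-class and multi-label classification differ. First look at the softmax function used for multi-class problems, where z is the final output of the network: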
$$p\left(c_{j} \mid x_{i}\right)=\frac{\exp \left(z_{j}\right)}{\sum_{k=1}^{n} \exp \left(z_{k}\right)}$$
import math

def softmax(z):
    z_exp = [math.exp(i) for i in z]
    sum_z_exp = sum(z_exp)
    return [i / sum_z_exp for i in z_exp]
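After the softmax layer we get a categorical (multinomial) probability distribution in which all node probabilities sum to 1, so the per-class outputs are not independent.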
In [4]: z = [-1.0, 5.0, -0.5, 4.7, -0.5]
In [5]: softmax(z)
Out[5]:
[0.0014152405960574873,
0.5709488061694115,
0.002333337273878307,
0.4229692786867745,
0.002333337273878307]
So our prediction is the second class, with a probability of only 0.57.
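A short check of the coupling just described, reusing the same logits: the softmax outputs always sum to 1, so the two strong logits (5.0 and 4.7) have to share the probability mass and only one of them clears a 0.5 threshold. The threshold and the boosted logit value are assumptions chosen for illustration.

```python
import math

def softmax(z):
    z_exp = [math.exp(i) for i in z]
    sum_z_exp = sum(z_exp)
    return [i / sum_z_exp for i in z_exp]

z = [-1.0, 5.0, -0.5, 4.7, -0.5]
probs = softmax(z)

total = sum(probs)                                   # 1 up to rounding error
above = [i for i, p in enumerate(probs) if p > 0.5]  # 0-based indices

# Raising one logit lowers every other probability
z_boosted = list(z)
z_boosted[3] = 6.0
probs_boosted = softmax(z_boosted)

print(above)                        # [1]
print(probs_boosted[1] < probs[1])  # True
```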
The sigmoid layer, by contrast, is:
$$\sigma\left(c_{j} \mid x_{i}\right)=\frac{1}{1+\exp \left(-z_{j}\right)}$$
import math

def sigmoid(z):
    return [1 / (1 + math.exp(-n)) for n in z]
In [7]: z = [-1.0, 5.0, -0.5, 4.7, -0.5]
In [8]: sigmoid(z)
Out[8]:
[0.2689414213699951,
0.9933071490757153,
0.3775406687981454,
0.990986701347152,
0.3775406687981454]
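With the sigmoid function and a decision threshold of 0.5, our prediction is classes 2 and 4. Here every output node contributes to the prediction independently. This is exactly what multi-label classification needs: we want the network outputs to be independent Bernoulli distributions, where each node contributes to the loss independently and each label appears independently. The final predictions depend mainly on the training samples: if red and blue never occur together in your training set, the model is very unlikely to predict them together.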
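The "independent contribution to the loss" can be made concrete with binary cross-entropy, the loss the training script later in this post passes to Keras. The sketch is a plain-Python version written for illustration (the function name `binary_cross_entropy` and the target vectors are assumptions, and Keras additionally clips probabilities for numerical stability):

```python
import math

def sigmoid(z):
    return [1 / (1 + math.exp(-n)) for n in z]

def binary_cross_entropy(y_true, probs):
    """Mean of the per-label Bernoulli negative log-likelihoods."""
    terms = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
             for y, p in zip(y_true, probs)]
    return sum(terms) / len(terms)

z = [-1.0, 5.0, -0.5, 4.7, -0.5]
probs = sigmoid(z)

# Hypothetical ground truth in which labels 2 and 4 (1-based) are present
y_true = [0, 1, 0, 1, 0]
loss_good = binary_cross_entropy(y_true, probs)

# Flipping one target changes only that label's term in the sum
y_other = [1, 1, 0, 1, 0]
loss_bad = binary_cross_entropy(y_other, probs)

print(loss_good < loss_bad)  # True
```

Because each label has its own term, a mistake on one label does not affect the penalty assigned to the others.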
# import the necessary packages
from keras.models import Sequential
from keras.layers import BatchNormalization
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Activation
from keras.layers import Flatten
from keras.layers import Dropout
from keras.layers import Dense
from keras import backend as K

'''
The optional argument, finalAct (with a default value of "softmax" ) will be utilized at the end of the network architecture.
Changing this value from softmax to sigmoid will enable us to perform multi-label classification with Keras.
'''
class SmallerVGGNet:
    @staticmethod
    def build(width, height, depth, classes, finalAct="softmax"):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1

        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1

        # CONV => RELU => POOL
        model.add(Conv2D(32, (3, 3), padding="same",
            input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(3, 3)))
        model.add(Dropout(0.25))

        # (CONV => RELU) * 2 => POOL
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # (CONV => RELU) * 2 => POOL
        model.add(Conv2D(128, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(128, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # first (and only) set of FC => RELU layers
        model.add(Flatten())
        model.add(Dense(1024))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))

        # final classifier: softmax or sigmoid, depending on finalAct
        model.add(Dense(classes))
        model.add(Activation(finalAct))

        # print the model architecture in the shell
        model.summary()

        # return the constructed network architecture
        return model
# SimpleNet additionally needs the global average pooling layer
from keras.layers import GlobalAveragePooling2D

class SimpleNet:
    @staticmethod
    def build(width, height, depth, classes, finalAct="softmax"):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1

        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1

        # CONV => RELU => POOL
        model.add(Conv2D(32, (3, 3), padding="same",
            input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(3, 3)))
        model.add(Dropout(0.25))

        # (CONV => RELU) * 2 => POOL
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # (CONV => RELU) * 2 => POOL
        model.add(Conv2D(128, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(128, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # (CONV => RELU) * 2 => POOL
        model.add(Conv2D(256, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(256, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # use global average pooling instead of fc layer
        model.add(GlobalAveragePooling2D())
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))

        # final classifier: softmax or sigmoid, depending on finalAct
        model.add(Dense(classes))
        model.add(Activation(finalAct))

        # print the model architecture in the shell
        model.summary()

        # return the constructed network architecture
        return model
# USAGE
# python train.py --dataset dataset --model fashion.model --labelbin mlb.pickle
import matplotlib
matplotlib.use("Agg")
# import the necessary packages
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from keras.preprocessing.image import img_to_array
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from imutils import paths
import numpy as np
import argparse
import random
import pickle
import cv2
import os
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", default='./dataset',
    help="path to input dataset (i.e., directory of images)")
ap.add_argument("-m", "--model", default='./models/fashion.h5',
    help="path to output model")
ap.add_argument("-l", "--labelbin", default='mlb.pickle',
    help="path to output label binarizer")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
    help="path to output accuracy/loss plot")
args = vars(ap.parse_args())
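When this cell is run inside a Jupyter notebook rather than as `python train.py`, `parse_args()` also sees the kernel's own `-f <connection file>` argument and exits with an "unrecognized arguments" error. One possible workaround, sketched here with a made-up connection-file path, is `parse_known_args()`, which returns the unknown arguments instead of aborting:

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", default="./dataset")
ap.add_argument("-m", "--model", default="./models/fashion.h5")
ap.add_argument("-l", "--labelbin", default="mlb.pickle")
ap.add_argument("-p", "--plot", type=str, default="plot.png")

# Simulated notebook argv; the path is a placeholder, not a real kernel file
notebook_argv = ["-f", "/tmp/kernel-example.json"]

# parse_known_args returns (namespace, leftover arguments)
known, unknown = ap.parse_known_args(notebook_argv)
args = vars(known)

print(args["dataset"])  # ./dataset
```

In a real notebook the call would simply be `ap.parse_known_args()`, so the defaults are used and the kernel's argument ends up in the leftover list.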
#initialize the number of epochs to train for ,initial learning rate,batch size and image dimensions
EPOCHS = 75
INIT_LR = 1e-3
BS = 32
IMAGE_DIMS = (96, 96, 3)
#grab the image paths and randomly shuffle them
print("[INFO] loading images...")
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)
#initialize the data and labels
data = []
labels = []
#loop over the input images
for imagePath in imagePaths:
    #load the image, pre-process it and store it in the data list
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (IMAGE_DIMS[1], IMAGE_DIMS[0]))
    image = img_to_array(image)
    data.append(image)
    #extract the set of class labels from the directory name
    #(e.g. "black_jeans" -> ["black", "jeans"]) and update the labels list
    l = label = imagePath.split(os.path.sep)[-2].split("_")
    labels.append(l)
#scale the raw pixel intensities to the range [0,1]
data = np.array(data,dtype="float")/255.0
labels = np.array(labels)
print("[INFO] data matrix: {} images ({:.2f}MB)".format(len(imagePaths), data.nbytes / (1024 * 1000.0)))
print(labels)
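The label extraction in the loop above relies on the directory layout: each image sits in a folder named `<color>_<type>`. A small sketch with made-up file paths shows the tag list extracted for each image (`tags_from_path` is a helper name invented for this example):

```python
import os

def tags_from_path(image_path):
    """Split the parent directory name, e.g. 'black_jeans', into its tags."""
    return image_path.split(os.path.sep)[-2].split("_")

# Hypothetical paths following the dataset/<color>_<type>/<file> layout
example_paths = [
    os.path.join("dataset", "black_jeans", "00000001.jpg"),
    os.path.join("dataset", "red_dress", "00000042.jpg"),
]
example_labels = [tags_from_path(p) for p in example_paths]
print(example_labels)  # [['black', 'jeans'], ['red', 'dress']]
```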
#binarize the labels
print("[INFO] class labels:")
mlb = MultiLabelBinarizer()
labels = mlb.fit_transform(labels)
print(labels)
#loop over each of the possible class labels and show them
for (i, label) in enumerate(mlb.classes_):
    print("{}. {}".format(i + 1, label))
#partition the data into training and testing splits using 80% of the data for training and remaining 20% for testing
(trainX,testX,trainY,testY) = train_test_split(data,labels,test_size=0.2,random_state=42)
#construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=25, width_shift_range=0.1,height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,horizontal_flip=True, fill_mode="nearest")
#initialize the model using a sigmoid activation as the final layer in the network so we can perform multi-label classification
print("[INFO] compiling model...")
# model = SmallerVGGNet.build(width=IMAGE_DIMS[1],height=IMAGE_DIMS[0],depth=IMAGE_DIMS[2],classes=len(mlb.classes_),finalAct="sigmoid")
model = SimpleNet.build(
width=IMAGE_DIMS[1], height=IMAGE_DIMS[0],
depth=IMAGE_DIMS[2], classes=len(mlb.classes_),
finalAct="sigmoid")
#initialize the optimizer (Adam with learning-rate decay)
opt = Adam(lr=INIT_LR,decay=INIT_LR/EPOCHS)
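For reference, a sketch of what the `decay` argument does, assuming the time-based rule used by the legacy Keras optimizers (learning rate divided by `1 + decay * iterations`, where an iteration is one batch update). The helper `lr_at` and the steps-per-epoch figure are illustrative assumptions, not values from this post:

```python
EPOCHS = 75
INIT_LR = 1e-3
DECAY = INIT_LR / EPOCHS

def lr_at(iteration, init_lr=INIT_LR, decay=DECAY):
    """Effective learning rate after a given number of batch updates."""
    return init_lr / (1.0 + decay * iteration)

# Assumed number of batches per epoch, for illustration only
steps_per_epoch = 50
schedule = [lr_at(epoch * steps_per_epoch) for epoch in (0, 25, 75)]

print(schedule[0])                              # 0.001
print(schedule[0] > schedule[1] > schedule[2])  # True
```

Under these assumptions the rate shrinks by only a few percent over the whole run, so the decay acts as a mild schedule rather than an aggressive one.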
# compile the model using binary cross-entropy rather than
# categorical cross-entropy -- this may seem counterintuitive for
# multi-label classification, but keep in mind that the goal here
# is to treat each output label as an independent Bernoulli
# distribution
model.compile(loss="binary_crossentropy", optimizer=opt,
    metrics=["accuracy"])
# train the network
print("[INFO] training network...")
H = model.fit_generator(
    aug.flow(trainX, trainY, batch_size=BS),
    validation_data=(testX, testY),
    steps_per_epoch=len(trainX) // BS,
    epochs=EPOCHS, verbose=1)
# save the model to disk
print("[INFO] serializing network...")
model.save(args["model"])
# save the multi-label binarizer to disk
print("[INFO] serializing label binarizer...")
with open(args["labelbin"], "wb") as f:
    f.write(pickle.dumps(mlb))
# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
N = EPOCHS
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="upper left")
plt.savefig(args["plot"])
# USAGE
# python classify.py --model fashion.model --labelbin mlb.pickle --image examples/example_01.jpg
# import the necessary packages
from keras.preprocessing.image import img_to_array
from keras.models import load_model
import numpy as np
import argparse
import imutils
import pickle
import cv2
import os
#construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", default='./models/fashion.h5',
    help="path to trained model")
ap.add_argument("-l", "--labelbin", default='./mlb.pickle',
help="path to label binarizer")
ap.add_argument("-i", "--image", required=True,
help="path to input image")
args = vars(ap.parse_args())
#load the image
image = cv2.imread(args["image"])
output = imutils.resize(image,width=400)
#pre-process the image for classification
image = cv2.resize(image,(96,96))
image = image.astype("float")/255.0
image = img_to_array(image)
image = np.expand_dims(image,axis=0)
#load the trained CNN and the multi-label binarizer
print("[INFO] loading network...")
model = load_model(args["model"])
mlb = pickle.loads(open(args["labelbin"],"rb").read())
#classify the input image then find the indexes of the two class labels with the "largest" prob.
print("[INFO] classifying image...")
proba = model.predict(image)[0]
idxs = np.argsort(proba)[::-1][:2]
#loop over the indexes of the high confidence class labels
for (i, j) in enumerate(idxs):
    # build the label and draw the label on the image
    label = "{}: {:.2f}%".format(mlb.classes_[j], proba[j] * 100)
    cv2.putText(output, label, (10, (i * 30) + 25),
        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
# show the probabilities for each of the individual labels
for (label, p) in zip(mlb.classes_, proba):
    print("{}: {:.2f}%".format(label, p * 100))
# show the output image
cv2.imshow("Output", output)
cv2.waitKey(0)
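The script always reports the two highest-scoring labels, which matches this dataset because every image has exactly one color and one garment type. For data where the number of labels varies, thresholding each probability is a common alternative. A sketch with made-up probabilities and the alphabetical class order that MultiLabelBinarizer would produce for these six tags (helper names are invented for this example):

```python
classes = ["black", "blue", "dress", "jeans", "red", "shirt"]
proba = [0.91, 0.03, 0.02, 0.64, 0.05, 0.41]  # hypothetical sigmoid outputs

def top_k(proba, k=2):
    """Indices of the k largest probabilities, highest first."""
    order = sorted(range(len(proba)), key=lambda i: proba[i], reverse=True)
    return order[:k]

def above_threshold(proba, threshold=0.5):
    """Indices of every label whose probability exceeds the threshold."""
    return [i for i, p in enumerate(proba) if p > threshold]

print([classes[i] for i in top_k(proba)])                 # ['black', 'jeans']
print([classes[i] for i in above_threshold(proba)])       # ['black', 'jeans']
print([classes[i] for i in above_threshold(proba, 0.4)])  # ['black', 'jeans', 'shirt']
```

Top-k always returns a fixed number of labels, whereas the threshold version can return none or several, so the choice depends on what the dataset guarantees.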
# USAGE
# python search_bing_api.py --query "blue jeans" --output dataset/blue_jeans
# python search_bing_api.py --query "blue dress" --output dataset/blue_dress
# python search_bing_api.py --query "red dress" --output dataset/red_dress
# python search_bing_api.py --query "red shirt" --output dataset/red_shirt
# python search_bing_api.py --query "blue shirt" --output dataset/blue_shirt
# python search_bing_api.py --query "black jeans" --output dataset/black_jeans
# import the necessary packages
from requests import exceptions
import argparse
import requests
import cv2
import os
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-q", "--query", required=True,
help="search query to search Bing Image API for")
ap.add_argument("-o", "--output", required=True,
help="path to output directory of images")
args = vars(ap.parse_args())
# set your Microsoft Cognitive Services API key along with (1) the
# maximum number of results for a given search and (2) the group size
# for results (maximum of 50 per request)
API_KEY = "INSERT_YOUR_API_KEY_HERE"
MAX_RESULTS = 400
GROUP_SIZE = 50
# set the endpoint API URL
URL = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"
# when attempting to download images from the web both the Python
# programming language and the requests library have a number of
# exceptions that can be thrown so let's build a list of them now
# so we can filter on them
EXCEPTIONS = set([IOError, FileNotFoundError,
exceptions.RequestException, exceptions.HTTPError,
exceptions.ConnectionError, exceptions.Timeout])
# store the search term in a convenience variable then set the
# headers and search parameters
term = args["query"]
headers = {"Ocp-Apim-Subscription-Key" : API_KEY}
params = {"q": term, "offset": 0, "count": GROUP_SIZE}
# make the search
print("[INFO] searching Bing API for '{}'".format(term))
search = requests.get(URL, headers=headers, params=params)
search.raise_for_status()
# grab the results from the search, including the total number of
# estimated results returned by the Bing API
results = search.json()
estNumResults = min(results["totalEstimatedMatches"], MAX_RESULTS)
print("[INFO] {} total results for '{}'".format(estNumResults,
term))
# initialize the total number of images downloaded thus far
total = 0
# loop over the estimated number of results in `GROUP_SIZE` groups
for offset in range(0, estNumResults, GROUP_SIZE):
    # update the search parameters using the current offset, then
    # make the request to fetch the results
    print("[INFO] making request for group {}-{} of {}...".format(
        offset, offset + GROUP_SIZE, estNumResults))
    params["offset"] = offset
    search = requests.get(URL, headers=headers, params=params)
    search.raise_for_status()
    results = search.json()
    print("[INFO] saving images for group {}-{} of {}...".format(
        offset, offset + GROUP_SIZE, estNumResults))

    # loop over the results
    for v in results["value"]:
        # try to download the image
        try:
            # make a request to download the image
            print("[INFO] fetching: {}".format(v["contentUrl"]))
            r = requests.get(v["contentUrl"], timeout=30)

            # build the path to the output image
            ext = v["contentUrl"][v["contentUrl"].rfind("."):]
            p = os.path.sep.join([args["output"], "{}{}".format(
                str(total).zfill(8), ext)])

            # write the image to disk
            f = open(p, "wb")
            f.write(r.content)
            f.close()

        # catch any errors that would prevent us from downloading the
        # image
        except Exception as e:
            # check to see if our exception is in our list of
            # exceptions to check for
            if type(e) in EXCEPTIONS:
                print("[INFO] skipping: {}".format(v["contentUrl"]))
                continue

        # try to load the image from disk
        image = cv2.imread(p)

        # if the image is `None` then we could not properly load the
        # image from disk (so it should be ignored)
        if image is None:
            print("[INFO] deleting: {}".format(p))
            os.remove(p)
            continue

        # update the counter
        total += 1
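The paging and file-naming logic of the download loop can be tried without calling the API. The sketch below reuses the constants from the script; the helper names, the result count of 1234 and the example URL are invented for illustration:

```python
MAX_RESULTS = 400
GROUP_SIZE = 50

def page_offsets(est_num_results, group_size=GROUP_SIZE):
    """Offsets sent to the API, one request per group of results."""
    return list(range(0, est_num_results, group_size))

def output_name(total, content_url):
    """Zero-padded counter plus whatever follows the last dot in the URL."""
    ext = content_url[content_url.rfind("."):]
    return "{}{}".format(str(total).zfill(8), ext)

offsets = page_offsets(min(1234, MAX_RESULTS))
print(offsets)  # [0, 50, 100, 150, 200, 250, 300, 350]
print(output_name(7, "http://example.com/img/jeans.jpg"))  # 00000007.jpg
```

Because the extension is taken from everything after the last dot, a URL with a query string after the extension would carry that suffix into the file name.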