Image Classification

Lecture 2: Image Classification pipeline

Image Classification: a core task in Computer Vision

The Problem: Semantic Gap

The idea of a cat, or the label "cat", is a semantic label that we assign to the image, and there is a huge gap between that semantic idea and the pixel values that the computer actually sees.

The challenges:

-Viewpoint variation
-Illumination
-Deformation
-Occlusion
-Background clutter
-Intraclass variation

Method

1.An image classifier

Find edges -> Find corners

2.Data-Driven Approach

1.Collect a dataset of images and labels
2.Use Machine Learning to train an image classifier
3.Evaluate the classifier on a withheld set of test images


image.png

2.1 First classifier: Nearest Neighbor Classifier

image.png

2.1.1 Distance Metric to compare images

Manhattan distance

image.png

2.2 K-Nearest Neighbors

Instead of copying the label from the nearest neighbor, take a majority vote from the K closest points.

2.2.1 Distance Metric to compare images

image.png

2.2.2 Hyperparameters

The best value of k to use and the best distance to use are hyperparameters:
choices about the algorithm that we set rather than learn.

cross validation

Linear Classification

image.png

image.png
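As a rough sketch of the linear score function f(x, W) = Wx + b that this part of the lecture introduces: the slides themselves are not reproduced in these notes, so the formula is quoted from memory of the course and every number below is an assumed toy value, not data from the lecture.

```python
import numpy as np

# Toy setup (assumed values): a 4-pixel "image" and 3 classes.
W = np.array([[0.2, -0.5, 0.1, 2.0],
              [1.5, 1.3, 2.1, 0.0],
              [0.0, 0.25, 0.2, -0.3]])   # one row of weights per class
b = np.array([1.1, 3.2, -1.2])           # one bias per class
x = np.array([56.0, 231.0, 24.0, 2.0])   # flattened pixel values

scores = W.dot(x) + b                    # one score per class
predicted_class = int(np.argmax(scores)) # the highest score wins
print(scores, predicted_class)
```

Each row of W acts as a template for one class, and the predicted label is simply the index of the largest score.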

Lecture 3: Loss functions and Optimization

Loss Function

A loss function quantifies how bad any given value of W is.


image.png

1 SVM

image.png
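A minimal sketch of the multiclass SVM (hinge) loss for a single example, assuming the usual form L_i = sum over j != y_i of max(0, s_j - s_{y_i} + 1). The function name `svm_loss_single`, the margin of 1, and the scores are illustrative assumptions rather than something taken from the slide.

```python
import numpy as np

def svm_loss_single(scores, correct_class, margin=1.0):
    """Multiclass SVM (hinge) loss for one example."""
    # Penalize every class whose score comes within `margin` of the correct one.
    margins = np.maximum(0.0, scores - scores[correct_class] + margin)
    margins[correct_class] = 0.0  # the correct class itself does not contribute
    return float(np.sum(margins))

scores = np.array([3.2, 5.1, -1.7])  # assumed class scores for one image
loss = svm_loss_single(scores, correct_class=0)
print(loss)
```

With these scores only the second class violates the margin, so the loss is max(0, 5.1 - 3.2 + 1) = 2.9.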

1.1 Regularization

image.png
image.png

2 Softmax Classifier

image.png
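A minimal sketch of the softmax (cross-entropy) loss for a single example, assuming the usual definition L_i = -log(e^{s_{y_i}} / sum_j e^{s_j}) with the natural logarithm. The helper name `softmax_loss_single` and the scores are illustrative assumptions.

```python
import numpy as np

def softmax_loss_single(scores, correct_class):
    """Softmax probabilities and cross-entropy loss for one example."""
    shifted = scores - np.max(scores)  # subtract the max for numeric stability
    exp_scores = np.exp(shifted)
    probs = exp_scores / np.sum(exp_scores)
    loss = -np.log(probs[correct_class])
    return float(loss), probs

loss, probs = softmax_loss_single(np.array([3.2, 5.1, -1.7]), correct_class=0)
print(loss, probs)
```

Unlike the SVM loss, this loss is never exactly zero: it keeps pushing the probability of the correct class toward 1.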

Optimization

slope

gradient
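To make "slope" and "gradient" concrete, here is a small sketch of a numerical gradient computed with centered differences. The function names and the toy loss sum(w^2) are assumptions chosen so the result can be checked against the analytic gradient 2w.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at x using centered differences."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        old_value = x.flat[i]
        x.flat[i] = old_value + h
        f_plus = f(x)              # f evaluated slightly ahead along dimension i
        x.flat[i] = old_value - h
        f_minus = f(x)             # f evaluated slightly behind along dimension i
        x.flat[i] = old_value      # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2 * h)  # slope along dimension i
    return grad

def toy_loss(w):
    # Hypothetical loss used only for illustration.
    return float(np.sum(w ** 2))

w = np.array([1.0, -2.0, 3.0])
grad = numerical_gradient(toy_loss, w)
print(grad)
```

The gradient is the vector of slopes along each dimension; stepping in the opposite direction decreases the loss.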

Image Features


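Everything above was jotted down casually during the lectures to help me stay focused. From here on I work through the material more carefully alongside the course notes.

The notes are at https://zhuanlan.zhihu.com/p/20900216?refer=intelligentunit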

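Image Classification

What is the image classification problem?

Image classification means: given a fixed set of labels and an input image, assign the image its corresponding label.
It is one of the core problems in computer vision, and many other problems, such as object detection and segmentation, can be reduced to image classification.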

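An example
Read the image and produce the probability that it belongs to each label in the set {cat, dog, hat, mug}.
To a computer, an image is one large 3-dimensional array of numbers.
The task of image classification is to turn these hundreds of thousands or millions of numbers into a single label, such as "cat".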

image.png
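To make the "3-dimensional array of numbers" concrete, here is a small sketch that builds a synthetic image with NumPy. The 32 x 32 x 3 size is an assumption chosen to match the CIFAR-10 images used later in these notes, and the pixel values are random, not a real photo.

```python
import numpy as np

# A synthetic RGB image: height x width x 3 color channels,
# each entry an integer brightness in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

print(image.shape)   # (32, 32, 3)
print(image.size)    # 3072 numbers in total
pixel = image[0, 0]  # the top-left pixel: three values (R, G, B)
print(pixel)
```

A classifier has to map all 3072 of these numbers to one label.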

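Challenges
For a person, image classification is trivial: when you see a picture, the matching concept comes to mind naturally. For a computer, however, the picture is a 3-dimensional array containing up to millions of numbers. Some common difficulties are listed below.
1.Viewpoint variation
2.Scale variation
3.Deformation
4.Occlusion
5.Illumination conditions
6.Background clutter
7.Intra-class variation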

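How do we classify images? The data-driven approach

Unlike a sorting algorithm, it is very hard to write down directly an algorithm that recognizes each kind of object from its features. Instead we take an approach similar to teaching a child to recognize things: we provide many example images for each label, and a learning algorithm studies this dataset to learn the visual appearance of each class. This is the data-driven approach.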

image.png

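The image classification pipeline
1.Input: a training set of N images, each labeled with one of K classes.
2.Learning: use the training set to learn what each class looks like. This step is called training a classifier, or learning a model.
3.Evaluation: have the trained classifier predict labels for images it has never seen, and compare the predictions with the true labels.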

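Two classifiers are introduced below.

1. Nearest Neighbor Classifier

The Nearest Neighbor classifier compares the test image with every image in the training set and predicts the label of the training image that differs from it the least.

As the figure on the right shows, however, the accuracy is not very high.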


image.png

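How do we compare images?

1.1 L1 distance

Subtract the two images' pixel matrices element-wise, take the absolute value of each difference, and sum them all up: $d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$.
The figures below show the formula and a more intuitive walk-through.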


image.png

image.png
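The L1 distance can be sketched on two tiny 2 x 2 "images". The pixel values and the helper name `l1_distance` are made up for illustration. The cast to a signed integer type matters here, because subtracting `uint8` arrays directly would wrap around instead of going negative.

```python
import numpy as np

def l1_distance(img_a, img_b):
    """Sum of absolute pixel-wise differences between two images."""
    # Cast to a signed type first so that e.g. 32 - 40 gives -8, not 248.
    a = img_a.astype(np.int64)
    b = img_b.astype(np.int64)
    return int(np.sum(np.abs(a - b)))

test_image = np.array([[56, 32], [90, 23]], dtype=np.uint8)
train_image = np.array([[10, 40], [8, 10]], dtype=np.uint8)

# |56-10| + |32-40| + |90-8| + |23-10| = 46 + 8 + 82 + 13
distance = l1_distance(test_image, train_image)
print(distance)
```

Identical images have distance 0, and the distance grows as the images differ more.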

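Code implementation
1.Load the CIFAR-10 data into memory
and split it into 4 arrays: training data/labels and test data/labels
Xtr: training set, Ytr: training labels, Xte: test set, Yte: test labels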

Xtr, Ytr, Xte, Yte = load_CIFAR10('data/cifar10/') # a magic function we provide
# flatten out all images to be one-dimensional
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3) # Xtr_rows becomes 50000 x 3072
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3) # Xte_rows becomes 10000 x 3072

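2.Train and evaluate the classifier
accuracy: the fraction of predictions that are correct
train(X,y): trains on the training data and labels
predict(X): predicts labels for new input data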

nn = NearestNeighbor() # create a Nearest Neighbor classifier class
nn.train(Xtr_rows, Ytr) # train the classifier on the training images and labels
Yte_predict = nn.predict(Xte_rows) # predict labels on the test images
# and now print the classification accuracy, which is the fraction
# of examples that are correctly predicted (i.e. label matches)
print('accuracy: %f' % (np.mean(Yte_predict == Yte)))

3.Implementation of the Nearest Neighbor classifier

import numpy as np

class NearestNeighbor(object):
  def __init__(self):
    pass

  def train(self, X, y):
    """ X is N x D where each row is an example. y is 1-dimensional of size N """
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X):
    """ X is N x D where each row is an example we wish to predict label for """
    num_test = X.shape[0]
    # lets make sure that the output type matches the input type
    Ypred = np.zeros(num_test, dtype = self.ytr.dtype)

    # loop over all test rows
    for i in range(num_test):
      # find the nearest training image to the i'th test image
      # using the L1 distance (sum of absolute value differences)
      distances = np.sum(np.abs(self.Xtr - X[i,:]), axis = 1)
      min_index = np.argmin(distances) # get the index with smallest distance
      Ypred[i] = self.ytr[min_index] # predict the label of the nearest example

    return Ypred

1.2 L2 distance

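The formula is:

$$d_2(I_1, I_2) = \sqrt{\sum_p \left(I_1^p - I_2^p\right)^2}$$

As with L1, we compute the pixel-wise differences, but here each difference is squared, the squares are summed, and the square root of the sum is taken.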

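1.3 Comparing L1 and L2

Because L2 squares each difference, large differences are amplified. When comparing two vectors, L2 is therefore less tolerant of large discrepancies than L1: it prefers many medium differences to one big one.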
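A small numeric sketch of this effect, using two made-up difference vectors that have the same L1 distance but different L2 distances. The helper names `l1` and `l2` are illustrative only.

```python
import numpy as np

def l1(diff):
    return float(np.sum(np.abs(diff)))

def l2(diff):
    return float(np.sqrt(np.sum(diff ** 2)))

# Two hypothetical pixel-difference vectors with the same total difference.
spread_out = np.array([2.0, 2.0, 2.0, 2.0])    # four medium differences
concentrated = np.array([8.0, 0.0, 0.0, 0.0])  # one large difference

print(l1(spread_out), l1(concentrated))  # L1 treats both the same
print(l2(spread_out), l2(concentrated))  # L2 penalizes the large one more
```

Both vectors are 8 apart under L1, while under L2 the concentrated difference (8) counts twice as much as the spread-out one (4).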

2 K-Nearest Neighbor Classifier

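K means that instead of choosing only the single closest image in the training set, we choose the k closest images and let them vote.

In the figure below:
The differently colored regions show the decision boundaries.
White marks regions where the classification is ambiguous.
The NN classifier is visibly affected by many outlier data points.
5-NN has smoother boundaries and generalizes better.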

image.png
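A minimal sketch of how the Nearest Neighbor class from earlier could be extended to take a majority vote over k neighbors. The class name `KNearestNeighbor`, the tie-breaking rule (the smallest label wins), and the toy data are assumptions, and labels are assumed to be non-negative integers.

```python
import numpy as np

class KNearestNeighbor(object):
    def train(self, X, y):
        """ X is N x D, y is 1-dimensional of size N with integer labels """
        self.Xtr = X
        self.ytr = y

    def predict(self, X, k=1):
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from the i'th test row to every training row
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            nearest = np.argsort(distances)[:k]     # indices of the k closest rows
            votes = np.bincount(self.ytr[nearest])  # count the labels among them
            Ypred[i] = np.argmax(votes)             # majority vote
        return Ypred

# Toy data: three class-0 points and one class-1 outlier sitting among them.
Xtr = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]])
ytr = np.array([0, 0, 0, 1, 1])
Xte = np.array([[0.9, 0.9]])

knn = KNearestNeighbor()
knn.train(Xtr, ytr)
print(knn.predict(Xte, k=1))  # follows the single nearest point (the outlier)
print(knn.predict(Xte, k=3))  # the majority of the 3 nearest points decides
```

With k=1 the outlier alone decides the prediction, while with k=3 its two class-0 neighbors outvote it, which is the smoothing effect described above.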

2.1 Validation sets for Hyperparameter tuning

Hyperparameters: the K in the K-Nearest Neighbor classifier and the choice of distance metric are both hyperparameters.

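2.2 Methods for tuning hyperparameters

Do not tune on the test set: the hyperparameters may overfit to it, and the measured test accuracy would then overestimate real-world performance.

2.2.1 Validation set

Set aside part of the training set and tune on it.

The code is as follows: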

# assume we have Xtr_rows, Ytr, Xte_rows, Yte as before
# recall Xtr_rows is 50,000 x 3072 matrix
Xval_rows = Xtr_rows[:1000, :] # take first 1000 for validation
Yval = Ytr[:1000]
Xtr_rows = Xtr_rows[1000:, :] # keep last 49,000 for train
Ytr = Ytr[1000:]

# find hyperparameters that work best on the validation set
validation_accuracies = []
for k in [1, 3, 5, 10, 20, 50, 100]:

  # use a particular value of k and evaluate on validation data
  nn = NearestNeighbor()
  nn.train(Xtr_rows, Ytr)
  # here we assume a modified NearestNeighbor class that can take a k as input
  Yval_predict = nn.predict(Xval_rows, k = k)
  acc = np.mean(Yval_predict == Yval)
  print('accuracy: %f' % (acc,))

  # keep track of what works on the validation set
  validation_accuracies.append((k, acc))

2.2.2 Cross-validation

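For example, split the training set into five equal folds. Use one fold for validation and the other four for training, then rotate so that each of the remaining four folds serves as the validation fold in turn, and report the average of the five validation results.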
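The 5-fold procedure can be sketched as follows. This is a minimal illustration on synthetic 2-D data rather than CIFAR-10, and the helper names `knn_predict` and `cross_validate` are assumptions, not part of the course code.

```python
import numpy as np

def knn_predict(Xtr, ytr, X, k):
    """k-NN prediction with L1 distance and majority vote (integer labels)."""
    preds = np.zeros(X.shape[0], dtype=ytr.dtype)
    for i in range(X.shape[0]):
        distances = np.sum(np.abs(Xtr - X[i]), axis=1)
        nearest = np.argsort(distances)[:k]
        preds[i] = np.argmax(np.bincount(ytr[nearest]))
    return preds

def cross_validate(X, y, k, num_folds=5):
    """Average validation accuracy of k-NN over num_folds folds."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accuracies = []
    for i in range(num_folds):
        # fold i is held out for validation, the rest is used for training
        X_val, y_val = X_folds[i], y_folds[i]
        X_train = np.concatenate(X_folds[:i] + X_folds[i + 1:])
        y_train = np.concatenate(y_folds[:i] + y_folds[i + 1:])
        preds = knn_predict(X_train, y_train, X_val, k)
        accuracies.append(np.mean(preds == y_val))
    return float(np.mean(accuracies))

# Synthetic data: two well-separated clusters, shuffled.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, (50, 2)),
                    rng.normal(10.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
order = rng.permutation(100)
X, y = X[order], y[order]

results = {k: cross_validate(X, y, k) for k in [1, 3, 5]}
for k, acc in results.items():
    print('k = %d, mean validation accuracy: %f' % (k, acc))
```

The value of k with the best average accuracy would then be chosen, and the test set is touched only once at the very end.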

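2.2.3 In practice

In practice, people usually just split the training set once into a training part and a validation part, using 50%-90% of the data for training and the rest for validation.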
