Lecture 2: Image Classification pipeline
Image Classification: a core task in Computer Vision
The Problem: Semantic Gap
The idea of a cat, or the label "cat", is a semantic label that we assign to the image, and there is a huge gap between that semantic idea and the pixel values the computer actually sees.
The challenges:
-Viewpoint variation
-Illumination
-Deformation
-Occlusion
-Background clutter
-Intra-class variation
Method
1. An image classifier
Find edges, find corners
2. Data-driven approach
1. Collect a dataset of images and labels
2. Use Machine Learning to train an image classifier
3. Evaluate the classifier on a withheld set of test images
2.1 First classifier: Nearest Neighbor Classifier
2.1.1 Distance Metric to compare images
Manhattan distance
2.2 K-Nearest Neighbors
Instead of copying the label from the single nearest neighbor, take a majority vote among the K closest points.
2.2.1 Distance Metric to compare images
2.2.2 Hyperparameters
-The best value of k to use
-The best distance metric to use
Hyperparameters are choices about the algorithm that we set rather than learn.
Cross-validation
Linear Classification
Lecture 3: Loss Functions and Optimization
Loss Function
A loss function quantifies how good any given value of W is.
1 SVM
1.1 Regularization
2 Softmax Classifier
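For reference, the standard definitions of these two losses from the course (here s_j is the score of class j for example i, y_i is its correct label, and \Delta is a fixed margin, typically 1):

L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)    (multiclass SVM loss)

L_i = -\log \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}    (softmax cross-entropy loss)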
Optimization
slope
gradient
Image Features
The notes above were jotted down casually during the lectures just to help me stay focused; from here on I am reading the course notes and working through them more carefully.
The notes are at https://zhuanlan.zhihu.com/p/20900216?refer=intelligentunit
Image Classification
What is the image classification problem?
The image classification problem: given a fixed set of labels, take an input image and assign it the corresponding label from that set.
Image classification is one of the core problems in computer vision; many other problems, such as object detection and segmentation, can be reduced to image classification.
An example:
read an image and produce the probability that it belongs to each of the labels in the set {cat, dog, hat, mug}.
To the computer, the image is one huge 3-dimensional array of numbers.
The task of image classification is to turn those millions of numbers into a single simple label, such as "cat".
Challenges
Image classification is effortless for a person: when you see a picture, the corresponding concept arises naturally in your head. But to a computer, the picture is a 3-dimensional array of millions of numbers. Some common difficulties of image classification are listed below.
1. Viewpoint variation
2. Scale variation
3. Deformation
4. Occlusion
5. Illumination conditions
6. Background clutter
7. Intra-class variation
How do we classify images? The data-driven approach
Unlike writing a sorting algorithm, it is very hard to directly write down an algorithm that recognizes each class of object from its features. Instead, we take an approach much like teaching a child to recognize things: we provide many example images for every label, and the computer uses a learning algorithm to study these datasets and learn the visual appearance of each class. This is the data-driven approach.
The image classification pipeline
1. Input: a training set of N images, each labeled with one of K different class labels.
2. Learning: use the training set to learn what every one of the classes looks like. This step is called training a classifier, or learning a model.
3. Evaluation: have the trained classifier predict labels for images it has never seen before.
Two classifiers are introduced below.
1. Nearest Neighbor Classifier
The Nearest Neighbor classifier compares the test image with every image in the training set and predicts the label of the training image it differs from the least.
As the example results show, though, the accuracy is not very high.
How do we compare two images?
1.1 L1 distance
Subtract the two images' pixel arrays elementwise, then sum the absolute values of all entries of the resulting difference array.
The formula is as follows.
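Writing I_1 and I_2 for the two images, with the sum running over all pixels p:

d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|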
Code implementation
1. Load the CIFAR-10 data into memory,
split into 4 arrays: training data/labels and test data/labels.
Xtr: training set, Ytr: training labels
Xtr, Ytr, Xte, Yte = load_CIFAR10('data/cifar10/') # a magic function we provide
# flatten out all images to be one-dimensional
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3) # Xtr_rows becomes 50000 x 3072
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3) # Xte_rows becomes 10000 x 3072
2. Train and evaluate the classifier
accuracy: the fraction of predictions that are correct
train(X, y): train the classifier with the training data and labels
predict(X): predict the labels of new input data
nn = NearestNeighbor() # create a Nearest Neighbor classifier class
nn.train(Xtr_rows, Ytr) # train the classifier on the training images and labels
Yte_predict = nn.predict(Xte_rows) # predict labels on the test images
# and now print the classification accuracy, which is the fraction
# of test examples that are correctly predicted (i.e. label matches)
print('accuracy: %f' % np.mean(Yte_predict == Yte))
3. Implementation of the Nearest Neighbor classifier
import numpy as np

class NearestNeighbor(object):
  def __init__(self):
    pass

  def train(self, X, y):
    """ X is N x D where each row is an example. y is a 1-dimensional array of size N """
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X):
    """ X is N x D where each row is an example we wish to predict a label for """
    num_test = X.shape[0]
    # make sure the output type matches the input type
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

    # loop over all test rows
    for i in range(num_test):
      # find the nearest training image to the i'th test image
      # using the L1 distance (sum of absolute value differences)
      distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
      min_index = np.argmin(distances)  # get the index with the smallest distance
      Ypred[i] = self.ytr[min_index]    # predict the label of the nearest example

    return Ypred
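Note the tradeoff this classifier makes: train costs nothing (it just memorizes the data), while every predict call compares a test image against all of the training images, so the entire cost is paid at test time. In practice we usually want the opposite: cheap predictions, even at the price of expensive training.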
1.2 L2 distance
The formula is as follows.
Compared with L1, we again compute the pixelwise differences, but now square each difference, add up all the squares, and take the square root of the sum.
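In the same notation as for L1:

d_2(I_1, I_2) = \sqrt{ \sum_p \left( I_1^p - I_2^p \right)^2 }

In the predict loop above, only the distance line changes; in numpy it could be written as (the argmin-based prediction stays the same):

distances = np.sqrt(np.sum(np.square(self.Xtr - X[i,:]), axis=1))

When we only need the nearest neighbor, the square root can even be dropped: it is monotonic, so it does not change the ordering of the distances.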
1.3 Comparing L1 and L2
Because L2 squares the differences, it amplifies large ones; when comparing two vectors, L2 is therefore far less tolerant of a single big difference than L1 is.
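A small worked example with illustrative numbers: take two pixelwise difference vectors, (4, 0) and (2, 2). Both have the same L1 distance, 4 + 0 = 2 + 2 = 4, but their L2 distances are sqrt(4^2 + 0^2) = 4 and sqrt(2^2 + 2^2) = sqrt(8) ≈ 2.83. Concentrating the same total difference in one coordinate gives a larger L2 distance, so L2 prefers many moderate disagreements to one large one.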
2 K-Nearest Neighbor Classifier
K means that instead of picking only the single closest image in the training set, we pick the k closest images and let them vote on the label (a minimal sketch of such a classifier follows the figure notes below).
In the figure:
the differently colored regions represent the decision boundaries;
white represents regions where the classification is ambiguous.
You can see that the NN classifier map contains many islands caused by anomalous data points,
while the 5-NN boundaries are smoother and generalize better.
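A minimal sketch of a k-nearest-neighbor classifier, assuming the same setup as the NearestNeighbor class above; the class name KNearestNeighbor and the use of np.bincount for the vote (which assumes non-negative integer class labels, as in CIFAR-10) are my own choices, not from the notes:

import numpy as np

class KNearestNeighbor(object):
  def train(self, X, y):
    # like NearestNeighbor above, just memorize all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X, k=1):
    num_test = X.shape[0]
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
    for i in range(num_test):
      # L1 distances from the i'th test image to every training image
      distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
      # indices of the k training images with the smallest distances
      closest = np.argsort(distances)[:k]
      # majority vote over their labels (assumes non-negative integer labels)
      Ypred[i] = np.bincount(self.ytr[closest]).argmax()
    return Ypred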
2.1 Validation sets for hyperparameter tuning
Hyperparameters: the K in the K-Nearest Neighbor classifier, the choice of distance metric, and similar settings are all hyperparameters.
2.2 How to tune hyperparameters
They must not be tuned on the test set, or we risk overfitting to it.
2.2.1 Validation set
Take a portion of the training data and tune on it.
The code is as follows:
# assume we have Xtr_rows, Ytr, Xte_rows, Yte as before
# recall Xtr_rows is 50,000 x 3072 matrix
Xval_rows = Xtr_rows[:1000, :] # take first 1000 for validation
Yval = Ytr[:1000]
Xtr_rows = Xtr_rows[1000:, :] # keep last 49,000 for train
Ytr = Ytr[1000:]
# find hyperparameters that work best on the validation set
validation_accuracies = []
for k in [1, 3, 5, 10, 20, 50, 100]:
  # use a particular value of k and evaluate on validation data
  nn = NearestNeighbor()
  nn.train(Xtr_rows, Ytr)
  # here we assume a modified NearestNeighbor class that can take a k as input
  Yval_predict = nn.predict(Xval_rows, k=k)
  acc = np.mean(Yval_predict == Yval)
  print('accuracy: %f' % (acc,))
  # keep track of what works on the validation set
  validation_accuracies.append((k, acc))
2.2.2 Cross-validation
For example, split the training set into five equal folds; use one fold for validation and the other four for training, then cycle so that each fold serves as the validation fold once, and take the average of the validation results as the final result.
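A minimal sketch of this 5-fold procedure in numpy, assuming the Xtr_rows/Ytr arrays from before and the KNearestNeighbor sketch given earlier (the fold splitting and variable names here are illustrative, not from the notes):

import numpy as np

num_folds = 5
X_folds = np.array_split(Xtr_rows, num_folds)
y_folds = np.array_split(Ytr, num_folds)

k_to_accuracy = {}
for k in [1, 3, 5, 10, 20, 50, 100]:
  fold_accs = []
  for fold in range(num_folds):
    # hold out one fold for validation, train on the other four
    X_val, y_val = X_folds[fold], y_folds[fold]
    X_train = np.concatenate(X_folds[:fold] + X_folds[fold + 1:])
    y_train = np.concatenate(y_folds[:fold] + y_folds[fold + 1:])
    knn = KNearestNeighbor()
    knn.train(X_train, y_train)
    fold_accs.append(np.mean(knn.predict(X_val, k=k) == y_val))
  # average validation accuracy over the five folds
  k_to_accuracy[k] = np.mean(fold_accs)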
2.2.3 In practice
Usually the original training set is simply split into a training set and a validation set, with 50%-90% of the data used for training.