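Contents
1. Concepts
2. Fixed-size sliding window
Procedure
Writing the code
3. Combining the SVM with a sliding window
Procedure
Writing the code
4. Saving and loading a trained SVM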
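In the earlier face-detection notes we used pre-trained detectors, which put face detection and people detection within easy reach. In practice, though, we also need to detect many other specific kinds of objects, so we should be able to build our own classifiers.
There are many popular approaches. In this article we use the one the book's author presents, which relies on support vector machines (SVMs) and the bag-of-words (BoW) technique.
【Source book: Learning OpenCV 4 Computer Vision with Python 3, Third Edition; Chinese edition: OpenCV 4计算机视觉 Python语言实现(原书第三版); author: Joseph Howse】
Next, we use training a car detector as our example: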
import cv2
import numpy as np
import os
if not os.path.isdir('CarData'):
    print('CarData folder not found. Please download and unzip '
          'http://l2r.cs.uiuc.edu/~cogcomp/Data/Car/CarData.tar.gz '
          'or https://github.com/gcr/arc-evaluator/raw/master/CarData.tar.gz '
          'into the same folder as this script.')
    exit(1)
If the script prints nothing, everything is in order.
【5】Next, define the following constants in the script:
BOW_NUM_TRAINING_SAMPLES_PER_CLASS = 10
SVM_NUM_TRAINING_SAMPLES_PER_CLASS = 100
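Our classifier has two training stages: one for the BoW vocabulary, which uses a number of images as samples, and one for the SVM, which uses a number of BoW descriptor vectors as samples.
For each stage, we could also define a different number of training samples for each of the two classes (car and non-car).
【6】We use cv2.SIFT to extract descriptors and cv2.FlannBasedMatcher to match them: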
sift = cv2.SIFT_create()
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = {}
flann = cv2.FlannBasedMatcher(index_params, search_params)
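(For SIFT and FLANN, see my earlier notes.)
【7】OpenCV provides a class named cv2.BOWKMeansTrainer to train a BoW vocabulary, and a class named cv2.BOWImgDescriptorExtractor to convert some kind of lower-level descriptor (in our example, SIFT descriptors) into BoW descriptors. We initialize these objects with the following code: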
bow_kmeans_trainer = cv2.BOWKMeansTrainer(40)
bow_extractor = cv2.BOWImgDescriptorExtractor(sift, flann)
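When initializing cv2.BOWKMeansTrainer, we must specify the number of clusters (40 in this example). When initializing cv2.BOWImgDescriptorExtractor, we must specify a descriptor extractor and a descriptor matcher (here, the cv2.SIFT object and the cv2.FlannBasedMatcher object we created earlier).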
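To make concrete what these two objects produce, here is a NumPy-only sketch of the second half of the process. It assumes the vocabulary (the cluster centres that bow_kmeans_trainer.cluster() returns) already exists; each local descriptor is assigned to its nearest visual word and the image is summarized as a histogram of word counts. As far as we understand, OpenCV's extractor also normalizes by the number of descriptors, but it finds the nearest word with the supplied matcher (FLANN here) rather than the brute-force distances used below. The function name bow_histogram and the toy 2-D data are hypothetical; real SIFT descriptors are 128-dimensional.

```python
import numpy as np


def bow_histogram(descriptors, vocabulary):
    """Return a 1 x K bag-of-words histogram for one image (hypothetical helper).

    descriptors: (N, D) array of local descriptors from one image.
    vocabulary:  (K, D) array of visual words (cluster centres).
    """
    # Squared Euclidean distance from every descriptor to every word: shape (N, K)
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Index of the nearest visual word for each descriptor
    nearest = dists.argmin(axis=1)
    # Count how often each word occurs, then normalize by the number of descriptors
    counts = np.bincount(nearest, minlength=len(vocabulary)).astype(np.float32)
    return (counts / len(descriptors)).reshape(1, -1)


# Toy data: a 3-word vocabulary of 2-D descriptors
vocabulary = np.array([[0, 0], [10, 0], [0, 10]], dtype=np.float32)
descriptors = np.array([[1, 1], [9, 1], [11, -1], [0, 9]], dtype=np.float32)
hist = bow_histogram(descriptors, vocabulary)
print(hist)  # word 0 once, word 1 twice, word 2 once -> 0.25, 0.5, 0.25
```

Whatever the number of keypoints in an image, the result is a fixed-length K-dimensional vector, which is what lets a single SVM be trained on images of any content.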
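【8】To train the BoW vocabulary, we need to provide samples of SIFT descriptors from a variety of car and non-car images. We will load images from the CarData/TrainImages subfolder, which contains positive (car) images named pos-x.pgm and negative (non-car) images named neg-x.pgm, where x is a number starting from 1. We write the following utility function to return the pair of paths to the i-th positive and negative training images, where i is a number starting from 0: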
def get_pos_and_neg_paths(i):
    pos_path = 'CarData/TrainImages/pos-%d.pgm' % (i+1)
    neg_path = 'CarData/TrainImages/neg-%d.pgm' % (i+1)
    return pos_path, neg_path
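【9】For each training-sample path, we need to load the image, extract SIFT descriptors, and add the descriptors to the BoW trainer. We write another utility function to do exactly that, call it for the first BOW_NUM_TRAINING_SAMPLES_PER_CLASS positive and negative images, and then cluster the collected descriptors into a vocabulary that we give to the BoW extractor: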
def add_sample(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    keypoints, descriptors = sift.detectAndCompute(img, None)
    if descriptors is not None:
        bow_kmeans_trainer.add(descriptors)
for i in range(BOW_NUM_TRAINING_SAMPLES_PER_CLASS):
    pos_path, neg_path = get_pos_and_neg_paths(i)
    add_sample(pos_path)
    add_sample(neg_path)
voc = bow_kmeans_trainer.cluster()
bow_extractor.setVocabulary(voc)
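Earlier, we initialized the BoW descriptor extractor with the SIFT descriptor extractor and the FLANN matcher. Now we also give it a vocabulary, which we trained from samples of SIFT descriptors. At this point the BoW descriptor extractor has everything it needs to extract BoW descriptors from Difference of Gaussian (DoG) features (because cv2.SIFT detects DoG features and extracts SIFT descriptors for them).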
def extract_bow_descriptors(img):
    features = sift.detect(img)
    return bow_extractor.compute(img, features)
training_data = []
training_labels = []
for i in range(SVM_NUM_TRAINING_SAMPLES_PER_CLASS):
    pos_path, neg_path = get_pos_and_neg_paths(i)
    pos_img = cv2.imread(pos_path, cv2.IMREAD_GRAYSCALE)
    pos_descriptors = extract_bow_descriptors(pos_img)
    if pos_descriptors is not None:
        training_data.extend(pos_descriptors)
        training_labels.append(1)
    neg_img = cv2.imread(neg_path, cv2.IMREAD_GRAYSCALE)
    neg_descriptors = extract_bow_descriptors(neg_img)
    if neg_descriptors is not None:
        training_data.extend(neg_descriptors)
        training_labels.append(-1)
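The loop above relies on a detail worth spelling out: each call to extract_bow_descriptors returns a 1 x K array (one row, K = vocabulary size), so training_data.extend(...) appends that single row, and np.array(training_data) later stacks the rows into an N x K float32 matrix, one sample per row as cv2.ml.ROW_SAMPLE indicates. The sketch below mimics this with random stand-in rows instead of real BoW descriptors (the generated data is hypothetical) and makes the label dtype explicit as int32, which we assume is the safest choice for classification labels in cv2.ml.

```python
import numpy as np

K = 40  # vocabulary size, matching BOWKMeansTrainer(40)
rng = np.random.default_rng(0)

training_data = []
training_labels = []
for label in (1, -1, 1, -1):
    # Hypothetical stand-in for extract_bow_descriptors(img): a 1 x K float32 array
    bow_descriptor = rng.random((1, K), dtype=np.float32)
    training_data.extend(bow_descriptor)  # adds the one row, shape (K,)
    training_labels.append(label)

samples = np.array(training_data)                      # (N, K) float32, one row per image
responses = np.array(training_labels, dtype=np.int32)  # (N,) class labels
print(samples.shape, samples.dtype, responses.tolist())
# With append() instead of extend(), the stacked array would be (N, 1, K) instead.
```

In the real script, arrays shaped like samples and responses are what get passed to svm.train(samples, cv2.ml.ROW_SAMPLE, responses).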
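(If you want to train a classifier to distinguish between several positive classes, simply add other descriptors with their own labels. For example, we could train a classifier that uses label 1 for car, 2 for person, and -1 for background. There is no requirement to have a negative or background class, but without one the classifier will assume that everything belongs to one of the positive classes.)
svm = cv2.ml.SVM_create()
svm.train(np.array(training_data), cv2.ml.ROW_SAMPLE,
          np.array(training_labels))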
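# The last two paths point to images on the author's own machine; replace them with your own.
for test_img_path in ['CarData/TestImages/test-0.pgm',
                      'CarData/TestImages/test-1.pgm',
                      'C:/MyOpenCV/cascades/woodcutters.jpg',
                      'E:/115.jpeg']:
    img = cv2.imread(test_img_path)
    if img is None:  # cv2.imread returns None for a missing or unreadable file
        print('Could not read %s' % test_img_path)
        continue
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    descriptors = extract_bow_descriptors(gray_img)
    if descriptors is None:  # no keypoints found, so there is no BoW descriptor
        print('No keypoints found in %s' % test_img_path)
        continue
    prediction = svm.predict(descriptors)
    if prediction[1][0][0] == 1.0:
        text = 'car'
        color = (0, 255, 0)
    else:
        text = 'not car'
        color = (0, 0, 255)
    cv2.putText(img, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1,
                color, 2, cv2.LINE_AA)
    cv2.imshow(test_img_path, img)
    cv2.waitKey(0)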
Complete script:
import cv2
import numpy as np
import os
if not os.path.isdir('CarData'):
    print('CarData folder not found. Please download and unzip '
          'http://l2r.cs.uiuc.edu/~cogcomp/Data/Car/CarData.tar.gz '
          'or https://github.com/gcr/arc-evaluator/raw/master/CarData.tar.gz '
          'into the same folder as this script.')
    exit(1)
BOW_NUM_TRAINING_SAMPLES_PER_CLASS = 10
SVM_NUM_TRAINING_SAMPLES_PER_CLASS = 100
sift = cv2.SIFT_create()
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = {}
flann = cv2.FlannBasedMatcher(index_params, search_params)
bow_kmeans_trainer = cv2.BOWKMeansTrainer(40)
bow_extractor = cv2.BOWImgDescriptorExtractor(sift, flann)
def get_pos_and_neg_paths(i):
    pos_path = 'CarData/TrainImages/pos-%d.pgm' % (i+1)
    neg_path = 'CarData/TrainImages/neg-%d.pgm' % (i+1)
    return pos_path, neg_path
def add_sample(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    keypoints, descriptors = sift.detectAndCompute(img, None)
    if descriptors is not None:
        bow_kmeans_trainer.add(descriptors)
for i in range(BOW_NUM_TRAINING_SAMPLES_PER_CLASS):
    pos_path, neg_path = get_pos_and_neg_paths(i)
    add_sample(pos_path)
    add_sample(neg_path)
voc = bow_kmeans_trainer.cluster()
bow_extractor.setVocabulary(voc)
def extract_bow_descriptors(img):
    features = sift.detect(img)
    return bow_extractor.compute(img, features)
training_data = []
training_labels = []
for i in range(SVM_NUM_TRAINING_SAMPLES_PER_CLASS):
    pos_path, neg_path = get_pos_and_neg_paths(i)
    pos_img = cv2.imread(pos_path, cv2.IMREAD_GRAYSCALE)
    pos_descriptors = extract_bow_descriptors(pos_img)
    if pos_descriptors is not None:
        training_data.extend(pos_descriptors)
        training_labels.append(1)
    neg_img = cv2.imread(neg_path, cv2.IMREAD_GRAYSCALE)
    neg_descriptors = extract_bow_descriptors(neg_img)
    if neg_descriptors is not None:
        training_data.extend(neg_descriptors)
        training_labels.append(-1)
svm = cv2.ml.SVM_create()
svm.train(np.array(training_data), cv2.ml.ROW_SAMPLE,
          np.array(training_labels))
# The last two paths point to images on the author's own machine; replace them with your own.
for test_img_path in ['CarData/TestImages/test-0.pgm',
                      'CarData/TestImages/test-1.pgm',
                      'C:/MyOpenCV/cascades/woodcutters.jpg',
                      'E:/115.jpeg']:
    img = cv2.imread(test_img_path)
    if img is None:  # cv2.imread returns None for a missing or unreadable file
        print('Could not read %s' % test_img_path)
        continue
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    descriptors = extract_bow_descriptors(gray_img)
    if descriptors is None:  # no keypoints found, so there is no BoW descriptor
        print('No keypoints found in %s' % test_img_path)
        continue
    prediction = svm.predict(descriptors)
    if prediction[1][0][0] == 1.0:
        text = 'car'
        color = (0, 255, 0)
    else:
        text = 'not car'
        color = (0, 0, 255)
    cv2.putText(img, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1,
                color, 2, cv2.LINE_AA)
    cv2.imshow(test_img_path, img)
    cv2.waitKey(0)
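Test results for the 4 images:
So far, we have used SIFT, BoW, and an SVM to train a classifier that distinguishes between two classes: car and non-car. We have applied this classifier to whole images; the next step is to apply the sliding window technique so that we can narrow the classification down to specific regions of an image.
By combining the SVM classifier with the sliding window technique and an image pyramid, we gain two improvements: we can detect multiple objects of the same kind in one image, and we can determine the position and size of each detected object.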
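We will take the following approach:
(1) Take a region of the image, classify it, and then move the window to the right by a predefined step size. When we reach the right edge of the image, reset the x coordinate to 0, move down one step, and repeat the whole process.
(2) At each step, perform the classification with the SVM that was trained on BoW descriptors.
(3) Keep track of every window that the SVM classifies as a positive detection.
(4) After classifying every window in the full image, scale the image down and repeat the whole sliding-window process. In other words, we are using an image pyramid: we keep scaling down and classifying until we reach a minimum scale.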
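Steps (1) to (4) can be sketched in plain NumPy to show how the loops nest and how a window found at a reduced scale is mapped back to original-image coordinates. This is an illustration under stated assumptions: classify is a hypothetical stand-in for the BoW + SVM check, downscale is a crude nearest-neighbour resize used only so the sketch runs without OpenCV (the real script uses cv2.resize with INTER_AREA), and the pyramid here simply stops once the window no longer fits instead of using a min_size limit. The window size, step, and scale factor match the defaults used later in the article.

```python
import numpy as np


def downscale(img, factor):
    """Crude nearest-neighbour downscale (hypothetical stand-in for cv2.resize)."""
    h, w = img.shape[:2]
    new_h, new_w = int(h / factor), int(w / factor)
    rows = (np.arange(new_h) * factor).astype(int)
    cols = (np.arange(new_w) * factor).astype(int)
    return img[rows[:, None], cols]


def detect(img, classify, step=20, window=(100, 40), factor=1.25):
    """Slide a fixed-size window over every pyramid level and return the
    positive windows as (x0, y0, x1, y1) in original-image coordinates."""
    win_w, win_h = window
    hits = []
    level = img
    # Step (4): keep shrinking until the window no longer fits
    while level.shape[0] >= win_h and level.shape[1] >= win_w:
        scale = img.shape[0] / level.shape[0]
        # Step (1): left to right, then down one step at a time
        for y in range(0, level.shape[0] - win_h + 1, step):
            for x in range(0, level.shape[1] - win_w + 1, step):
                roi = level[y:y + win_h, x:x + win_w]
                # Steps (2) and (3): classify and remember positives
                if classify(roi):
                    hits.append((int(x * scale), int(y * scale),
                                 int((x + win_w) * scale),
                                 int((y + win_h) * scale)))
        level = downscale(level, factor)
    return hits


img = np.zeros((120, 200), dtype=np.uint8)  # hypothetical blank 200 x 120 image
print(len(detect(img, classify=lambda roi: True)))  # → 48 windows over 4 levels
```

With a classifier that accepts everything, the 200 x 120 image already produces 30 windows at full size and 48 over the whole pyramid, which is why each window's classification needs to be cheap and why NMS is needed to merge the many overlapping hits.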
We will also filter the results with non-maximum suppression (NMS), relying on Malisiewicz and Rosebrock's implementation. A copy of it is on GitHub as non_max_suppression.py in the PacktPublishing/Learning-OpenCV-4-Computer-Vision-with-Python-Third-Edition repository (master branch).
The script provides a function with the following signature:
def non_max_suppression_fast(boxes, overlapThresh):
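The function's first argument is a NumPy array of rectangle coordinates and scores. For N rectangles, the array has shape Nx5. For the rectangle at index i, the values in the array mean the following:
(1) boxes[i][0] is the x coordinate of the rectangle's left edge.
(2) boxes[i][1] is the y coordinate of its top edge.
(3) boxes[i][2] is the x coordinate of its right edge.
(4) boxes[i][3] is the y coordinate of its bottom edge.
(5) boxes[i][4] is its score; a higher score means a more confident detection.
The second argument is a threshold representing the maximum overlap ratio allowed between rectangles. If two rectangles overlap by more than this ratio, the one with the lower score is filtered out. Finally, the function returns an array of the remaining rectangles.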
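To see what the function has to do, here is a compact greedy NMS sketch of our own in NumPy. It is not the linked non_max_suppression.py: it assumes the same Nx5 [x0, y0, x1, y1, score] layout, visits boxes from highest to lowest score, and drops any remaining box whose overlap with a kept box exceeds overlap_thresh. Measuring overlap as the intersection divided by the remaining box's own area, and counting pixels inclusively (the + 1 terms), are choices modelled on Malisiewicz-style code; the name nms_sketch is hypothetical.

```python
import numpy as np


def nms_sketch(boxes, overlap_thresh):
    """Greedy non-maximum suppression (hypothetical helper).

    boxes: N x 5 array of [x0, y0, x1, y1, score] rows.
    Returns the kept rows, highest score first.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    if len(boxes) == 0:
        return np.empty((0, 5))
    x0, y0, x1, y1, scores = boxes.T
    areas = (x1 - x0 + 1) * (y1 - y0 + 1)
    order = np.argsort(scores)[::-1]  # indices, best score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        # Intersection rectangle between the best box and each remaining box
        ix0 = np.maximum(x0[best], x0[rest])
        iy0 = np.maximum(y0[best], y0[rest])
        ix1 = np.minimum(x1[best], x1[rest])
        iy1 = np.minimum(y1[best], y1[rest])
        inter = np.maximum(0, ix1 - ix0 + 1) * np.maximum(0, iy1 - iy0 + 1)
        # Overlap ratio relative to each remaining box's own area
        overlap = inter / areas[rest]
        # Discard the lower-scoring boxes that overlap too much
        order = rest[overlap <= overlap_thresh]
    return boxes[keep]


boxes = np.array([
    [10, 10, 110, 50, 2.5],    # strong detection
    [20, 12, 120, 52, 2.0],    # mostly covers the same car
    [300, 40, 400, 80, 1.9],   # a different car
])
kept = nms_sketch(boxes, 0.15)
print(kept[:, 4])  # scores of the kept boxes: 2.5 and 1.9
```

The second box overlaps the first by far more than 15%, so it is removed, while the distant third box survives; this matches the behaviour described for non_max_suppression_fast.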
(This script is a modified and extended version of the previous one.)
import cv2
import numpy as np
import os
from non_max_suppression import non_max_suppression_fast as nms
BOW_NUM_TRAINING_SAMPLES_PER_CLASS = 10
SVM_NUM_TRAINING_SAMPLES_PER_CLASS = 100
SVM_SCORE_THRESHOLD = 1.8
NMS_OVERLAP_THRESHOLD = 0.15
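We use SVM_SCORE_THRESHOLD as the threshold that separates positive windows from negative ones, and NMS_OVERLAP_THRESHOLD as the maximum acceptable overlap ratio in the NMS step. Here we chose 15% somewhat arbitrarily, so windows that overlap by more than that are culled. When experimenting with SVMs, feel free to tune these parameters until you find the values that give the best results in your application.
Compared with the previous script, the vocabulary shrinks to 12 clusters, and the SVM type and C parameter are set explicitly:
bow_kmeans_trainer = cv2.BOWKMeansTrainer(12)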
svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setC(50)
svm.train(np.array(training_data), cv2.ml.ROW_SAMPLE,
          np.array(training_labels))
def pyramid(img, scale_factor=1.25, min_size=(200, 80),
            max_size=(600, 600)):
    h, w = img.shape[:2]
    min_w, min_h = min_size
    max_w, max_h = max_size
    while w >= min_w and h >= min_h:
        # Levels larger than max_size are not yielded, only shrunk further
        if w <= max_w and h <= max_h:
            yield img
        w /= scale_factor
        h /= scale_factor
        img = cv2.resize(img, (int(w), int(h)),
                         interpolation=cv2.INTER_AREA)
This function takes an image and yields a series of resized versions of it, bounded by a maximum and a minimum size.
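The effect of the two limits is easy to check without touching any pixels. The hypothetical helper below repeats pyramid()'s size arithmetic with the same defaults and returns the (width, height) of every level that would be yielded. Note that a level larger than max_size is skipped but still shrunk, so the first yielded level can already be smaller than the input image.

```python
def pyramid_sizes(w, h, scale_factor=1.25, min_size=(200, 80),
                  max_size=(600, 600)):
    """Return the (width, height) of every level pyramid() would yield (hypothetical helper)."""
    min_w, min_h = min_size
    max_w, max_h = max_size
    sizes = []
    while w >= min_w and h >= min_h:
        if w <= max_w and h <= max_h:
            sizes.append((int(w), int(h)))
        # Same float arithmetic as pyramid(); cv2.resize receives int(w), int(h)
        w /= scale_factor
        h /= scale_factor
    return sizes


print(pyramid_sizes(640, 480))
# [(512, 384), (409, 307), (327, 245), (262, 196), (209, 157)]
```

A 640 x 480 input is never classified at full size because it exceeds max_size, and scanning ends at 209 x 157 because the next level would be narrower than min_size.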
def sliding_window(img, step=20, window_size=(100, 40)):
    img_h, img_w = img.shape[:2]
    window_w, window_h = window_size
    # y steps down the rows (image height), x steps across the columns (image width)
    for y in range(0, img_h, step):
        for x in range(0, img_w, step):
            roi = img[y:y+window_h, x:x+window_w]
            roi_h, roi_w = roi.shape[:2]
            # Skip partial windows at the right and bottom edges
            if roi_w == window_w and roi_h == window_h:
                yield (x, y, roi)
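The mechanism is simple: given an image, the function yields the upper-left coordinates of the next window together with the sub-image it covers. Successive windows move from left to right by an arbitrary step size until they reach the right edge of the image, and from top to bottom until they reach the bottom edge.
The test loop then classifies every window at every pyramid level:
# The last two paths point to images on the author's own machine; replace them with your own.
for test_img_path in ['CarData/TestImages/test-0.pgm',
                      'CarData/TestImages/test-1.pgm',
                      'C:/MyOpenCV/cascades/woodcutters.jpg',
                      'E:/115.jpeg']:
    img = cv2.imread(test_img_path)
    if img is None:  # cv2.imread returns None for a missing or unreadable file
        print('Could not read %s' % test_img_path)
        continue
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    pos_rects = []
    for resized in pyramid(gray_img):
        for x, y, roi in sliding_window(resized):
            descriptors = extract_bow_descriptors(roi)
            if descriptors is None:
                continue
            prediction = svm.predict(descriptors)
            if prediction[1][0][0] == 1.0:
                raw_prediction = svm.predict(
                    descriptors, flags=cv2.ml.STAT_MODEL_RAW_OUTPUT)
                # Negated, as in the original code, so that a higher score
                # means a stronger car detection
                score = -raw_prediction[1][0][0]
                if score > SVM_SCORE_THRESHOLD:
                    h, w = roi.shape
                    # Map the window back to the original image's scale
                    scale = gray_img.shape[0] / float(resized.shape[0])
                    pos_rects.append([int(x * scale),
                                      int(y * scale),
                                      int((x+w) * scale),
                                      int((y+h) * scale),
                                      score])
    pos_rects = nms(np.array(pos_rects), NMS_OVERLAP_THRESHOLD)
    for x0, y0, x1, y1, score in pos_rects:
        cv2.rectangle(img, (int(x0), int(y0)), (int(x1), int(y1)),
                      (0, 255, 255), 2)
        text = '%.2f' % score
        cv2.putText(img, text, (int(x0), int(y0) - 20),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 255), 2)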
Complete script:
import cv2
import numpy as np
import os
from non_max_suppression import non_max_suppression_fast as nms
if not os.path.isdir('CarData'):
    print('CarData folder not found. Please download and unzip '
          'http://l2r.cs.uiuc.edu/~cogcomp/Data/Car/CarData.tar.gz '
          'or https://github.com/gcr/arc-evaluator/raw/master/CarData.tar.gz '
          'into the same folder as this script.')
    exit(1)
BOW_NUM_TRAINING_SAMPLES_PER_CLASS = 10
SVM_NUM_TRAINING_SAMPLES_PER_CLASS = 100
SVM_SCORE_THRESHOLD = 1.8
NMS_OVERLAP_THRESHOLD = 0.15
sift = cv2.SIFT_create()
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = {}
flann = cv2.FlannBasedMatcher(index_params, search_params)
bow_kmeans_trainer = cv2.BOWKMeansTrainer(12)
bow_extractor = cv2.BOWImgDescriptorExtractor(sift, flann)
def get_pos_and_neg_paths(i):
    pos_path = 'CarData/TrainImages/pos-%d.pgm' % (i+1)
    neg_path = 'CarData/TrainImages/neg-%d.pgm' % (i+1)
    return pos_path, neg_path
def add_sample(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    keypoints, descriptors = sift.detectAndCompute(img, None)
    if descriptors is not None:
        bow_kmeans_trainer.add(descriptors)
for i in range(BOW_NUM_TRAINING_SAMPLES_PER_CLASS):
    pos_path, neg_path = get_pos_and_neg_paths(i)
    add_sample(pos_path)
    add_sample(neg_path)
voc = bow_kmeans_trainer.cluster()
bow_extractor.setVocabulary(voc)
def extract_bow_descriptors(img):
    features = sift.detect(img)
    return bow_extractor.compute(img, features)
training_data = []
training_labels = []
for i in range(SVM_NUM_TRAINING_SAMPLES_PER_CLASS):
    pos_path, neg_path = get_pos_and_neg_paths(i)
    pos_img = cv2.imread(pos_path, cv2.IMREAD_GRAYSCALE)
    pos_descriptors = extract_bow_descriptors(pos_img)
    if pos_descriptors is not None:
        training_data.extend(pos_descriptors)
        training_labels.append(1)
    neg_img = cv2.imread(neg_path, cv2.IMREAD_GRAYSCALE)
    neg_descriptors = extract_bow_descriptors(neg_img)
    if neg_descriptors is not None:
        training_data.extend(neg_descriptors)
        training_labels.append(-1)
svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setC(50)
svm.train(np.array(training_data), cv2.ml.ROW_SAMPLE,
          np.array(training_labels))
def pyramid(img, scale_factor=1.25, min_size=(200, 80),
            max_size=(600, 600)):
    h, w = img.shape[:2]
    min_w, min_h = min_size
    max_w, max_h = max_size
    while w >= min_w and h >= min_h:
        # Levels larger than max_size are not yielded, only shrunk further
        if w <= max_w and h <= max_h:
            yield img
        w /= scale_factor
        h /= scale_factor
        img = cv2.resize(img, (int(w), int(h)),
                         interpolation=cv2.INTER_AREA)
def sliding_window(img, step=20, window_size=(100, 40)):
    img_h, img_w = img.shape[:2]
    window_w, window_h = window_size
    # y steps down the rows (image height), x steps across the columns (image width)
    for y in range(0, img_h, step):
        for x in range(0, img_w, step):
            roi = img[y:y+window_h, x:x+window_w]
            roi_h, roi_w = roi.shape[:2]
            # Skip partial windows at the right and bottom edges
            if roi_w == window_w and roi_h == window_h:
                yield (x, y, roi)
# The last two paths point to images on the author's own machine; replace them with your own.
for test_img_path in ['CarData/TestImages/test-0.pgm',
                      'CarData/TestImages/test-1.pgm',
                      'C:/MyOpenCV/cascades/woodcutters.jpg',
                      'E:/115.jpeg']:
    img = cv2.imread(test_img_path)
    if img is None:  # cv2.imread returns None for a missing or unreadable file
        print('Could not read %s' % test_img_path)
        continue
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    pos_rects = []
    for resized in pyramid(gray_img):
        for x, y, roi in sliding_window(resized):
            descriptors = extract_bow_descriptors(roi)
            if descriptors is None:
                continue
            prediction = svm.predict(descriptors)
            if prediction[1][0][0] == 1.0:
                raw_prediction = svm.predict(
                    descriptors, flags=cv2.ml.STAT_MODEL_RAW_OUTPUT)
                # Negated, as in the original code, so that a higher score
                # means a stronger car detection
                score = -raw_prediction[1][0][0]
                if score > SVM_SCORE_THRESHOLD:
                    h, w = roi.shape
                    # Map the window back to the original image's scale
                    scale = gray_img.shape[0] / float(resized.shape[0])
                    pos_rects.append([int(x * scale),
                                      int(y * scale),
                                      int((x+w) * scale),
                                      int((y+h) * scale),
                                      score])
    pos_rects = nms(np.array(pos_rects), NMS_OVERLAP_THRESHOLD)
    for x0, y0, x1, y1, score in pos_rects:
        cv2.rectangle(img, (int(x0), int(y0)), (int(x1), int(y1)),
                      (0, 255, 255), 2)
        text = '%.2f' % score
        cv2.putText(img, text, (int(x0), int(y0) - 20),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 255), 2)
    cv2.imshow(test_img_path, img)
    cv2.waitKey(0)
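Detection results for the 4 images:
Of the 3 cars, two were detected and one was missed.
(The training set in this example is small; a larger training set generally covers more varied backgrounds and yields higher accuracy.)
One last tip: you do not need to retrain the detector every time you use it. In fact, you should avoid doing so, because training is slow. The following code saves the trained SVM model to an XML file.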
In the original (first) script, change:
svm = cv2.ml.SVM_create()
svm.train(np.array(training_data), cv2.ml.ROW_SAMPLE,
          np.array(training_labels))
to:
svm = cv2.ml.SVM_create()
svm.train(np.array(training_data), cv2.ml.ROW_SAMPLE,
          np.array(training_labels))
svm.save('my_svm.xml')
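To load the SVM:
svm = cv2.ml.SVM_load('my_svm.xml')
Typically, we would have one script that trains and saves the SVM model, and other scripts that load it and use it for various detection problems.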
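Note: some older code loads the model like this:
svm = cv2.ml.SVM_create()
svm.load('my_svm.xml')
This fails with the error: error: (-215:Assertion failed) samples.cols == var_count && samples.type() == CV_32F in function 'cv::ml::SVMImpl::predict'
The likely reason is that, in the Python bindings, load() returns a new SVM object instead of filling in the existing one, so svm stays untrained. Use the cv2.ml.SVM_load approach shown above instead.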
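【References】:
【Source book: Learning OpenCV 4 Computer Vision with Python 3, Third Edition; Chinese edition: OpenCV 4计算机视觉 Python语言实现(原书第三版); author: Joseph Howse】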