Recognizing Traffic Lights with Deep Learning
by David Brailovsky
I recently won first place in the Nexar Traffic Light Recognition Challenge, a computer vision competition organized by a company that’s building an AI dash cam app.
In this post, I’ll describe the solution I used. I’ll also explore approaches that did and did not work in my effort to improve my model.
Don’t worry — you don’t need to be an AI expert to understand this post. I’ll focus on the ideas and methods I used as opposed to the technical implementation.
The goal of the challenge was to recognize the traffic light state in images taken by drivers using the Nexar app. In any given image, the classifier needed to output whether there was a traffic light in the scene, and whether it was red or green. More specifically, it should only identify traffic lights in the driving direction.
Here are a few examples to make it clearer:
The images above are examples of the three possible classes I needed to predict: no traffic light (left), red traffic light (center) and green traffic light (right).
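To make the three-class setup concrete, here is a minimal sketch of how a classifier's per-class scores could be turned into one of the three labels. The label names, their order, and the `predict_label` helper are assumptions for illustration only; the challenge's actual label encoding may differ.

```python
# Hypothetical label names and ordering (not taken from the challenge spec).
CLASSES = ["no traffic light", "red", "green"]


def predict_label(probabilities):
    """Return the class name whose score is highest."""
    if len(probabilities) != len(CLASSES):
        raise ValueError("expected exactly one score per class")
    # Index of the largest score, i.e. an argmax over the three classes.
    best_index = max(range(len(CLASSES)), key=lambda i: probabilities[i])
    return CLASSES[best_index]


# Example: a model that is fairly sure it sees a red light.
print(predict_label([0.05, 0.90, 0.05]))
```

Whatever the network looks like internally, its final output for each image reduces to a choice among these three classes.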
The challenge required the solution to be based on Convolutional Neural Networks, a very popular method used in image recognition with deep neural networks. The submissions were scored based on the model’s accuracy along with the model’s size (in megabytes). Smaller models got higher scores. In addition, the minimum accuracy required to win was 95%.
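As a rough illustration of the accuracy requirement, the sketch below computes accuracy as the fraction of correctly classified images and compares it to the 95% minimum. The function names are hypothetical. The exact way accuracy and model size were combined into a final score is not described here, so the sketch does not attempt it.

```python
MIN_ACCURACY = 0.95  # minimum accuracy required to win, per the challenge rules


def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def meets_minimum(acc, threshold=MIN_ACCURACY):
    """True if the accuracy reaches the required threshold."""
    return acc >= threshold


# Toy example with made-up labels: 3 of 4 predictions are correct.
acc = accuracy(["red", "green", "none", "red"],
               ["red", "green", "none", "green"])
print(acc, meets_minimum(acc))
```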
Nexar provided 18,659 labeled images as training data. Each image was labeled with one of the three classes mentioned above (no traffic light / red / green).
I used Caffe to train the models. The main reason I chose Caffe was its large variety of pre-trained models.
Python, NumPy & Jupyter Notebook were used for analyzing results, data exploration and ad-hoc scripts.
Amazon’s GPU instances (g2.2xlarge) were used to train the models. My AWS bill ended up being $263 (!). Not cheap.
The code and files I used to train and run the model are on GitHub.
The final classifier achieved an accuracy of 94.955% on Nexar’s test set, with a model size of ~7.84 MB. To compare, GoogLeNet uses a model size of 41 MB, and VGG-16 uses a model size of 528 MB.
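To put those sizes in perspective, this small sketch computes how many times larger the two reference models are than the final classifier, using only the figures quoted above. The `size_ratio` helper is a hypothetical name used for illustration.

```python
FINAL_MODEL_MB = 7.84  # size of the final classifier, as quoted above

# Reference model sizes in MB, as quoted above.
REFERENCE_MODELS_MB = {"GoogLeNet": 41, "VGG-16": 528}


def size_ratio(reference_mb, model_mb=FINAL_MODEL_MB):
    """How many times larger the reference model is than the given model."""
    return reference_mb / model_mb


for name, size_mb in REFERENCE_MODELS_MB.items():
    print(f"{name}: about {size_ratio(size_mb):.1f}x larger")
```

By these numbers, GoogLeNet is roughly 5 times larger and VGG-16 roughly 67 times larger than the final model.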
Nexar was kind enough to accept 94.955% as 95% to pass the minimum requirement.
The process of getting higher accuracy involved a LOT of trial and error. Some of it had some logic behind it, and some was just “maybe this will work”. I’ll describe some of the things I tried to improve the model that did and didn’t help. The final classifier details are described right after.