While a three-year-old child has a lot to learn about the world, one thing he is already an expert in is making sense of what he sees. Our society is more technologically advanced than ever. We’ve sent people to the moon, have phones that can talk to us, and have radio stations that can be customized to play the music of our choice. Yet our most advanced machines still struggle to interpret what they see.
We have prototype cars that can drive for us, but they cannot differentiate between a crumpled paper bag on the road and a stone that should be avoided. We have fabulous megapixel cameras, but we have not delivered sight to the blind. Security cameras are everywhere, but they cannot detect when a child is drowning in a swimming pool.
As a society, we are collectively still blind when our machines are blind.
What is Computer Vision?
According to Prof. Fei-Fei Li, computer vision is defined as “a subset of mainstream artificial intelligence that deals with the science of making computers or machines visually enabled, i.e., they can analyze and understand an image.” Human vision starts at the biological cameras, the eyes, which take one picture about every 200 milliseconds, while computer vision starts by providing an image as input to the machine. Image input of this kind is the best use case for a class of algorithms called Convolutional Neural Networks.
The basic building block of a neural network is a neuron, which loosely models the biological neuron. Similar to a biological neuron, an artificial neuron has input channels, a processing body, and an output channel as shown in Figure 1.
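The three parts of an artificial neuron can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: the function name `neuron`, the sample weights, and the choice of a sigmoid as the processing step are all assumptions made for the example.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs, then a sigmoid."""
    # Input channels: each incoming value is scaled by its own weight.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Processing body: squash the sum into the range (0, 1).
    # The sigmoid is an assumed choice; other activations are common too.
    return 1.0 / (1.0 + math.exp(-total))

# Output channel: a single number that can be passed on to other neurons.
output = neuron([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.0)
print(output)
```

The weights and bias are the values a network would adjust during training; here they are fixed by hand.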
Just as in the biological brain, these neuron-like nodes are connected so that each one receives input from some nodes and sends its output to others, as shown in Figure 2.
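To make the wiring concrete, here is a small sketch of nodes feeding their outputs into other nodes. The helper names `relu` and `layer`, the two-node hidden layer, and every weight value are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def relu(x):
    """A simple activation: pass positive values through, clamp the rest to 0."""
    return max(0.0, x)

def layer(inputs, weights, biases):
    """Compute one layer; each row of `weights` belongs to one node."""
    return [relu(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

# Two hidden nodes read the same two inputs...
hidden = layer([1.0, 2.0], [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
# ...and one output node reads the outputs of both hidden nodes.
output = layer(hidden, [[2.0, 1.0]], [0.5])
print(hidden, output)
```

Stacking more calls to `layer` gives a deeper network; the structure stays the same, only the number of nodes and weights grows.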
When hundreds or thousands of these nodes are organized in a fashion loosely similar to neurons in the biological brain, they form an Artificial Neural Network. In practice, these networks can grow so large that they end up with billions of parameters, millions of nodes, and trillions of connections between them, resulting in an enormous model.
Computer vision enters the picture when we want to feed in an image as input so that our machine can derive some intelligence from it.
Convolutional Neural Network
A Convolutional Neural Network (CNN) is a class of deep feedforward neural networks (Figure 4) that is largely inspired by the biological visual system. There, each individual cortical neuron responds to stimuli only in a restricted region of the visual field known as the receptive field, i.e., a restricted subarea of the input. The receptive fields of different cortical neurons overlap in such a way that they collectively cover the entire image.
In a Convolutional Neural Network, each convolutional neuron processes data only for its receptive field, and the neurons are organized in such a way that they collectively also cover the entire image. Moreover, both the biological visual system and a CNN have a hierarchy of layers that progressively extract more and more features. These layers are arranged in increasing order of complexity, starting from simple visual representations such as edges, lines, and curves, and gradually moving to more complex representations such as faces and object instances. This results in the ability to understand complex images.
The architecture of feedforward neural networks looks something like this:
Typically, a Convolutional Neural Network has the following layers:
1. Convolutional Layer or Conv Layer
The convolutional layer applies the convolution operation upon the input, passing the result to the next layer. Each convolution operation emits the response of an individual neuron for its receptive field only.
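The operation a conv layer performs can be sketched on plain nested lists. This is an illustrative sketch under a few assumptions: no padding, a stride of 1, a single channel, and a hand-written vertical-edge kernel (in a real CNN the kernel values are learned). Like most deep learning libraries, it computes a cross-correlation, which is what is conventionally called convolution in this context.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Receptive field: the kh x kw patch whose top-left corner is (i, j).
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        output.append(row)
    return output

# A 4x4 image that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical kernel that responds where brightness increases left to right.
vertical_edge = [[-1, 1], [-1, 1]]
feature_map = conv2d(image, vertical_edge)
print(feature_map)
```

Each output value depends only on one small patch of the input, which is the receptive-field idea described above; the non-zero column in the result marks where the edge sits.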
2. Pooling Layer or Pool Layer
The pooling layer is introduced to reduce the spatial size of the output produced by the conv layer. Summarizing each small neighborhood in this way helps later layers detect higher-level details that are composed of lower-level building blocks, e.g., detecting corners from the intersection of two edges.
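The size reduction can be illustrated with max pooling, which is one common pooling choice (the article does not say which variant it has in mind, so treat that as an assumption). The function name `max_pool2d` and the sample values are made up for the example, and the sketch assumes the feature map's dimensions divide evenly by the window size.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    pooled = []
    for i in range(0, len(feature_map), size):
        row = []
        for j in range(0, len(feature_map[0]), size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
pooled = max_pool2d(feature_map)
print(pooled)
```

A 4x4 map becomes a 2x2 map, so later layers have a quarter as many values to process while the strongest responses are kept.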
3. Fully Connected Layer or FC Layer
The fully connected layer is the layer in which every node is connected to every node in its preceding and succeeding layers, as shown in Figure 4. The primary purpose of the convolutional and pooling layers is to extract information out of an image. The fully connected layer then maps the extracted information to the respective output.
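As a rough sketch of that mapping, the extracted features can be flattened into a list and multiplied by one row of weights per output class. The helper names, the two-class setup, the weight values, and the final softmax step (which turns scores into probabilities) are all assumptions added for illustration.

```python
import math

def flatten(feature_map):
    """Turn a 2D feature map into a flat list of values."""
    return [v for row in feature_map for v in row]

def fully_connected(inputs, weights, biases):
    """Every output node sees every input value (one weight row per output)."""
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

features = flatten([[4, 2], [2, 7]])
# Hypothetical weights for two output classes.
weights = [[0.1, 0.0, 0.0, 0.2], [0.0, 0.3, 0.1, 0.0]]
biases = [0.0, 0.1]
scores = fully_connected(features, weights, biases)
probs = softmax(scores)
print(scores, probs)
```

The class with the highest probability would be taken as the network's answer for the image.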
Apart from the above layers, CNNs can also have other components such as a batch normalization layer and dropout. These components can noticeably improve how well a convolutional neural network trains. Each layer is composed of learnable weights that we need to initialize before training begins. A batch norm layer alleviates a lot of headaches with properly initializing neural networks by explicitly forcing the activations throughout a network to take on a unit Gaussian distribution at the beginning of the training. In a similar way, dropout is an extremely effective and simple regularization technique that, during training, keeps each neuron active only with some probability p and sets it to zero otherwise.
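Both ideas fit in a few lines. The sketch below is simplified and makes some assumptions: `dropout` uses the "inverted" variant that rescales the surviving values by 1/p, and `batch_norm` normalizes a single feature across a batch while leaving out the learnable scale and shift that real batch norm layers also carry. The function names and sample numbers are hypothetical.

```python
import random

def dropout(activations, p, rng):
    """Inverted dropout: keep each value with probability p, otherwise zero it."""
    # Dividing by p keeps the expected activation the same as without dropout.
    return [a / p if rng.random() < p else 0.0 for a in activations]

def batch_norm(batch, eps=1e-5):
    """Normalize one feature across a batch to zero mean and unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

rng = random.Random(0)  # seeded so repeated runs drop the same values
dropped = dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=rng)
normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
print(dropped)
print(normalized)
```

Dropout is applied only while training; at prediction time all neurons stay active.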
The three main layers are stacked on top of each other so that the CNN architecture looks like the following:
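A toy version of this stacking, with one conv layer, one pool layer, and one fully connected layer, might look like the following. Everything here is an assumption for illustration: the function names, the 5x5 input, the ReLU between conv and pool, and the hand-picked kernel and weights, which a real network would learn from data.

```python
def conv2d(image, kernel):
    """Conv layer: no padding, stride 1, single channel."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

def relu2d(fmap):
    """Nonlinearity applied to every value of a feature map."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool2d(fmap, size=2):
    """Pool layer: non-overlapping max pooling."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def fully_connected(inputs, weights, biases):
    """FC layer: one weight row per output score."""
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]

def tiny_cnn(image, kernel, fc_weights, fc_biases):
    fmap = relu2d(conv2d(image, kernel))        # conv layer plus nonlinearity
    pooled = max_pool2d(fmap)                   # pool layer
    flat = [v for row in pooled for v in row]   # flatten for the FC layer
    return fully_connected(flat, fc_weights, fc_biases)

# 5x5 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]
fc_weights = [[1, 0, 1, 0], [0, 1, 0, 1]]
fc_biases = [0, 0]
scores = tiny_cnn(image, kernel, fc_weights, fc_biases)
print(scores)
```

Real architectures repeat the conv and pool stages several times before the fully connected layers, but the data flow is the same.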
Computer Vision Applications
There are many computer vision applications on the market. Below are just a few:
- Automatic inspection (image-based automated inspection), e.g., in manufacturing applications
- Assisting humans in identification tasks (identifying objects or species from their properties), e.g., a species identification system
- Controlling processes (for instance by monitoring robots), e.g., an industrial robot
- Detecting events, e.g., for visual surveillance or people counting
- Modeling objects or environments (for example, drone imagery can be used to analyze climatic factors that lead to changes in vegetation), e.g., medical image analysis or topographical modeling
- Navigation, e.g., by an autonomous vehicle or mobile robot
- Organizing information, e.g., for indexing databases of images and image sequences
How to think about a Computer Vision Application
We can think of building a computer vision application as finding tasks that require human visual expertise and deriving patterns from them. Put another way, if such a task can be automated, then we can work on developing a computer vision application for it.
We can come up with ideas for a computer vision application by keeping the following points in mind:
- Adapt Existing Jobs and Look for Modification: Looking at existing jobs for inspiration, we can devise a computer vision-based solution, e.g., computer vision can be used to detect vehicles that break the traffic rules, read their number plates, and generate a fine slip for each one. We can also look for existing applications that are facing some problems and search for a better solution.
- Brainstorm: We can brainstorm with our colleagues, friends, and family to gather problems and check whether they can be solved using computer vision.
- Research: Everything will ultimately boil down to research. There is no escaping research when you are looking for ideas. Research will not only help you get new app ideas but will also help you explore the market for already existing applications.
Translated from: https://medium.com/@mayank.skb/computer-vision-in-artificial-intelligence-ddd58ebbc70