Semantic Segmentation Using U-Net

Picture by Martei Macru on Unsplash

Semantic segmentation is a computer vision problem in which we assign a class to each pixel. Unlike the classic image classification task, where only one class value is predicted for the whole image (assuming single-label classification), here we predict a class value for every pixel. Image segmentation is predominantly applied in the medical field, but it is now also being used in other domains, e.g. self-driving cars.
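
The difference in output can be made concrete with a minimal sketch. The scores below are invented for illustration, and the helper names (`classify_image`, `segment_image`) are hypothetical; a real model would produce the scores from an image.

```python
def argmax(scores):
    """Return the index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def classify_image(class_scores):
    """Image classification: one score vector for the whole image -> one label."""
    return argmax(class_scores)


def segment_image(pixel_scores):
    """Semantic segmentation: one score vector per pixel -> one label per pixel.

    pixel_scores[row][col] is a list of per-class scores for that pixel.
    """
    return [[argmax(scores) for scores in row] for row in pixel_scores]


# Hypothetical scores for two classes (0 = background, 1 = object).
image_scores = [0.2, 0.8]
pixel_scores = [
    [[0.9, 0.1], [0.3, 0.7]],
    [[0.6, 0.4], [0.1, 0.9]],
]

image_label = classify_image(image_scores)
label_map = segment_image(pixel_scores)
print(image_label)  # 1
print(label_map)    # [[0, 1], [0, 1]]
```

Classification returns a single label, while segmentation returns a label map with the same height and width as the input.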

In image classification we only want to know what is in the image. Semantic segmentation asks two questions: what is in the image, and where is it.

What Is U-Net:

U-Net is one of the most popular models for semantic segmentation. Although other models can accomplish this task, U-Net is widely accepted as the de facto standard. A typical U-Net architecture has two parts: an encoder and a decoder.

Structure of U-Net

Encoder:

The encoder does the same job as any convolutional neural network: it answers the first question, what. However, when we downsample the image as a typical convnet does, we tend to lose information about where the segmented objects are located. The CNN's feature maps learn what is in the image with little idea of where it is. In the implementation described here, the encoder takes a 128×128×1 image and outputs a feature map of shape 8×8×256.
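
Both points can be sketched in a few lines of plain Python. The first function shows how 2×2 max pooling discards position: two inputs that differ only in where the bright pixel sits produce the same output. The second traces shapes through the encoder under assumptions that are not stated in the article: four pooling stages, "same"-padded convolutions, and a channel count that starts at 16 and doubles at each stage. Those choices are picked only because they reproduce the 128×128 to 8×8×256 figures above.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a single-channel 2D list.

    Assumes the height and width are even.
    """
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def encoder_shapes(height, width, base_channels, num_poolings):
    """Trace (height, width, channels) after each encoder stage.

    Assumption: only pooling changes the spatial size, and the number of
    channels doubles at every stage.
    """
    channels = base_channels
    shapes = [(height, width, channels)]
    for _ in range(num_poolings):
        height, width = height // 2, width // 2
        channels *= 2
        shapes.append((height, width, channels))
    return shapes


# The bright pixel sits in different corners of the same 2x2 window.
a = [[9, 0], [0, 0]]
b = [[0, 0], [0, 9]]
pooled_a = max_pool_2x2(a)
pooled_b = max_pool_2x2(b)
print(pooled_a, pooled_b)  # [[9]] [[9]]

# Hypothetical configuration: 16 channels after the first convolution block.
shapes = encoder_shapes(128, 128, base_channels=16, num_poolings=4)
print(shapes[-1])  # (8, 8, 256)
```

After pooling, the network still knows that a bright pixel was present, but no longer which corner it came from.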

Decoder:

The decoder tries to recover the information lost during the encoder's downsampling. To do so, it uses skip connections, which pass encoder feature maps to the decoder and so supply the spatial information that was lost. The decoder also uses transposed convolutions, which convert a small feature map into a larger one. In the decoder, the size grows from 8×8×256 back to 128×128×1.
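
A single decoder step can be sketched as follows. This is a simplified illustration, not the article's implementation: it handles one channel, uses a fixed 2×2 kernel with stride 2 (in a real network the kernel weights are learned and the result sums over input channels), and models the skip connection as a channel-wise concatenation. The function names and values are hypothetical.

```python
def transposed_conv_2x2(feature_map, kernel):
    """Stride-2 transposed convolution of one channel with a 2x2 kernel.

    Each input value is multiplied by the kernel and written into its own
    2x2 block of the output, so an HxW input becomes 2Hx2W.
    """
    height, width = len(feature_map), len(feature_map[0])
    out = [[0.0] * (2 * width) for _ in range(2 * height)]
    for r in range(height):
        for c in range(width):
            for kr in range(2):
                for kc in range(2):
                    out[2 * r + kr][2 * c + kc] += feature_map[r][c] * kernel[kr][kc]
    return out


def concat_channels(upsampled, skip):
    """Skip connection: stack decoder and encoder channels.

    Both arguments are lists of 2D maps with the same height and width.
    """
    same_height = len(upsampled[0]) == len(skip[0])
    same_width = len(upsampled[0][0]) == len(skip[0][0])
    if not (same_height and same_width):
        raise ValueError("skip and upsampled maps must have the same spatial size")
    return upsampled + skip


small = [[1.0, 2.0], [3.0, 4.0]]      # 2x2 decoder input
kernel = [[1.0, 0.5], [0.5, 0.25]]    # hypothetical kernel weights
up = transposed_conv_2x2(small, kernel)
for row in up:
    print(row)

# Hypothetical 4x4 encoder feature map saved before pooling.
skip = [[[0.0] * 4 for _ in range(4)]]
merged = concat_channels([up], skip)
print(len(merged), len(merged[0]), len(merged[0][0]))  # 2 4 4
```

The upsampled map carries the "what" from deeper layers, and the concatenated encoder map carries the "where" at the matching resolution.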

Variations in the U-Net:

There are many implementations of the U-Net architecture. Instead of transposed convolution, we can apply bilinear upsampling. Similarly, we can replace the encoder with any popular network such as ResNet or VGG-Net, and we may or may not choose to use pretrained weights.
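
For comparison with transposed convolution, here is a minimal sketch of bilinear upsampling on a single channel. It assumes corner-aligned sampling (the corner values of the input map onto the corners of the output); deep learning frameworks offer other alignment conventions, so their results can differ slightly from this one.

```python
def bilinear_upsample(feature_map, out_height, out_width):
    """Resize a 2D list with bilinear interpolation (corner-aligned sampling)."""
    in_height, in_width = len(feature_map), len(feature_map[0])
    out = []
    for r in range(out_height):
        # Map the output row back to a fractional input row.
        src_r = r * (in_height - 1) / (out_height - 1) if out_height > 1 else 0.0
        r0 = int(src_r)
        r1 = min(r0 + 1, in_height - 1)
        wr = src_r - r0
        row = []
        for c in range(out_width):
            # Map the output column back to a fractional input column.
            src_c = c * (in_width - 1) / (out_width - 1) if out_width > 1 else 0.0
            c0 = int(src_c)
            c1 = min(c0 + 1, in_width - 1)
            wc = src_c - c0
            # Interpolate along the columns, then along the rows.
            top = feature_map[r0][c0] * (1 - wc) + feature_map[r0][c1] * wc
            bottom = feature_map[r1][c0] * (1 - wc) + feature_map[r1][c1] * wc
            row.append(top * (1 - wr) + bottom * wr)
        out.append(row)
    return out


small = [[0.0, 2.0], [4.0, 6.0]]
large = bilinear_upsample(small, 3, 3)
print(large)  # [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
```

Unlike transposed convolution, this operation has no learned weights, which is why implementations that use it commonly follow it with an ordinary convolution.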

This was a theoretical overview of the U-Net model for semantic segmentation. In the next blog post we will use this model for salt identification and walk through a practical implementation.

Translated from: https://medium.com/swlh/semantic-segmentation-using-u-net-e0f34e27724f
