Extracting Annotations from a CVAT XML File into Mask Files in Python

The Computer Vision Annotation Tool (CVAT) is a well-known image annotation tool. The results of the data labelers' work can be exported as an XML file that contains all the necessary information about the markup. For an image segmentation task, however, the masks are needed as image files (JPEG, GIF, PNG, etc.). In other words, given the markup coordinates in the CVAT XML file, you need to draw the corresponding masks.

If the data labelers worked with images at a higher resolution than the one used for training, the task becomes more complicated. The point coordinates stored in the XML file refer to the original resolution, so the image compression factor must also be applied to those coordinates.
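
As a rough illustration of that adjustment, the sketch below rescales annotation coordinates by the same factor used to shrink the image. The function name `scale_points` and the sample numbers are assumptions made for this example; they are not part of the original script.

```python
def scale_points(points, scale_factor):
    """Scale (x, y) annotation points by the factor used to resize the image."""
    return [(x * scale_factor, y * scale_factor) for x, y in points]


# Hypothetical polygon annotated on the full-resolution image.
original = [(1000.0, 500.0), (2000.0, 500.0), (2000.0, 1500.0)]

# Assume the training images are half the size of the annotated ones.
scaled = scale_points(original, 0.5)
print(scaled)
```

The same factor has to be used for the images and for the coordinates, otherwise the masks will not line up with the resized images.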

All the code for extracting annotations is implemented as a Python script. The lxml library is used for parsing XML. It is a fast and flexible solution for handling XML and HTML markup. The lxml package supports XPath and XSLT and provides a SAX-compliant API as well as a C-level API for compatibility with C modules.

The tqdm package provides a progress bar, which is helpful when a large number of files is being processed.

Let's take a closer look. First, import the libraries:

import os
import cv2
import argparse
import shutil
import numpy as np
from lxml import etree
from tqdm import tqdm

A helper function that creates a new directory, recursively deleting an existing one together with its contents first:

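def dir_create(path):
    # Remove an existing, non-empty directory with all its contents, then recreate it.
    if (os.path.exists(path)) and (os.listdir(path) != []):
        shutil.rmtree(path)
        os.makedirs(path)
    # Create the directory if it does not exist yet.
    if not os.path.exists(path):
        os.makedirs(path)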
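
The effect of such a helper can be checked safely in a throwaway location. The sketch below uses a simplified variant under the hypothetical name `reset_dir` and runs it on a temporary directory; the file name `old.png` is made up for the example.

```python
import os
import shutil
import tempfile


def reset_dir(path):
    """Remove `path` with everything in it, then create it again empty."""
    if os.path.exists(path):
        shutil.rmtree(path)
    os.makedirs(path)


with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "masks")
    reset_dir(out_dir)  # the directory did not exist, so it is created
    open(os.path.join(out_dir, "old.png"), "w").close()  # simulate a stale mask
    reset_dir(out_dir)  # the stale file is removed together with the directory
    remaining = os.listdir(out_dir)

print(remaining)
```

Because everything under the target path is deleted, the output directory should never point at a folder that holds other data.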

The script takes the following arguments: the directory with input images, the input file with the CVAT annotation in XML format, the directory for output masks, and the scale factor for images. A function for parsing these arguments from the command line:

def parse_args():
    parser = argparse.ArgumentParser(
        fromfile_prefix_chars='@',
        description='Convert CVAT XML annotations to contours'
    )
    parser.add_argument(
        '--image-dir', metavar='DIRECTORY', required=True,
        help='directory with input images'
    )
    parser.add_argument(
        '--cvat-xml', metavar='FILE', required=True,
        help='input file with CVAT annotation in xml format'
    )
    parser.add_argument(
        '--output-dir', metavar='DIRECTORY', required=True,
        help='directory for output masks'
    )
    parser.add_argument(
        '--scale-factor', type=float, default=1.0,
        help='choose scale factor for images'
    )
    return parser.parse_args()
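
The `fromfile_prefix_chars='@'` setting means that arguments can also be read from a text file, one argument per line, by passing `@filename` on the command line. The sketch below demonstrates this with a reduced parser; the helper name `build_parser` and the file name `args.txt` are assumptions for the example.

```python
import argparse
import os
import tempfile


def build_parser():
    """Reduced version of the parser: same options, no help texts."""
    parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
    parser.add_argument('--image-dir', required=True)
    parser.add_argument('--cvat-xml', required=True)
    parser.add_argument('--output-dir', required=True)
    parser.add_argument('--scale-factor', type=float, default=1.0)
    return parser


with tempfile.TemporaryDirectory() as tmp:
    # Each line of the file is treated as one command-line argument.
    args_file = os.path.join(tmp, 'args.txt')
    with open(args_file, 'w') as f:
        f.write('--image-dir\nimages\n--cvat-xml\ncvat.xml\n--output-dir\nmasks\n')
    # Arguments from the file and from the command line can be mixed.
    args = build_parser().parse_args(['@' + args_file, '--scale-factor', '0.4'])

print(args.image_dir, args.cvat_xml, args.output_dir, args.scale_factor)
```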

To understand how the extraction function works, let's take a closer look at a section of the CVAT XML file:

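[CVAT XML excerpt: the image entry for '7.jpg' with its polygon tags; the markup itself is not preserved here]
...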

First, find the area of the XML file that corresponds to the image currently being processed. The easiest way to do this is by file name ('7.jpg' in the example). Next, find the 'polygon' or 'box' tags and extract the necessary data from them (in this example, roofs are marked with polygons). You can use the following function to obtain the markup results from the CVAT XML:

def parse_anno_file(cvat_xml, image_name):
    root = etree.parse(cvat_xml).getroot()
    anno = []
    # XPath expression selecting the <image> tag with the given file name.
    image_name_attr = ".//image[@name='{}']".format(image_name)
    for image_tag in root.iterfind(image_name_attr):
        # Copy the attributes of the <image> tag.
        image = {}
        for key, value in image_tag.items():
            image[key] = value
        image['shapes'] = []
        for poly_tag in image_tag.iter('polygon'):
            polygon = {'type': 'polygon'}
            for key, value in poly_tag.items():
                polygon[key] = value
            image['shapes'].append(polygon)
        for box_tag in image_tag.iter('box'):
            box = {'type': 'box'}
            for key, value in box_tag.items():
                box[key] = value
            # Describe the box as a four-corner polygon.
            box['points'] = "{0},{1};{2},{1};{2},{3};{0},{3}".format(
                box['xtl'], box['ytl'], box['xbr'], box['ybr'])
            image['shapes'].append(box)
        # Draw shapes with a lower z_order first.
        image['shapes'].sort(key=lambda x: int(x.get('z_order', 0)))
        anno.append(image)
    return anno
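
The same lookup can be tried on a small in-memory fragment. The sketch below is only an illustration. The XML is a hypothetical fragment with invented values, and the standard-library `xml.etree.ElementTree` module is swapped in for lxml so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation fragment; every value here is invented.
SAMPLE_XML = """
<annotations>
  <image id="7" name="7.jpg" width="100" height="80">
    <polygon label="roof" points="10.5,10.0;50.0,10.0;50.0,40.0" z_order="1"/>
    <box label="roof" xtl="5.0" ytl="6.0" xbr="20.0" ybr="30.0" z_order="0"/>
  </image>
</annotations>
""".strip()

root = ET.fromstring(SAMPLE_XML)
shapes = []
for image_tag in root.iterfind(".//image[@name='7.jpg']"):
    for poly_tag in image_tag.iter('polygon'):
        shapes.append(dict(poly_tag.attrib, type='polygon'))
    for box_tag in image_tag.iter('box'):
        box = dict(box_tag.attrib, type='box')
        # Corners in the order: top-left, top-right, bottom-right, bottom-left.
        box['points'] = "{0},{1};{2},{1};{2},{3};{0},{3}".format(
            box['xtl'], box['ytl'], box['xbr'], box['ybr'])
        shapes.append(box)

# Shapes with a lower z_order come first, so they are drawn first.
shapes.sort(key=lambda s: int(s.get('z_order', 0)))

for shape in shapes:
    print(shape['type'], shape['points'])
```

Converting boxes to a `points` string lets the drawing step treat boxes and polygons in the same way.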

Next, we need to create the mask files. Draw the sides of the mask polygons in white and the interior in red (as shown in the picture above). OpenCV uses the BGR channel order, so the color (0, 0, 255) is red. The following function does this:

def create_mask_file(width, height, bitness, background, shapes, scale_factor):
    # Start from the current background; bitness // 8 is the number of channels.
    mask = np.full((height, width, bitness // 8), background, dtype=np.uint8)
    for shape in shapes:
        # 'x1,y1;x2,y2;...' -> list of (x, y) tuples
        points = [tuple(map(float, p.split(','))) for p in shape['points'].split(';')]
        points = np.array([(int(p[0]), int(p[1])) for p in points])
        # Rescale the coordinates to the resolution of the processed image.
        points = points*scale_factor
        points = points.astype(int)
        # White outline first, then the red (BGR) interior.
        mask = cv2.drawContours(mask, [points], -1, color=(255, 255, 255), thickness=5)
        mask = cv2.fillPoly(mask, [points], color=(0, 0, 255))
    return mask
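
One detail of the coordinate handling is that the values are truncated to integers before scaling, which can shift a vertex by a pixel. A possible variant, shown here only as a sketch, scales the floating-point values first and rounds once at the end. The helper name `parse_points` is an assumption and is not part of the original script.

```python
def parse_points(points_attr, scale_factor=1.0):
    """Turn 'x1,y1;x2,y2;...' into a list of integer (x, y) pixel coordinates."""
    pixels = []
    for pair in points_attr.split(';'):
        x, y = map(float, pair.split(','))
        # Scale the float values first, then round to the nearest pixel.
        pixels.append((int(round(x * scale_factor)), int(round(y * scale_factor))))
    return pixels


print(parse_points("11.9,21.9;31.9,41.9", 0.5))
```

For the first point, truncating before scaling gives int(11.9) * 0.5 = 5.5 and then 5, while scaling first gives 5.95, which rounds to 6.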

And finally, the main function:

def main():
    args = parse_args()
    dir_create(args.output_dir)
    # Only regular files in the image directory are processed.
    img_list = [f for f in os.listdir(args.image_dir)
                if os.path.isfile(os.path.join(args.image_dir, f))]
    mask_bitness = 24
    for img in tqdm(img_list, desc='Writing contours:'):
        img_path = os.path.join(args.image_dir, img)
        anno = parse_anno_file(args.cvat_xml, img)
        if not anno:
            # No annotation for this image, so there is nothing to draw.
            continue
        # Start from a black canvas of the same size as the image.
        current_image = cv2.imread(img_path)
        height, width, _ = current_image.shape
        background = np.zeros((height, width, 3), np.uint8)
        for image in anno:
            background = create_mask_file(width,
                                          height,
                                          mask_bitness,
                                          background,
                                          image['shapes'],
                                          args.scale_factor)
        # The mask keeps the image name and is written as PNG.
        output_path = os.path.join(args.output_dir, os.path.splitext(img)[0] + '.png')
        cv2.imwrite(output_path, background)
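
The name of each mask file is derived from the image name by replacing the extension with '.png'. For names that contain several dots, cutting at the first dot and stripping only the final extension give different results, as this small sketch shows. The helper name `mask_name` and the sample file names are assumptions.

```python
import os


def mask_name(image_name):
    """Replace only the final extension of the image name with '.png'."""
    return os.path.splitext(image_name)[0] + '.png'


for name in ['7.jpg', 'tile.2020.05.jpg']:
    first_dot = name.split('.')[0] + '.png'  # cuts at the first dot
    print(name, '->', first_dot, 'vs', mask_name(name))
```

Cutting at the first dot maps 'tile.2020.05.jpg' to 'tile.png', so several images could end up writing to the same mask file.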

So that the file can be executed as a script by the Python interpreter, add the following construct:

if __name__ == "__main__":
    main()

That's all. To run the script, use the following command (the scale factor defaults to 1, which is the right value if the images were not resized after markup):

python script_name.py --image-dir original_images_dir --cvat-xml cvat.xml --output-dir masks_dir --scale-factor 0.4

An example of an original image:

Image created with Google Earth

The mask produced by the script:

Image created with the OpenCV library

Conclusion

The approach described here can be extended to produce more complex mask files from the data contained in the CVAT XML. You can extract individual polygons or highlight polygons in different colors depending on the number of vertices. In addition, with a small modification, the script can cut polygonal sections out of the original images along the marking contour.
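
As one possible starting point for coloring by vertex count, the fill color could be looked up from the number of points in the shape. The sketch below is hypothetical: the function name `color_for_shape`, the palette, and the sample shapes are all assumptions, and the colors are BGR tuples as used by OpenCV.

```python
# Hypothetical palette: vertex count -> BGR color.
PALETTE = {
    3: (255, 0, 0),   # triangles in blue
    4: (0, 255, 0),   # quadrilaterals in green
}
DEFAULT_COLOR = (0, 0, 255)  # everything else in red


def color_for_shape(shape):
    """Pick a fill color from the number of vertices in shape['points']."""
    vertex_count = len(shape['points'].split(';'))
    return PALETTE.get(vertex_count, DEFAULT_COLOR)


triangle = {'type': 'polygon', 'points': "0,0;10,0;5,8"}
pentagon = {'type': 'polygon', 'points': "0,0;10,0;12,6;5,10;-2,6"}
print(color_for_shape(triangle), color_for_shape(pentagon))
```

The returned tuple could then be passed as the `color` argument of the fill call in place of the fixed red.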


Translated from: https://towardsdatascience.com/extract-annotations-from-cvat-xml-file-into-mask-files-in-python-bb69749c4dc9
