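Reference: Object Detection in Practice, Part 1: Dataset Introduction (PASCAL VOC, MS COCO)
Nowadays "the COCO dataset" usually refers to the 2017 release. Its images total about 25 GB, so here we only look at how it is organized and do not train on it.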
Images | Annotations |
---|---|
2017 Train images [118K/18GB] | 2017 Train/Val annotations [241MB] |
2017 Val images [5K/1GB] | |
2017 Test images [41K/6GB] | 2017 Testing Image info [1MB] |
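COCO is a large-scale dataset for object detection, segmentation, and image captioning.
Common object detection dataset formats include Pascal VOC, YOLO, and COCO.
1. Basic file structure: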
data
├─ annotations
│ ├─ instances_train2017.json
│ └─ instances_val2017.json
├─ train2017
│ ├─ 000000000???.jpg
│ ├─ 000000000???.jpg
│ └─ ...
└─ val2017
  ├─ 000000000???.jpg
  ├─ 000000000???.jpg
  └─ ...
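Reference: COCO with YOLO
2. Annotation file format
Annotations are stored as JSON. Unlike PASCAL VOC, where each image has its own XML annotation file, COCO keeps the annotations of all images in a single file (the line numbers on the left of the screenshot give a sense of its size):
In more detail: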
{
"images": [
{
"id": 0,
"file_name": "34020010494_e5cb88e1c4_k.jpg",
"height": 1536,
"width": 2048
}, // one example entry
], // end of the "images" field
"annotations": [
{
"image_id": 0,
"id": 0,
"category_id": 0,
"bbox": [
994,
619,
451,
547
],
"area": 246697,
"segmentation": [
[
1020.5,
963.5,
1000.5,
...
963.5
]
],
"iscrowd": 0
}, // one example entry
], // end of the "annotations" field
"categories": [
{
"id": 0,
"name": "balloon"
}
] // end of the "categories" field
}
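A structure like this is easy to explore with plain Python. The sketch below is only an illustration: it copies a trimmed version of the example (segmentation dropped, comments removed so that it is valid JSON) into a string and groups the annotations by image. The names `coco_text` and `anns_by_image` are made up for this example.

```python
import json
from collections import defaultdict

# Trimmed copy of the balloon example, written as strict JSON (no comments).
coco_text = """
{
  "images": [
    {"id": 0, "file_name": "34020010494_e5cb88e1c4_k.jpg", "height": 1536, "width": 2048}
  ],
  "annotations": [
    {"image_id": 0, "id": 0, "category_id": 0,
     "bbox": [994, 619, 451, 547], "area": 246697, "iscrowd": 0}
  ],
  "categories": [{"id": 0, "name": "balloon"}]
}
"""
coco = json.loads(coco_text)

# Lookup tables keyed by id
images = {img["id"]: img for img in coco["images"]}
categories = {cat["id"]: cat["name"] for cat in coco["categories"]}

# All annotations live in one flat list, so group them by the image they belong to
anns_by_image = defaultdict(list)
for ann in coco["annotations"]:
    anns_by_image[ann["image_id"]].append(ann)

for image_id, anns in anns_by_image.items():
    labels = [categories[a["category_id"]] for a in anns]
    print(images[image_id]["file_name"], labels)
```

The grouping step is the main point: because one file holds every annotation, each consumer has to rebuild the image-to-annotations mapping itself.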
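The annotation file format is specified in: COCO-Data format.
Apart from lacking the info and licenses fields, the balloon example above basically conforms to it:
{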
"info": info,
"images": [image],
"annotations": [annotation],
"licenses": [license],
}
info{
"year": int,
"version": str,
"description": str,
"contributor": str,
"url": str,
"date_created": datetime,
}
image{
"id": int,
"width": int,
"height": int,
"file_name": str,
"license": int,
"flickr_url": str,
"coco_url": str,
"date_captured": datetime,
}
license{
"id": int,
"name": str,
"url": str,
}
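The annotations field generally differs from task to task, and some tasks also add a categories field. For object detection, the two fields are specified as:
annotation{
"id": int,
"image_id": int,
"category_id": int,
"segmentation": RLE or [polygon],
"area": float,
"bbox": [x,y,width,height],
"iscrowd": 0 or 1,
// iscrowd = 1 means the segmentation is stored as RLE
}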
categories[
{
"id": int,
"name": str,
"supercategory": str,
// the category hierarchy, e.g. the supercategory of a blue cat is cat
}
]
This closely matches the example above, and that is the COCO annotation format.
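Since bbox is stored as [x, y, width, height], code that expects corner coordinates needs a conversion first. A minimal sketch, assuming x and y are the top-left corner of the box; `xywh_to_xyxy` is a hypothetical helper name, not part of any library:

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


bbox = [994, 619, 451, 547]  # bbox of the balloon example
corners = xywh_to_xyxy(bbox)
box_area = bbox[2] * bbox[3]  # width * height of the box

print(corners)   # [994, 619, 1445, 1166]
print(box_area)  # 246697
```

In the balloon example the stored area (246697) happens to equal the width times the height of the box. As far as I know, area in COCO describes the annotated region itself, so for a polygon mask it can be smaller than the box area.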
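Evaluation metrics: see Metrics
Only a screenshot is included here; the metrics are not explained in detail.
Besides the COCO API, there are also the MASK API and FiftyOne.
Source: cocoapi/PythonAPI/pycocotools/
Do not confuse this library with protobuf; the names look a little alike (I used to think this library was hard precisely because I had mixed the two up).
Installation: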
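# Windows (a plain pip install has to compile the extension, which needs Visual Studio)
conda install pycocotools -c conda-forge
# macOS (also compiles, but the system build tools are enough; no extra dependencies to install)
pip install pycocotools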
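The main functions are the following (where "ann" = annotation, "cat" = category, "img" = image):
Function | Purpose |
---|---|
getAnnIds | Return the ids of annotations that satisfy the filter conditions |
getCatIds | Return the ids of categories that satisfy the filter conditions |
getImgIds | Return the ids of images that satisfy the filter conditions |
loadAnns | Load the annotations with the given ids |
loadCats | Load the categories with the given ids |
loadImgs | Load the image records (metadata, not pixels) with the given ids |
loadRes | Load prediction results, used when evaluating a model |
showAnns | Display the given annotations |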
Example code: uploaded to GitHub as a Jupyter notebook, see openMMLabCampusLearn/selfExercise/1.pycocotools.ipynb
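The code for this part is on GitHub: openMMLabCampusLearn/selfExercise/2.图像EXIF信息.ipynb
Exchangeable image file format (officially abbreviated Exif) is a format designed for digital camera photos; it records a photo's attribute information and shooting data.
For an introduction to the EXIF tags, see the PDF of the EXIF standard: https://www.cipa.jp/std/documents/e/DC-X008-Translation-2019-E.pdf
If you rename a JPEG file to .txt and open it in an editor (a hex viewer is easier to read), you can see the hexadecimal byte stream described in "Reading the Exif attributes of JPG images (1): Introduction to Exif".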
For the JPEG decoding format, the main specification to read is:
ISO/IEC 10918-1 (ISO/IEC 10918-1:1994
Information technology — Digital compression and coding of continuous-tone still images: Requirements and guidelines)
from PIL import Image
demo_path = "datasets/cat_dataset/images/IMG_20211020_091507.jpg"
demo_image = Image.open(demo_path)
# Print every EXIF tag id together with its value
for k, v in demo_image.getexif().items():
    print("Tag", k, "Value", v)
> Tag 274 Value 6
Here tag 274 is the image orientation (Orientation); see the tag table:
Tag (hex) | Tag (dec) | IFD | Key | Type | Tag description |
---|---|---|---|---|---|
0x000b | 11 | Image | Exif.Image.ProcessingSoftware | Ascii | The name and version of the software used to post-process the picture |
0x0100 | 256 | Image | Exif.Image.ImageWidth | Long | The number of columns of image data, equal to the number of pixels per row. In JPEG compressed data a JPEG marker is used instead of this tag. |
0x0101 | 257 | Image | Exif.Image.ImageLength | Long | The number of rows of image data. In JPEG compressed data a JPEG marker is used instead of this tag. |
0x0103 | 259 | Image | Exif.Image.Compression | Short | The compression scheme used for the image data. When a primary image is JPEG compressed, this designation is not necessary and is omitted. When thumbnails use JPEG compression, this tag value is set to 6. |
0x0106 | 262 | Image | Exif.Image.PhotometricInterpretation | Short | The pixel composition. In JPEG compressed data a JPEG marker is used instead of this tag. |
0x0107 | 263 | Image | Exif.Image.Thresholding | Short | For black and white TIFF files that represent shades of gray, the technique used to convert from gray to black and white pixels. |
0x0108 | 264 | Image | Exif.Image.CellWidth | Short | The width of the dithering or halftoning matrix used to create a dithered or halftoned bilevel file. |
0x010a | 266 | Image | Exif.Image.FillOrder | Short | The logical order of bits within a byte |
0x010f | 271 | Image | Exif.Image.Make | Ascii | The manufacturer of the recording equipment. This is the manufacturer of the DSC, scanner, video digitizer or other equipment that generated the image. When the field is left blank, it is treated as unknown. |
0x0110 | 272 | Image | Exif.Image.Model | Ascii | The model name or model number of the equipment. This is the model name or number of the DSC, scanner, video digitizer or other equipment that generated the image. When the field is left blank, it is treated as unknown. |
0x0112 | 274 | Image | Exif.Image.Orientation | Short | The image orientation viewed in terms of rows and columns |
0x011a | 282 | Image | Exif.Image.XResolution | Rational | The number of pixels per ResolutionUnit in the ImageWidth direction. When the image resolution is unknown, 72 [dpi] is designated. |
0x011b | 283 | Image | Exif.Image.YResolution | Rational | The number of pixels per ResolutionUnit in the ImageLength direction. The same value as XResolution is designated. |
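Only part of the table is shown above; see the full tag reference for the complete list.
In addition, some fields take enumerated values, each with its own meaning. Taking Exif.Image.Orientation as an example, searching the reference site for the Orientation field gives: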
1 = Horizontal (normal)
2 = Mirror horizontal
3 = Rotate 180
4 = Mirror vertical
5 = Mirror horizontal and rotate 270 CW (CW = clockwise)
6 = Rotate 90 CW
7 = Mirror horizontal and rotate 90 CW
8 = Rotate 270 CW
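For the values of specific fields, see the reference site.
The following content is reposted.
When taking photos with a phone, you have probably run into this: when you hold the phone upright to shoot landscape content, the picture on the screen automatically switches to landscape, and after shooting it is also saved in landscape form.
Why doesn't the phone simply rotate the pixels and store the rotated image? One explanation is:
Rotating a JPEG requires re-encoding it, which loses image quality.
So by default the camera leaves the raw data unrotated and records the rotation as metadata for later use.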
from PIL import ExifTags
# ExifTags.TAGS maps numeric tag ids to tag names
for k, v in ExifTags.TAGS.items():
    print(f"Tag:{k}, Value:{v}")
> ...
Tag:271, Value:Make
Tag:272, Value:Model
Tag:273, Value:StripOffsets
Tag:274, Value:Orientation
...
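The same mapping can be used to print readable names instead of bare tag numbers. A minimal sketch on an in-memory Exif object, so no image file is needed; the Make value "DemoCam" is made up:

```python
from PIL import ExifTags, Image

# Build a small Exif object by hand
exif = Image.Exif()
exif[274] = 6          # Orientation
exif[271] = "DemoCam"  # Make (made-up value)

# Replace each numeric tag id with its name; keep the number if the id is unknown
named = {ExifTags.TAGS.get(tag, tag): value for tag, value in exif.items()}
print(named)
```

With a real photo, `exif` would come from `Image.open(path).getexif()` instead.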
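Since Pillow 6.0.0 there is a ready-made helper that rotates an image according to its EXIF Orientation value:
from PIL import ImageOps
image = ImageOps.exif_transpose(image)
# ImageOps.exif_transpose:
# if the image's EXIF Orientation tag is not 1, transpose the image according to
# that value and remove the Orientation tag from the result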
# Full example
import matplotlib.pyplot as plt
from PIL import Image, ImageOps
demo_path = "datasets/cat_dataset/images/IMG_20211020_091507.jpg"
demo_image = Image.open(demo_path)
image = ImageOps.exif_transpose(demo_image)
plt.figure(figsize=(8, 6))
# Left: pixels as stored, with the raw Orientation value (tag 274) in the title
plt.subplot(1, 2, 1)
plt.title(f"Orientation:{demo_image.getexif().get(274)}")
plt.imshow(demo_image)
plt.axis("off")
# Right: the image after exif_transpose
plt.subplot(1, 2, 2)
plt.title("Modified")
plt.imshow(image)
plt.axis("off")
plt.show()
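The same behaviour can be checked without a photo on disk. The sketch below is an illustration under my own assumptions: it builds a blank 40x20 JPEG in memory, tags it with Orientation 6, and compares the image before and after `exif_transpose`. The constant name `ORIENTATION` and the test image are made up.

```python
import io

from PIL import Image, ImageOps

ORIENTATION = 274  # 0x0112, Exif.Image.Orientation

# Build a 40x20 JPEG in memory and tag it with Orientation = 6 (Rotate 90 CW)
exif = Image.Exif()
exif[ORIENTATION] = 6
buffer = io.BytesIO()
Image.new("RGB", (40, 20), "white").save(buffer, format="JPEG", exif=exif.tobytes())
buffer.seek(0)

stored = Image.open(buffer)              # pixels exactly as stored in the file
fixed = ImageOps.exif_transpose(stored)  # pixels rotated, Orientation tag removed

print(stored.size, stored.getexif().get(ORIENTATION))
print(fixed.size, fixed.getexif().get(ORIENTATION))
```

The stored image should keep its 40x20 size and report Orientation 6, while the transposed copy should be 20x40 with no Orientation tag left, which is the "rotation as metadata" idea described earlier.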
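The older approach (written in the MMDetection course; it is actually adapted from the exif_transpose function provided by PIL):
Note that the script below does not remove the Orientation tag after rotating the image, which may cause problems later when the image is re-annotated or processed further.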
from PIL import Image
def apply_exif_orientation(image):
    _EXIF_ORIENT = 274  # tag id of Exif.Image.Orientation
    if not hasattr(image, "getexif"):
        return image
    try:
        exif = image.getexif()
    except Exception:
        # EXIF data could not be read: return the image unchanged
        return image
    orientation = exif.get(_EXIF_ORIENT)
    # 1 = Horizontal (normal)
    # 2 = Mirror horizontal
    # 3 = Rotate 180
    # 4 = Mirror vertical
    # 5 = Mirror horizontal and rotate 270 CW (CW = clockwise)
    # 6 = Rotate 90 CW (an annotation tool shows the stored image rotated 90 degrees clockwise)
    # 7 = Mirror horizontal and rotate 90 CW
    # 8 = Rotate 270 CW
    method = {
        2: Image.FLIP_LEFT_RIGHT,
        3: Image.ROTATE_180,
        4: Image.FLIP_TOP_BOTTOM,
        5: Image.TRANSPOSE,
        6: Image.ROTATE_270,  # PIL angles are counter-clockwise, so 270 here equals the 90 CW above
        7: Image.TRANSVERSE,
        8: Image.ROTATE_90,
    }.get(orientation)
    if method is not None:
        # transpose() rearranges pixels exactly and has no rotation center; for Image.rotate,
        # center defaults to the image center, with coordinates measured from the top-left corner
        return image.transpose(method)
    return image
The relevant signature is Image.rotate(angle, resample=Resampling.NEAREST, expand=0, center=None, translate=None, fillcolor=None).
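The direction of rotation is easy to get backwards, so here is a small check on a 3x2 test image made up for this purpose. It is only a sketch: it shows that `Image.ROTATE_270` and `rotate(-90, expand=True)` both turn the image 90 degrees clockwise, which is what Orientation 6 asks for.

```python
from PIL import Image

# 3x2 grayscale image with row-major pixel values:
# 1 2 3
# 4 5 6
img = Image.frombytes("L", (3, 2), bytes([1, 2, 3, 4, 5, 6]))

cw90 = img.transpose(Image.ROTATE_270)  # 270 counter-clockwise = 90 clockwise
same = img.rotate(-90, expand=True)     # a negative angle rotates clockwise

# After a 90 degree clockwise turn the rows read:
# 4 1
# 5 2
# 6 3
print(cw90.size)                         # (2, 3)
print(list(cw90.tobytes()))              # [4, 1, 5, 2, 6, 3]
print(same.tobytes() == cw90.tobytes())  # True
```

Without `expand=True`, `rotate` keeps the original canvas size, so a non-square image would be cropped; `transpose` avoids that question entirely.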
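By default, OpenCV already applies the EXIF orientation when it reads an image:
import cv2
import matplotlib.pyplot as plt
demo_path = "datasets/cat_dataset/images/IMG_20211020_091507.jpg"
demo_image = cv2.imread(demo_path)
# OpenCV loads images as BGR; reverse the channel axis to get RGB for matplotlib
plt.imshow(demo_image[:, :, ::-1])
See: OpenCV 4.x imread() documentation
Or see this walkthrough: 20170227. Notes on EXIF information (opencv320 ApplyExifOrientation)
If you do not want OpenCV to apply the EXIF orientation, you can: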
import cv2
import matplotlib.pyplot as plt
demo_path = "datasets/cat_dataset/images/IMG_20211020_091507.jpg"
plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
demo_image = cv2.imread(demo_path)
plt.title("With EXIF(Default)")
plt.imshow(demo_image[:, :, ::-1])
plt.subplot(1, 3, 2)
# imread flags are integer constants; cv2.IMREAD_UNCHANGED == -1
withOut_image_1 = cv2.imread(demo_path, cv2.IMREAD_UNCHANGED)
plt.title("cv::IMREAD_UNCHANGED")
# cv::IMREAD_UNCHANGED keeps the original channel count and ignores the EXIF orientation
plt.imshow(withOut_image_1[:, :, ::-1])
plt.subplot(1, 3, 3)
# cv2.IMREAD_IGNORE_ORIENTATION == 128
withOut_image_2 = cv2.imread(demo_path, cv2.IMREAD_IGNORE_ORIENTATION)
plt.title(" cv::IMREAD_IGNORE_ORIENTATION")
# Passed on its own (without cv2.IMREAD_COLOR), this flag yields a single-channel image
plt.imshow(withOut_image_2, cmap="gray")
plt.show()