Task2 Data Reading and Data Augmentation

Let's get started with today's study :)

1. Import all the packages used

import os, sys, glob, shutil, json
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
import cv2

from PIL import Image
import numpy as np

from tqdm import tqdm, tqdm_notebook

import torch
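torch.manual_seed(0)  # fix the random seed for reproducibility
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.benchmark = True  # let cuDNN pick the fastest algorithms (results may vary slightly between runs)

import torchvision.models as models
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable  # legacy API; plain tensors have replaced Variable since PyTorch 0.4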
from torch.utils.data.dataset import Dataset

If you get errors here, try the fixes below:

Q: No module named 'cv2'
A: pip install jupyter tqdm opencv-python matplotlib pandas

Q: libSM.so.6: cannot open shared object file: No such file or directory
A: apt update && apt install -y libsm6 libxext6

Q: libXrender.so.1: cannot open shared object file: No such file or directory
A: apt-get install libxrender1

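2. As the title says, start with data reading

  • For image processing: Pillow (easy/simple) and OpenCV (harder/more complex)

2.1 Getting started with Pillow

2.1.1 First, read an image of a kitten (everyone loves kittens):

# Read the image
im = Image.open('./cat.jpg')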

2.1.2 Next, apply a blur filter (ImageFilter.BLUR):

from PIL import Image, ImageFilter
im = Image.open('./cat.jpg')
# Apply the blur filter
im2 = im.filter(ImageFilter.BLUR)
im2.save('blur.jpg', 'jpeg')
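Under the hood, a blur filter replaces each pixel with an average of its neighborhood. As a minimal sketch of that idea, assuming a 2-D grayscale NumPy array, the hypothetical `box_blur` below applies a plain k × k mean filter with edge replication at the borders. PIL's `ImageFilter.BLUR` uses its own fixed kernel, so its output will not match this exactly.

```python
import numpy as np


def box_blur(gray: np.ndarray, k: int = 3) -> np.ndarray:
    """Blur a 2-D grayscale image with a k x k mean (box) filter.

    Borders are handled by replicating edge pixels (an assumption; libraries differ).
    """
    if k % 2 == 0:
        raise ValueError("kernel size k must be odd")
    r = k // 2
    padded = np.pad(gray.astype(np.float64), r, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Add up the k*k shifted copies of the image, then divide by k*k.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


img = np.zeros((5, 5))
img[2, 2] = 9.0          # a single bright pixel
print(box_blur(img, 3))  # the 9 is spread evenly over its 3 x 3 neighborhood
```

A larger `k` averages over a wider window and therefore blurs more strongly.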

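2.1.3 A more common operation: shrinking the image

# Define the width w and height h first
w = 150
h = 200
# Open a JPEG file; note that the path is relative to the current directory
im = Image.open('./cat.jpg')
# thumbnail() resizes in place, keeps the aspect ratio, and fits the image inside (w//2, h//2)
im.thumbnail((w//2, h//2))
im.save('thumbnail.jpg', 'jpeg')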
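`thumbnail()` scales both sides by the same factor, namely the largest one that still fits the requested box, and it never enlarges the image. The hypothetical helper below computes the resulting size under that assumption; PIL's own rounding may differ by a pixel. For the 500 × 375 cat (the size printed in the Pad example later in this post), the box (75, 100) gives roughly 75 × 56.

```python
def thumbnail_size(size, max_size):
    """Return the (width, height) an image shrinks to when fitted inside max_size.

    The aspect ratio is preserved and the image is never enlarged.
    """
    w, h = size
    max_w, max_h = max_size
    scale = min(max_w / w, max_h / h, 1.0)  # the tighter side decides the scale
    return max(1, round(w * scale)), max(1, round(h * scale))


print(thumbnail_size((500, 375), (75, 100)))  # (75, 56)
```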

Summary (Pillow)

The above is only a quick taste; for more, see the official documentation:
https://pillow.readthedocs.io/en/stable/

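2.2 OpenCV

  • Originally developed by Intel and released as open source
  • A cross-platform computer vision library
  • More powerful than Pillow
  • Steeper learning curve

2.2.1 The same kitten again (and it turned blue!):

img = cv2.imread('./cat.jpg')
# OpenCV's default channel order is BGR; convert it to RGB
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# cv2.imwrite expects BGR, so saving the RGB array swaps red and blue: this is why the cat turns blue
cv2.imwrite('cv2.jpg', img)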
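The blue cat is purely a matter of channel order. As a small NumPy sketch (the helper name `bgr_to_rgb` is hypothetical), converting between BGR and RGB for a 3-channel image amounts to reversing the last axis, and a red pixel written in the wrong order is read back as blue:

```python
import numpy as np


def bgr_to_rgb(img: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 image (BGR <-> RGB)."""
    return img[..., ::-1]


red_bgr = np.array([[[0, 0, 255]]], dtype=np.uint8)  # one pure-red pixel, BGR order
red_rgb = bgr_to_rgb(red_bgr)
print(red_rgb[0, 0].tolist())  # [255, 0, 0]
# cv2.imwrite assumes BGR, so given red_rgb it treats the leading 255 as
# blue: the saved pixel comes out blue, which is what happened to the cat.
```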

2.2.2 Turn the kitten gray

img = cv2.imread('./cat.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imwrite('cv2.jpg', img)
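`COLOR_BGR2GRAY` collapses the three channels into a single weighted sum; OpenCV's documentation gives the weights as Y = 0.299·R + 0.587·G + 0.114·B. The hypothetical `bgr_to_gray` below spells this out in floating point. OpenCV itself uses fixed-point arithmetic, so individual pixels may differ by 1.

```python
import numpy as np


def bgr_to_gray(img: np.ndarray) -> np.ndarray:
    """H x W x 3 uint8 BGR image -> H x W uint8 gray image (weighted channel sum)."""
    b, g, r = (img[..., i].astype(np.float64) for i in range(3))
    y = 0.299 * r + 0.587 * g + 0.114 * b  # green contributes most, blue least
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)


pure_green = np.array([[[0, 255, 0]]], dtype=np.uint8)  # one pixel, BGR order
print(bgr_to_gray(pure_green))  # [[150]]
```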

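2.2.3 Canny edge detection (the result looks like a line drawing)

# img is the grayscale image from 2.2.2; 30 and 70 are the lower and upper hysteresis thresholds
edges = cv2.Canny(img, 30, 70)
cv2.imwrite('canny.jpg', edges)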
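Canny works in stages: Gaussian smoothing, gradient computation, non-maximum suppression, and finally double thresholding with hysteresis, which is where the two numbers 30 and 70 come in. The sketch below covers only that last stage and assumes a precomputed gradient-magnitude array; the function name `hysteresis` is hypothetical. Pixels at or above the high threshold are strong edges, and pixels between the two thresholds survive only if they connect to a strong edge through 8-neighbors.

```python
from collections import deque

import numpy as np


def hysteresis(magnitude: np.ndarray, low: float, high: float) -> np.ndarray:
    """Double threshold + hysteresis on a 2-D gradient-magnitude map.

    Returns a uint8 edge map with 255 for edge pixels and 0 elsewhere.
    """
    strong = magnitude >= high
    weak = (magnitude >= low) & ~strong
    edges = strong.copy()
    h, w = magnitude.shape
    # Breadth-first search outward from every strong pixel through weak pixels.
    queue = deque(zip(*np.nonzero(strong)))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not edges[ny, nx]:
                    edges[ny, nx] = True
                    queue.append((ny, nx))
    return edges.astype(np.uint8) * 255


mag = np.array([[80, 50, 50, 0, 50]], dtype=float)
# The last 50 is above `low` but not connected to the strong 80, so it is dropped.
print(hysteresis(mag, 30, 70))
```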

2.2.4 Thresholding (binarization)

import matplotlib.pyplot as plt
img = cv2.imread('cat.jpg', 0)  # flag 0 reads the image directly as grayscale
ret,thresh1 = cv2.threshold(img,127,255,cv2.THRESH_BINARY)
ret,thresh2 = cv2.threshold(img,127,255,cv2.THRESH_BINARY_INV)
ret,thresh3 = cv2.threshold(img,127,255,cv2.THRESH_TRUNC)
ret,thresh4 = cv2.threshold(img,127,255,cv2.THRESH_TOZERO)
ret,thresh5 = cv2.threshold(img,127,255,cv2.THRESH_TOZERO_INV)
titles = ['img','BINARY','BINARY_INV','TRUNC','TOZERO','TOZERO_INV']
images = [img,thresh1,thresh2,thresh3,thresh4,thresh5]
for i in range(6):
    plt.subplot(2,3,i+1),plt.imshow(images[i],'gray')
    plt.title(titles[i])
    plt.xticks([]),plt.yticks([])
plt.show()
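The five threshold types differ only in what they do with pixels above and below `thresh` (OpenCV compares with a strict `src > thresh`). Here is a NumPy restatement, using a hypothetical `threshold` function with string names standing in for the `cv2.THRESH_*` flags:

```python
import numpy as np


def threshold(src, thresh, maxval, kind):
    """NumPy restatement of the five cv2.threshold types used above."""
    src = np.asarray(src)
    above = src > thresh
    if kind == "BINARY":      # above -> maxval, else 0
        return np.where(above, maxval, 0)
    if kind == "BINARY_INV":  # above -> 0, else maxval
        return np.where(above, 0, maxval)
    if kind == "TRUNC":       # above -> thresh, else unchanged
        return np.where(above, thresh, src)
    if kind == "TOZERO":      # above -> unchanged, else 0
        return np.where(above, src, 0)
    if kind == "TOZERO_INV":  # above -> 0, else unchanged
        return np.where(above, 0, src)
    raise ValueError(f"unknown threshold type: {kind}")


pixels = np.array([0, 100, 127, 128, 255])
for kind in ["BINARY", "BINARY_INV", "TRUNC", "TOZERO", "TOZERO_INV"]:
    print(kind, threshold(pixels, 127, 255, kind).tolist())
```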

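Summary

OpenCV provides a huge range of image-processing functions, covering practically any image-related operation you can think of. It also ships many built-in feature algorithms, such as keypoint detection, edge detection, and line detection.
OpenCV website: https://opencv.org/
OpenCV GitHub: https://github.com/opencv/opencv
OpenCV extra modules (contrib): https://github.com/opencv/opencv_contrib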

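2.3 Data augmentation methods

With Pillow and OpenCV covered, we return to the competition task: street-view character recognition.
It takes two steps: reading the data, and applying data augmentation.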

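2.3.1 Introduction to data augmentation

  1. Benefits
  • Increases the number of training samples
  • Effectively mitigates overfitting
  • Gives the model stronger generalization
  2. Types of augmentation
  • Color space
  • Size (geometric) space
  • Sample space

For image classification, augmentation usually does not change the label; for object detection, geometric augmentation changes the object coordinates; for image segmentation, it changes the pixel labels.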
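To make the last point concrete: a horizontal flip leaves a classification label untouched, but in detection every bounding box must be mirrored too. This is a minimal NumPy sketch, assuming boxes are stored as `[x_min, y_min, x_max, y_max]` in pixel-edge coordinates (the function name is hypothetical). For segmentation, the mask would simply be flipped the same way as the image.

```python
import numpy as np


def hflip_with_boxes(image: np.ndarray, boxes: np.ndarray):
    """Mirror an H x W (x C) image left-right together with its boxes.

    boxes: N x 4 array of [x_min, y_min, x_max, y_max] in pixel-edge
    coordinates (0..W). Only the x coordinates change; y stays the same.
    """
    w = image.shape[1]
    flipped = image[:, ::-1]
    new_boxes = boxes.copy()
    new_boxes[:, 0] = w - boxes[:, 2]  # new left edge = W - old right edge
    new_boxes[:, 2] = w - boxes[:, 0]  # new right edge = W - old left edge
    return flipped, new_boxes


img = np.zeros((50, 100), dtype=np.uint8)  # H = 50, W = 100
boxes = np.array([[10, 20, 30, 40]])
_, flipped_boxes = hflip_with_boxes(img, boxes)
print(flipped_boxes.tolist())  # [[70, 20, 90, 40]]
```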

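2.3.2 Common data augmentation methods

Transformations can act on an image's color, size, shape, spatial layout, pixels, and so on.
Using torchvision as an example, common augmentation methods include the following (applied to the kitten):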

from torchvision import transforms
from PIL import Image
from torchvision.transforms import functional as TF
import torch
path = "cat.jpg"
img = Image.open(path)
  • transforms.CenterCrop: crops the center of the image
size = (300, 500)  # (height, width)
transform = transforms.Compose([
    transforms.CenterCrop(size),
])
new_img = transform(img)
new_img
  • transforms.ColorJitter: randomly changes the brightness, contrast, saturation, and hue of the image
transform = transforms.Compose([
    transforms.ColorJitter(brightness=(0, 16), contrast=(
        0, 10), saturation=(0, 25), hue=(-0.5, 0.5))
])
new_img = transform(img)
new_img
  • transforms.FiveCrop: crops the four corners and the center, giving five images
UNIT_SIZE = 200  # width of each crop (fixed)
size = (100, UNIT_SIZE)  # (height, width)
transform = transforms.Compose([
    transforms.FiveCrop(size)
])

new_img = transform(img)
delta = 20  # extra spacing so the gaps between the five crops are visible
new_img_2 = Image.new("RGB", (UNIT_SIZE*5+delta, 100))
left = 0
for im in new_img:
    new_img_2.paste(im, (left, 0))  # paste this crop into new_img_2 at (left, 0)
    left += UNIT_SIZE + int(delta/5)  # x of the next top-left corner; the crops are laid out horizontally, so only x changes

new_img_2
  • transforms.Grayscale: converts the image to grayscale
my_trans = transforms.Grayscale(num_output_channels=1)
new_img = my_trans(img)
new_img
  • transforms.Pad: pads the border with a constant value
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt

padding_img = transforms.Pad(padding=50, fill=10)  # add 50 pixels of value 10 on every side
img = Image.open('cat.jpg')

print(type(img))
print(img.size)         # (500, 375)

padded_img = padding_img(img)
print(type(padded_img))
print(padded_img.size)  # (600, 475)

plt.imshow(padded_img)
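  • transforms.RandomAffine: random affine transformation
# Recent torchvision versions use fill (and interpolation) instead of the older fillcolor/resample arguments
my_trans = transforms.RandomAffine(degrees=30, translate=None, scale=None,
                                   shear=None, fill=0)
new_img = my_trans(img)
new_img
  • transforms.RandomCrop: crops a random region
# size is (100, 200) here, left over from the FiveCrop example
my_trans = transforms.RandomCrop(size, padding=None,
                                 pad_if_needed=False, fill=0, padding_mode='constant')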
new_img = my_trans(img)
new_img
  • transforms.RandomHorizontalFlip: random horizontal flip (here with probability 0.8)
my_trans = transforms.RandomHorizontalFlip(p=0.8)
new_img = my_trans(img)
new_img
  • transforms.RandomRotation: random rotation
my_trans = transforms.RandomRotation(degrees=90, expand=False, center=None)  # rotate by a random angle in [-90, 90]
new_img = my_trans(img)
new_img
  • transforms.RandomVerticalFlip: random vertical flip
my_trans = transforms.RandomVerticalFlip(p=0.5)
new_img = my_trans(img)
new_img
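Conceptually, `transforms.Compose` just calls each transform in turn, and each random transform draws fresh random numbers on every call, so the same image yields a different result each time it is read. The NumPy sketch below imitates that pattern with hypothetical `Compose`, `center_crop`, and `random_hflip` helpers operating on H × W × C arrays. It is not torchvision's implementation, and its crop offset may differ from `transforms.CenterCrop` by one pixel.

```python
import numpy as np


class Compose:
    """Apply a list of callables in order (the same idea as transforms.Compose)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img


def center_crop(size):
    """Return a function that cuts a (height, width) window from the image center."""
    ch, cw = size

    def _crop(img):
        h, w = img.shape[:2]
        top, left = (h - ch) // 2, (w - cw) // 2
        return img[top:top + ch, left:left + cw]

    return _crop


def random_hflip(p, rng):
    """Return a function that mirrors the image left-right with probability p."""

    def _flip(img):
        return img[:, ::-1] if rng.random() < p else img

    return _flip


rng = np.random.default_rng(0)
# A stand-in H x W x C image with the cat's size (375 x 500)
demo_img = np.arange(375 * 500 * 3, dtype=np.int64).reshape(375, 500, 3)
pipeline = Compose([center_crop((300, 400)), random_hflip(0.5, rng)])
out = pipeline(demo_img)
print(out.shape)  # (300, 400, 3)
```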

2.3.3 Common data augmentation libraries

  • torchvision
    https://github.com/pytorch/vision

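The official PyTorch augmentation library. It provides the basic augmentation methods and integrates seamlessly with torch, but it offers fewer augmentation types and runs at moderate speed.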

  • imgaug
    https://github.com/aleju/imgaug

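imgaug is a widely used third-party augmentation library. It offers a wide variety of augmentation methods that are easy to combine, and it is fairly fast.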

  • albumentations
    https://albumentations.readthedocs.io/

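albumentations is a widely used third-party augmentation library. It offers a wide variety of augmentation methods, supports image classification, semantic segmentation, object detection, and keypoint detection, and is fairly fast.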

2.4 Reading data with PyTorch

  • Read the competition data with PyTorch
  • Wrap it in a Dataset
  • Read it in parallel with a DataLoader
import os, sys, glob, shutil, json
import cv2

from PIL import Image
import numpy as np

import torch
from torch.utils.data.dataset import Dataset
import torchvision.transforms as transforms

class SVHNDataset(Dataset):
    def __init__(self, img_path, img_label, transform=None):
        self.img_path = img_path
        self.img_label = img_label 
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(self.img_path[index]).convert('RGB')

        if self.transform is not None:
            img = self.transform(img)

        # Pad every label to 5 characters with class 10, used as "no character"
        # (in the original SVHN release, class 10 stood for the digit 0)
        lbl = np.array(self.img_label[index], dtype=np.int64)
        lbl = list(lbl) + (5 - len(lbl)) * [10]

        return img, torch.from_numpy(np.array(lbl[:5]))

    def __len__(self):
        return len(self.img_path)

train_path = glob.glob('../input/train/*.png')
train_path.sort()
train_json = json.load(open('../input/train.json'))
train_label = [train_json[x]['label'] for x in train_json]

data = SVHNDataset(train_path, train_label,
          transforms.Compose([
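              # Resize to a fixed size
              transforms.Resize((64, 128)),

              # Random color jitter
              transforms.ColorJitter(0.2, 0.2, 0.2),

              # Random rotation
              transforms.RandomRotation(5),

              # Convert the image to a PyTorch tensor
              # (left commented out here, so the samples come back as PIL images)
              # transforms.ToTensor(),

              # Normalize the pixel values
              # transforms.Normalize([0.485,0.456,0.406],[0.229,0.224,0.225])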
            ]))

With the code above, the competition images and their labels are read, and data augmentation is applied on the fly during reading.
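Each image has a different number of digits, so `__getitem__` pads every label to a fixed length of 5; only then can labels be stacked into a batch. Here is the same logic in plain Python as a hypothetical helper (`pad_label`), with class 10 standing for "no character" as in the code above:

```python
def pad_label(label, max_len=5, blank=10):
    """Pad (or truncate) a list of digit labels to exactly max_len entries."""
    label = [int(d) for d in label][:max_len]
    return label + [blank] * (max_len - len(label))


print(pad_label([1, 9]))  # [1, 9, 10, 10, 10]
```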

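Next, we build a DataLoader on top of the Dataset defined above.

  • Dataset: wraps the data set and lets you read individual samples by index
  • DataLoader: wraps a Dataset and lets you iterate over it in batches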
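To see what the DataLoader adds, here is a simplified single-process sketch of its default behavior: pick indices (optionally shuffled), call `dataset[i]` for each one, and stack the samples into batch arrays. `iterate_batches` and `ToyDataset` are hypothetical; the real DataLoader also handles worker processes, memory pinning, and custom collate functions.

```python
import numpy as np


def iterate_batches(dataset, batch_size, shuffle=False, seed=0):
    """Yield (images, labels) batches from any object with __len__ and __getitem__.

    Simplified, single-process stand-in for torch.utils.data.DataLoader's
    default behaviour: choose indices, read samples one by one, stack them.
    """
    indices = np.arange(len(dataset))
    if shuffle:
        np.random.default_rng(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        samples = [dataset[int(i)] for i in indices[start:start + batch_size]]
        images = np.stack([img for img, _ in samples])
        labels = np.stack([lbl for _, lbl in samples])
        yield images, labels


class ToyDataset:
    """Hypothetical dataset: n constant-valued 3 x 64 x 128 'images' with 5-digit labels."""

    def __init__(self, n=23):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        image = np.full((3, 64, 128), index, dtype=np.float32)
        label = np.array([index % 10] + [10] * 4)
        return image, label


batches = list(iterate_batches(ToyDataset(), batch_size=10))
print([images.shape for images, _ in batches])
# [(10, 3, 64, 128), (10, 3, 64, 128), (3, 3, 64, 128)]
```

As with the DataLoader's default `drop_last=False`, the final partial batch of 3 samples is kept.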

With the DataLoader added, the data-reading code becomes:

import os, sys, glob, shutil, json
import cv2

from PIL import Image
import numpy as np

import torch
from torch.utils.data.dataset import Dataset
import torchvision.transforms as transforms

class SVHNDataset(Dataset):
    def __init__(self, img_path, img_label, transform=None):
        self.img_path = img_path
        self.img_label = img_label 
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(self.img_path[index]).convert('RGB')

        if self.transform is not None:
            img = self.transform(img)

        # Pad every label to 5 characters with class 10, used as "no character"
        # (in the original SVHN release, class 10 stood for the digit 0)
        lbl = np.array(self.img_label[index], dtype=np.int64)
        lbl = list(lbl) + (5 - len(lbl)) * [10]

        return img, torch.from_numpy(np.array(lbl[:5]))

    def __len__(self):
        return len(self.img_path)

train_path = glob.glob('../input/train/*.png')
train_path.sort()
train_json = json.load(open('../input/train.json'))
train_label = [train_json[x]['label'] for x in train_json]

train_loader = torch.utils.data.DataLoader(
        SVHNDataset(train_path, train_label,
                   transforms.Compose([
                       transforms.Resize((64, 128)),
                       transforms.ColorJitter(0.3, 0.3, 0.2),
                       transforms.RandomRotation(5),
                       transforms.ToTensor(),
                       transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
            ])), 
    batch_size=10,  # number of samples per batch
    shuffle=False,  # whether to shuffle the samples
    num_workers=10,  # number of worker processes used for loading
)

for data in train_loader:
    break

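With the DataLoader in place, data is fetched batch by batch: for each batch, the Dataset is called to read individual samples, which are then stacked together. Here data is a list of two tensors with shapes:
torch.Size([10, 3, 64, 128]), torch.Size([10, 5])
The first is the image batch, ordered batch_size × channels × height × width; the second holds the character labels (5 per image, after padding).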
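The channels-first layout and value range of the image tensor come from the last two transforms in the pipeline. As a NumPy sketch of what `transforms.ToTensor()` followed by `transforms.Normalize(mean, std)` compute (the function name is hypothetical, and torchvision returns float32 tensors rather than float64 arrays): pixel values are scaled to [0, 1], the H × W × C image is transposed to C × H × W, and each channel is normalized as (x − mean) / std.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])


def to_tensor_and_normalize(img_hwc: np.ndarray) -> np.ndarray:
    """uint8 H x W x C image -> normalized float C x H x W array."""
    x = img_hwc.astype(np.float64) / 255.0  # ToTensor: scale [0, 255] to [0, 1]
    x = x.transpose(2, 0, 1)  # ToTensor: H x W x C -> C x H x W
    return (x - MEAN[:, None, None]) / STD[:, None, None]  # Normalize, channel by channel


sample = np.full((64, 128, 3), 255, dtype=np.uint8)  # an all-white 64 x 128 RGB image
print(to_tensor_and_normalize(sample).shape)  # (3, 64, 128)
```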

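2.5 Summary

We first covered data reading, then data augmentation and how to use it, and finally the PyTorch code for reading the data.
Breaking everything down step by step like this made for a very satisfying way to learn; I got a lot out of it, thanks.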
