Python code for splitting the fgvc-aircraft-2013b fine-grained aircraft dataset into training and test sets

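fgvc-aircraft-2013b is a classic benchmark for fine-grained image classification and recognition research. It provides four kinds of annotation:

(1) By manufacturer: 30 classes, for example ATR, Airbus, Antonov, Beechcraft, and Boeing.

(2) By family: 70 classes.

(3) By variant: 100 classes (the labeling most commonly used in fine-grained image classification).

(4) Bounding boxes for the images.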

Below is Python code, offered for reference, that splits fgvc-aircraft-2013b into training and test sets using the 100-class variant labels.

My folder layout is as follows:

[Image 1: folder layout of fgvc-aircraft-2013b]

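Here, the images folder holds the 10,000 aircraft images.

The dataset folder contains four subfolders, train, test, trainval, and val, which store the images after splitting.

The folders 30, 70, 100, and bounding_box hold the four kinds of annotation described above, each saved as .txt files.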
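
As a small illustration of how one line of these annotation files can be read, here is a minimal sketch. It assumes each line has the form `<image id> <label>`, which is what the slicing in the script below relies on. The helper name `parse_annotation_line` and the sample lines are hypothetical and not part of the dataset tools.

```python
def parse_annotation_line(line):
    """Split one annotation line into (image_id, label).

    Assumes the format '<image id> <label>'. The label may itself
    contain spaces or slashes, so only the first space is used as
    the separator.
    """
    image_id, label = line.rstrip('\n').split(' ', 1)
    return image_id, label


# Hypothetical sample lines written in the assumed format
print(parse_annotation_line('1234567 707-320\n'))
print(parse_annotation_line('7654321 F/A-18\n'))
```

Splitting on the first space only keeps multi-word labels intact, which fixed-position slicing also achieves as long as every image ID is exactly 7 characters long.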

# -*- coding: utf-8 -*-
# author --liming--

"""
Given the train, test, trainval, and val txt files, copy the images of each
split into one folder per class.
"""

import os
import shutil
import argparse

path = '/media/lm/1E7FBDC6EEE168BC/fine_grained_dataset/FGVC_Aircraft/fgvc-aircraft-2013b'

image_path = path + '/images/'
save_train_path = path + '/dataset/train/'
save_test_path = path + '/dataset/test/'
save_trainval_path = path + '/dataset/trainval/'
save_val_path = path + '/dataset/val/'
# Read the image folder and get the list of file names
imgs = os.listdir(image_path)
num = len(imgs)

# Read the txt annotation files; each list keeps the raw lines of one split
with open(path + '/100/images_variant_test.txt', 'r') as f_test:
    test_list = list(f_test)
with open(path + '/100/images_variant_train.txt', 'r') as f_train:
    train_list = list(f_train)
with open(path + '/100/images_variant_trainval.txt', 'r') as f_trainval:
    trainval_list = list(f_trainval)
with open(path + '/100/images_variant_val.txt', 'r') as f_val:
    val_list = list(f_val)

parser = argparse.ArgumentParser(description='Data Split based on Txt')
parser.add_argument('--dataset',
                    default='test',
                    choices=['test', 'train', 'trainval', 'val'],
                    help='Select which dataset split, test, train, trainval, or val')
args = parser.parse_args()

# Find the images that belong to the selected split and copy them
print('==> data processing...')
if args.dataset == 'test':
    count = 0
    for i in range(num):
        for j in range(len(test_list)):
            # The first 7 characters of an annotation line are the image ID
            if imgs[i][:7] == test_list[j][:7]:
                # The class label is the rest of the line, without the newline
                label = test_list[j][8:].rstrip('\n')

                # Create the class folder if needed, then copy the image into it
                os.makedirs(save_test_path + label, exist_ok=True)
                shutil.copy(image_path + imgs[i], save_test_path + label + '/' + imgs[i])
                count += 1
                print('Image %s belongs to the test split' % count)
    print('Finished!!')

elif args.dataset == 'train':
    for i in range(num):
        for j in range(len(train_list)):
            if imgs[i][:7] == train_list[j][:7]:
                print('This image belongs to the train split')
                # The class label is the rest of the line, without the newline
                label = train_list[j][8:].rstrip('\n')

                # Create the class folder if needed, then copy the image into it
                os.makedirs(save_train_path + label, exist_ok=True)
                shutil.copy(image_path + imgs[i], save_train_path + label + '/' + imgs[i])
    print('Finished!!')

elif args.dataset == 'trainval':
    for i in range(num):
        for j in range(len(trainval_list)):
            if imgs[i][:7] == trainval_list[j][:7]:
                print('This image belongs to the trainval split')
                # The class label is the rest of the line, without the newline
                label = trainval_list[j][8:].rstrip('\n')

                # Create the class folder if needed, then copy the image into it
                os.makedirs(save_trainval_path + label, exist_ok=True)
                shutil.copy(image_path + imgs[i], save_trainval_path + label + '/' + imgs[i])
    print('Finished!!')

else:
    # Remaining case: the val split
    for i in range(num):
        for j in range(len(val_list)):
            if imgs[i][:7] == val_list[j][:7]:
                print('This image belongs to the val split')
                # The class label is the rest of the line, without the newline
                label = val_list[j][8:].rstrip('\n')

                # Create the class folder if needed, then copy the image into it
                os.makedirs(save_val_path + label, exist_ok=True)
                shutil.copy(image_path + imgs[i], save_val_path + label + '/' + imgs[i])
    print('Finished!!')
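
The four branches above differ only in which annotation list and output folder they use, and each one rescans the whole list for every image. As a possible alternative, here is a minimal sketch that builds an ID-to-label lookup once and handles any split with one function. The function name `split_images` and its parameters are hypothetical. It also assumes that an image's file name without its extension equals the image ID, and it does not change how labels containing `/` are handled.

```python
import os
import shutil


def split_images(image_dir, annotation_file, output_dir):
    """Copy every image listed in annotation_file into output_dir/<label>/.

    Returns the number of images copied.
    """
    # Build an image-ID -> label lookup once instead of rescanning the list
    labels = {}
    with open(annotation_file, 'r') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:
                continue
            image_id, label = line.split(' ', 1)
            labels[image_id] = label

    copied = 0
    for name in sorted(os.listdir(image_dir)):
        # Assumption: the file name without its extension is the image ID
        image_id = os.path.splitext(name)[0]
        if image_id not in labels:
            continue
        class_dir = os.path.join(output_dir, labels[image_id])
        os.makedirs(class_dir, exist_ok=True)
        shutil.copy(os.path.join(image_dir, name), os.path.join(class_dir, name))
        copied += 1
    return copied
```

With the variables defined in the script, a call such as `split_images(image_path, path + '/100/images_variant_test.txt', save_test_path)` would play the role of the test branch.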

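After splitting, two class names cause a problem because they contain a slash, which the file system treats as a path separator: F-16A/B is saved as a folder F-16A with a subfolder B, and F/A-18 is saved as a folder F with a subfolder A-18. Move the images out of these nested folders and give each class a single, consistent folder name.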
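
One way to avoid the nested folders in the first place is to make the label safe before using it as a folder name. The sketch below is only an illustration: the helper name `safe_label` is hypothetical, and replacing the slash with a hyphen is an assumed naming choice, not something the dataset prescribes.

```python
def safe_label(label):
    """Return a version of a class label that is safe to use as a folder name.

    Assumption: '/' is replaced with '-' so the label maps to one folder.
    """
    return label.replace('/', '-')


print(safe_label('F-16A/B'))  # F-16A-B
print(safe_label('F/A-18'))   # F-A-18
```

If this approach is used, the same mapping has to be applied wherever class names are read later, so that folder names and labels stay consistent.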

The final training and test splits are as follows:

(1) Test set (100 classes, 3,333 images)

[Image 2: test set folders after splitting]

(2) Training set (the trainval split; 100 classes, 6,667 images)

[Image 3: training set folders after splitting]
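
To check totals like these after splitting, the number of class folders and images can be counted. This is a minimal sketch; the function name `count_images` is hypothetical, and it counts every file under each class folder, including files in nested subfolders.

```python
import os


def count_images(split_dir):
    """Return {class_name: number_of_files} for the class folders in split_dir."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if os.path.isdir(class_dir):
            # Walk the folder so files in nested subfolders are included
            counts[class_name] = sum(len(files) for _, _, files in os.walk(class_dir))
    return counts
```

For example, with `counts = count_images(save_test_path)`, the values `len(counts)` and `sum(counts.values())` give the number of class folders and the total number of images in the test split.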

 
