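It is worth understanding how a dataset is organized before using it, to avoid pitfalls. In the original mini-ImageNet files, the 100 classes are themselves divided among train, test, and val (64, 20, and 16 classes respectively), so the three files share no classes. That does not suit my image classification task, so I re-split the data. There are 100 classes with 600 images each; for every class I assigned 400, 100, and 100 images to the new train, test, and val sets respectively:
train.csv contains 40,000 images across 100 classes.
val.csv contains 10,000 images across 100 classes.
test.csv contains 10,000 images across 100 classes.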
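The counts above can be checked once a split has been loaded into a DataFrame. The following is a minimal sketch, not part of the original script: the helper name `summarize_split` is hypothetical, and the column name `label` is an assumption about the CSV header, so adjust it if your files differ. For the new training split, the expected result would be 40000 rows, 100 classes, and a single per-class count of 400.

```python
import pandas as pd

def summarize_split(df, label_col="label"):
    """Return (row count, class count, sorted distinct per-class counts).

    `label_col` is an assumed column name; change it to match your CSV.
    """
    per_class = df[label_col].value_counts()
    return len(df), per_class.size, sorted(per_class.unique().tolist())

# Tiny synthetic stand-in for a split: 3 classes with 4 rows each.
demo = pd.DataFrame({
    "filename": [f"img_{i}.jpg" for i in range(12)],
    "label": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
})
print(summarize_split(demo))  # (12, 3, [4])
```

A balanced split shows exactly one value in the last element; more than one value means some classes have a different number of images.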
For more details, see:
https://blog.csdn.net/qq_37541097/article/details/113027489
Baidu Netdisk download:
Link: https://pan.baidu.com/s/1Uro6RuEbRGGCQ8iXvF2SAQ Password: hl31
The code is as follows (example):
import pandas as pd
# Original files
train_path = r'D:\CsDataset\mini-imagenet\train.csv'
test_path = r'D:\CsDataset\mini-imagenet\test.csv'
val_path = r'D:\CsDataset\mini-imagenet\val.csv'
# Files to save after processing
new_train_path = r'D:\CsDataset\mini-imagenet\new_train.csv'
new_test_path = r'D:\CsDataset\mini-imagenet\new_test.csv'
new_val_path = r'D:\CsDataset\mini-imagenet\new_val.csv'
def split_miniimagenet(train_path, test_path, val_path, new_train_path, new_test_path, new_val_path):
    # The original CSVs hold 64, 20, and 16 classes respectively.
    # The slicing below assumes each class occupies a contiguous block of 600 rows.
    train_data = pd.read_csv(train_path)
    test_data = pd.read_csv(test_path)
    val_data = pd.read_csv(val_path)

    # New train split: the first 400 rows of every class
    data11 = [train_data.iloc[600*i:600*i+400, :] for i in range(64)]
    df11 = pd.concat(data11)  # concatenate vertically (row-wise)
    data12 = [test_data.iloc[600*i:600*i+400, :] for i in range(20)]
    df12 = pd.concat(data12)
    data13 = [val_data.iloc[600*i:600*i+400, :] for i in range(16)]
    df13 = pd.concat(data13)
    data1 = [df11, df12, df13]
    df1 = pd.concat(data1)
    df1.to_csv(new_train_path, index=False, sep=',')  # write the new train file

    # New test split: rows 400-499 of every class
    data21 = [train_data.iloc[600*i+400:600*i+500, :] for i in range(64)]
    df21 = pd.concat(data21)
    data22 = [test_data.iloc[600*i+400:600*i+500, :] for i in range(20)]
    df22 = pd.concat(data22)
    data23 = [val_data.iloc[600*i+400:600*i+500, :] for i in range(16)]
    df23 = pd.concat(data23)
    data2 = [df21, df22, df23]
    df2 = pd.concat(data2)
    df2.to_csv(new_test_path, index=False, sep=',')  # write the new test file

    # New val split: rows 500-599 of every class
    data31 = [train_data.iloc[600*i+500:600*i+600, :] for i in range(64)]
    df31 = pd.concat(data31)
    data32 = [test_data.iloc[600*i+500:600*i+600, :] for i in range(20)]
    df32 = pd.concat(data32)
    data33 = [val_data.iloc[600*i+500:600*i+600, :] for i in range(16)]
    df33 = pd.concat(data33)
    data3 = [df31, df32, df33]
    df3 = pd.concat(data3)
    df3.to_csv(new_val_path, index=False, sep=',')  # write the new val file

split_miniimagenet(train_path, test_path, val_path, new_train_path, new_test_path, new_val_path)
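The function above slices by row position, so it relies on every class occupying a contiguous block of 600 rows in each CSV. If that ordering is not guaranteed, the same 400/100/100 split can be expressed by counting rows within each class instead. The following is only a sketch of that alternative, not the original method: `split_per_class` is a hypothetical helper, and the `label` column name is an assumption about the CSV header.

```python
import pandas as pd

def split_per_class(df, label_col="label", n_train=400, n_test=100, n_val=100):
    """Split rows into train/test/val by their position within each class.

    `label_col` is an assumed column name; change it to match your CSV.
    """
    # Index of each row within its own class (0, 1, 2, ...), in file order.
    # Converting to a NumPy array avoids index-alignment issues when the
    # DataFrame index contains duplicates (for example, after pd.concat).
    rank = df.groupby(label_col, sort=False).cumcount().to_numpy()
    train = df[rank < n_train]
    test = df[(rank >= n_train) & (rank < n_train + n_test)]
    val = df[(rank >= n_train + n_test) & (rank < n_train + n_test + n_val)]
    return train, test, val

# Tiny synthetic example: 3 interleaved classes with 6 rows each, split 4/1/1.
demo = pd.DataFrame({
    "filename": [f"img_{i}.jpg" for i in range(18)],
    "label": ["a", "b", "c"] * 6,
})
train, test, val = split_per_class(demo, n_train=4, n_test=1, n_val=1)
print(len(train), len(test), len(val))  # 12 3 3
```

To apply it to the real files, one option is to concatenate the three original CSVs first, e.g. `pd.concat([pd.read_csv(p) for p in (train_path, test_path, val_path)], ignore_index=True)`, then pass the result to `split_per_class` and write each part with `to_csv(..., index=False)`.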