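Predicting HEVC CU Partitioning with a CNN (2) -- Splitting the Dataset, Sampling a Subset of Frames, and Optimizing the Dataset Structure for Faster Loading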

The dataset I generated is on GitHub and includes the training, validation, and test sets:

GitHub - wolverinn/HEVC-CU-depths-dataset: A dataset that contains the Coding Unit image files and their corresponding depths for HEVC intra-prediction.

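A few more detailed tasks remain in preparing the dataset. The first is splitting it into training, validation, and test sets. All the YUV sequences used in the dataset are listed below: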

Resolution	Train	Validation	Test
4K	Bund-Nightscape_3840x2160_30	Campfire-Party_3840x2160_30	Construction-Field_3840x2160_30
	Fountains_3840x2160_30	Library_3840x2160_30	Marathon_3840x2160_30
	Residential-Building_3840x2160_30	Runners_3840x2160_30	Rush-Hour_3840x2160_30
	Scarf_3840x2160_30
	Tall-Buildings_3840x2160_30
	Traffic-and-Building_3840x2160_30
	Traffic-Flow_3840x2160_30
	Tree-Shade_3840x2160_30
	Wood_3840x2160_30
2K	NebutaFestival_2560x1600_60	PeopleOnStreet_2560x1600_30	Traffic_2560x1600_30
	SteamLocomotiveTrain_2560x1600_60
1080p	BasketballDrive_1920x1080_50	BQTerrace_1920x1080_60	Cactus_1920x1080_50
	Kimono1_1920x1080_24
	Tennis_1920x1080_24
	ParkScene_1920x1080_24
720p	FourPeople_1280x720_60	SlideShow_1280x720_20	KristenAndSara_1280x720_60
	SlideEditing_1280x720_30
	Johnny_1280x720_60
480p	BasketballDrill_832x480_50	Flowervase_832x480_30	BQMall_832x480_60
	Keiba_832x480_30	Mobisode2_832x480_30	PartyScene_832x480_50
	RaceHorses_832x480_30
288p	waterfall_352x288_20	akiyo_352x288_20	bridge-close_352x288_20
	bridge-far_352x288_20	coastguard_352x288_20	container_352x288_20
	flower_352x288_20	~~foreman_352x288_20~~	~~hall_352x288_20~~
	highway_352x288_20	~~mobile_352x288_20~~	~~mother-daughter_352x288_20~~
	news_352x288_20
	paris_352x288_20
	~~silent_352x288_20~~
	~~tempete_352x288_20~~
240p	BasketballPass_416x240_50	BlowingBubbles_416x240_50	BQSquare_416x240_60

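Because storage space is limited and adjacent frames are highly similar, training does not use every frame extracted from a YUV file. Instead, a subset of frames is sampled from each sequence according to a fixed rule.

Frames 2 and 27 of every YUV file are always taken. After frame 27, every n-th frame is taken, where n = floor((25 + f/40) / 4) as computed in the code below and f is the total number of frames in the YUV file. The code: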

import math
import os

from PIL import Image

# WORKSPACE_PATH, IMG_PATH and dump_ctu_file() are defined elsewhere in the script.

def crop_image_to_ctu(video_number):
    frames_dir = os.path.join(WORKSPACE_PATH, "temp-frames")
    frames = len(os.listdir(frames_dir))  # total number of frames in the current video
    random_frames = [2, 27]  # frames 2 and 27 are always taken
    n = int((25 + frames / 40) // 4)  # sampling step
    for i in range(frames // n):
        f_index = 27 + n * (i + 1)
        if f_index >= frames:  # frame numbers are 0-based, so the last valid one is frames - 1
            break
        random_frames.append(f_index)  # frame numbers to sample, given by the formula
    for image_file in os.listdir(frames_dir):
        # ffmpeg numbers frames from 1; subtract 1 so numbering starts at 0 and matches the CTU partition info
        frame_number = int(image_file.split('_')[2]) - 1
        if frame_number in random_frames:
            img = Image.open(os.path.join(frames_dir, image_file))
            img_width, img_height = img.size
            ctus_per_row = math.ceil(img_width / 64)
            ctu_number_per_img = ctus_per_row * math.ceil(img_height / 64)
            for ctu_number in range(ctu_number_per_img):
                img_row = ctu_number // ctus_per_row
                img_column = ctu_number % ctus_per_row  # the column wraps at the number of CTUs per row
                start_pixel_x = img_column * 64
                start_pixel_y = img_row * 64
                # crop each sampled frame into 64x64 CTUs, one after another
                cropped_img = img.crop((start_pixel_x, start_pixel_y, start_pixel_x + 64, start_pixel_y + 64))
                cropped_img.save(os.path.join(IMG_PATH, "v_{}_{}_{}_.jpg".format(video_number, frame_number, ctu_number)))
            img.close()
            dump_ctu_file(video_number, str(frame_number))  # save the CTU partition info of this frame only (sampled frames only)
        os.remove(os.path.join(frames_dir, image_file))  # delete each frame once it has been processed
    print("Total frames extracted from video_{} : {}".format(video_number, len(random_frames)))
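To make the arithmetic in crop_image_to_ctu concrete, here is a minimal standalone sketch of its two calculations: which frame numbers are kept, and which pixel box a given CTU number maps to. The helper names select_frames and ctu_box are hypothetical and are not part of the original script. The sketch assumes 0-based frame numbers, stops at the last valid one, and numbers CTUs in raster order, so the column index wraps at the number of CTUs per row.

```python
import math

CTU_SIZE = 64  # CTU edge length in pixels, as used in the article


def select_frames(total_frames):
    """Return the 0-based frame numbers kept for a video with total_frames frames."""
    selected = [2, 27]                          # always kept
    step = int((25 + total_frames / 40) // 4)   # sampling step n from the article's code
    index = 27 + step
    while index < total_frames:                 # stay within valid frame numbers
        selected.append(index)
        index += step
    return selected


def ctu_box(ctu_number, img_width):
    """Return the (left, upper, right, lower) crop box of a CTU numbered in raster order."""
    ctus_per_row = math.ceil(img_width / CTU_SIZE)
    row, column = divmod(ctu_number, ctus_per_row)
    left, upper = column * CTU_SIZE, row * CTU_SIZE
    return (left, upper, left + CTU_SIZE, upper + CTU_SIZE)


print(select_frames(100))  # [2, 27, 33, 39, 45, 51, 57, 63, 69, 75, 81, 87, 93, 99]
print(ctu_box(8, 416))     # (64, 64, 128, 128)
```

For a 416x240 sequence there are 7 CTUs per row, so CTU 8 is the second CTU of the second row. A 600-frame video gives a step of 10 and 59 kept frames.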

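While generating the dataset, I found that saving the CTU partition information of every frame of a YUV file to a txt file produces a text file of tens of megabytes per YUV, and over a hundred megabytes for a 1080p YUV. Loading the partition information of each CTU during training then takes a noticeable amount of time and slows training down. The first fix is to avoid saving all the partition information: only the CTU partition information of the sampled frames needs to be saved, which greatly reduces storage. Second, the stored data structure can be optimized. With a text file, every read has to scan from the first line until it reaches the CTU of the target frame, which is inefficient. So I stored the required CTU partition information in a Python dictionary (dict), structured as follows: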

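# Schematic only: dump_ctu_file() below uses the bare numbers as string keys, e.g. "2" and "0"
video_0 = {
    "frame_2": {
        "ctu_0": [...],  # 16x16 partition (depth) information
        "ctu_1": [...],
        .
        .
        .
        "ctu_103": [...]
    },
    "frame_27": {
        ...
    }
}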
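As a small illustration of how such a nested dictionary is indexed, the sketch below builds a toy dictionary and looks up one CTU. The data is made up, the helper names get_depths and as_grid are hypothetical, and it assumes each CTU stores a flat list of 256 depths (16 rows of 16 values) under string keys such as "2" and "0".

```python
# Toy data: two frames; each CTU holds a flat list of 256 depth values.
video_0 = {
    "2": {"0": [0] * 256, "1": [1] * 256},
    "27": {"0": [2] * 256},
}


def get_depths(video_dict, frame_number, ctu_number):
    """Two dictionary lookups replace a line-by-line scan of a text file."""
    return video_dict[str(frame_number)][str(ctu_number)]


def as_grid(flat_depths, width=16):
    """Reshape a flat depth list into rows of `width` values."""
    return [flat_depths[i:i + width] for i in range(0, len(flat_depths), width)]


grid = as_grid(get_depths(video_0, 27, 0))
print(len(grid), len(grid[0]), grid[0][0])  # 16 16 2
```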

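This makes it quick to find the partition information for an image given its frame number and CTU number. The dictionary is saved to v_0.pkl with pickle, Python's built-in persistence library, so it is easy to load when needed. The code: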

import os
import pickle

# CtuInfo_FILENAME (path of the CTU partition text file) is defined elsewhere in the script.

def dump_ctu_file(video_number, frame_number):
    # Save the CTU partition info of one sampled frame to the pickle:
    # {"frame_number_1": {"ctu_number_1": [...], "ctu_number_2": [...]}, "frame_number_2": ...}
    pkl_path = "v_{}.pkl".format(video_number)
    if os.path.exists(pkl_path):
        with open(pkl_path, 'rb') as f_pkl:
            video_dict = pickle.load(f_pkl)
    else:
        video_dict = {}  # no pickle yet for this video: start from an empty dict
    video_dict[frame_number] = {}
    frame_detected = False
    temp_ctu = []
    with open(CtuInfo_FILENAME, 'r') as f:
        for line in f:
            if not frame_detected:
                # skip lines until the header of the requested frame
                if "frame" in line and int(line.split(':')[1]) == int(frame_number):
                    frame_detected = True
            elif "frame" in line:
                break  # the next frame starts here, so this frame is complete
            elif "ctu" in line:
                ctu_number = int(line.split(':')[1])
                temp_ctu = []  # depth rows that follow are appended to this list
                video_dict[frame_number][str(ctu_number)] = temp_ctu
            else:
                # one row of depths: keep its first 16 values
                line_depths = line.split(' ')
                for index in range(16):
                    temp_ctu.append(int(line_depths[index]))
    with open(pkl_path, 'wb') as f_pkl:
        pickle.dump(video_dict, f_pkl)
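On the reading side, a cropped image can be matched to its label through its file name. The following is a minimal sketch, not the author's loader: parse_ctu_filename and load_label are hypothetical names, and it assumes the v_<video>_<frame>_<ctu>_.jpg naming used when cropping and the string keys written by dump_ctu_file. The round trip uses toy data in a temporary directory.

```python
import os
import pickle
import tempfile


def parse_ctu_filename(filename):
    """Split 'v_<video>_<frame>_<ctu>_.jpg' into (video, frame, ctu) strings."""
    parts = os.path.basename(filename).split("_")
    return parts[1], parts[2], parts[3]


def load_label(pkl_dir, filename):
    """Load the pickle of the video named in `filename` and return that CTU's depths."""
    video, frame, ctu = parse_ctu_filename(filename)
    with open(os.path.join(pkl_dir, "v_{}.pkl".format(video)), "rb") as f:
        video_dict = pickle.load(f)
    return video_dict[frame][ctu]


# Round trip with toy data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    toy = {"2": {"0": [0, 1, 2, 3]}}  # toy depths, not real labels
    with open(os.path.join(tmp, "v_0.pkl"), "wb") as f:
        pickle.dump(toy, f)
    label = load_label(tmp, "v_0_2_0_.jpg")

print(label)  # [0, 1, 2, 3]
```

In a real training loop it would likely be better to load each video's pickle once and keep the dictionary in memory rather than reopening the file for every image.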

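Data preprocessing is an important and time-consuming part of training a neural network. It has to account for the formatted input the network expects, how images are matched to labels, and how to keep loading time low, and the raw data must be processed by rules that follow from these requirements. Only once preprocessing is done well can the training stage focus on the neural network itself rather than on fiddly data-handling rules.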

The GitHub repository for the dataset I generated:

HEVC-CU-depths-dataset
