This was my first time taking part in this kind of algorithm competition, so I am recording the issues I ran into as a summary.
These are fairly basic notes from a first attempt, aimed at the problems a first-time participant is most likely to hit.
The platform used is autodl.
import torch as t
class config(object):
    seed = 1024
    dtype = t.float32
    device = "cuda:0" if t.cuda.is_available() else "cpu"
    data_dir = "/root/autodl-tmp/"
    log_dir = "./logs"
    checkpoints_dir = "./save_checkpoint"
    model_name = ""
    pretrain_model = ""  # pretrain for fine-tune
    # dataset config
    num_step = 20  # 1 for 6-hours, 4 for 1-day, and 20 for 5-days
    test_names = ["msl"]  # ["t2m", "u10", "v10", "msl", "tp"]
    ini_forecast_timestep = "12"  # ["00", "12", "00 & 12", "all"]
    # train config
    train_batch_size = 16
    num_workers = 16
    train_max_epochs = 50
    loss_log_iters = 100
    img_log_iters = 500
    model_save_fre = 5
    # valid and test config
    val_batch_size = 16
    test_batch_size = 1
conf = config()
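One convenience I find useful on top of a class like this (my own pattern, not something the competition code requires): a `parse` method that overrides the defaults from a dict, so one config class can serve several experiments. The class below is a trimmed copy of the one above, and the `parse` name is illustrative:

```python
class config(object):
    # Trimmed copy of the config above, for illustration only.
    seed = 1024
    train_batch_size = 16
    train_max_epochs = 50

    def parse(self, kwargs):
        # Override default attributes from a dict, e.g. parsed CLI arguments.
        for k, v in kwargs.items():
            if not hasattr(self, k):
                raise AttributeError("config has no attribute: " + k)
            setattr(self, k, v)

conf = config()
conf.parse({"train_batch_size": 32})
assert conf.train_batch_size == 32   # instance override
assert config.train_batch_size == 16  # class default untouched
```

Because `setattr` writes to the instance, the class-level defaults stay intact for the next experiment.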
1. Fix the random seed, so every run starts from the same state
t.manual_seed(conf.seed)
if t.cuda.is_available():
    t.cuda.manual_seed(conf.seed)
    t.cuda.manual_seed_all(conf.seed)
2. Setting `t.backends.cudnn.benchmark = True` makes cuDNN time its candidate convolution algorithms and pick the fastest one, so startup is slower but the run itself is much faster; for reproducibility it is disabled here
t.backends.cudnn.benchmark = False
3. Make the convolution algorithm deterministic; combined with a fixed random seed, this guarantees that the same input produces the same output on every run of the network
t.backends.cudnn.deterministic = True
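The "same seed, same output" principle is easy to verify with Python's built-in `random` module: reseeding a generator replays exactly the same sequence.

```python
import random

def sample(seed, n=5):
    # Create a generator with a fixed seed and draw n pseudo-random numbers.
    rng = random.Random(seed)
    return [rng.randint(0, 100) for _ in range(n)]

a = sample(1024)
b = sample(1024)
c = sample(2048)
assert a == b   # same seed -> identical sequence
assert a != c   # different seed -> (almost surely) different sequence
```

Fixing the torch, CUDA, and cuDNN states above does the same thing for every source of randomness the training run touches.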
4. Using TensorBoard (view the logs in a browser with `tensorboard --logdir ./logs`)
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter(log_dir=conf.log_dir)  # directory where TensorBoard looks for event files
writer.add_scalar("Train/Losses/loss", loss.item(), iters)
def img_summary(img, iters, name_scope, writer):
    batch_size = img.size()[0]
    for i in range(batch_size):
        writer.add_images(name_scope + "/Img" + str(i + 1), img[i], iters)
utils.img_summary(
    output,  # the 2-D field (array) being written
    iters,
    "Train/Imgs/{}/Prediction".format(conf.test_names[0].upper()),
    writer,
)
writer.close()
5. Calling by name: once the module is imported, a function or class can be looked up from its name string and called
getattr(unet, conf.model_name)(conf)
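The `getattr` trick in plain Python: any attribute of an imported module can be fetched by its string name and then called, which is how `conf.model_name` selects the architecture. The `models` module and `UNetSmall` class below are stand-ins built inline for illustration:

```python
import types

# Stand-in for an imported module such as `unet`.
models = types.ModuleType("models")

class UNetSmall:
    def __init__(self, conf):
        self.name = "UNetSmall"
        self.conf = conf

models.UNetSmall = UNetSmall

model_name = "UNetSmall"  # would come from conf.model_name
model = getattr(models, model_name)(conf=None)
assert model.name == "UNetSmall"
```

Swapping architectures then only requires changing one string in the config, not the training script.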
6. Loading data with DataLoader
train_dataset = dataset.dataset_name(conf, train=True, test=False)
train_dataloader = DataLoader(
    train_dataset,
    conf.train_batch_size,
    shuffle=True,  # shuffle for train, keep order for valid
    num_workers=conf.num_workers,  # worker processes loading in parallel; taxes the CPU
    pin_memory=True,  # errors on a CPU-only setup; with a GPU it pins host memory for faster transfer
)
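What `DataLoader` does with `shuffle=True` can be sketched in plain Python: shuffle the indices once per epoch, then yield fixed-size batches (`num_workers` and `pin_memory` add parallel loading and pinned host memory on top of this; the sketch is single-process):

```python
import random

def iterate_batches(dataset, batch_size, shuffle, seed=0):
    # dataset: any object supporting len() and integer indexing.
    idx = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(idx)  # train: shuffled; valid: in order
    for start in range(0, len(idx), batch_size):
        yield [dataset[i] for i in idx[start:start + batch_size]]

data = list(range(10))
batches = list(iterate_batches(data, batch_size=4, shuffle=False))
assert batches == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]  # last batch may be short
```

Note the last batch is smaller when the dataset size is not a multiple of the batch size, which is also `DataLoader`'s default behavior unless `drop_last=True`.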
class dataset(Dataset):
    def __init__(self, conf, train=True, test=False):
        self.data_dir = conf.data_dir
        self.dtype = conf.dtype
        self.num_step = conf.num_step
        self.input_names = xxx
        self.test_names = xxx
        self.train = train
        self.test = test
        self.train_folder = "train"
        self.test_folder = "testA"
        self.num_data = 0
        self.ds = []
        self.load_dataset()

    def load_dataset(self):
        if self.train:
            ······
        else:
            if not self.test:
                ······
            else:
                ······

    def __getitem__(self, idx):
        if not self.test:
            assert idx < self.num_data
            ······
        else:
            assert idx < self.num_data
            ······

    def __len__(self):
        return self.num_data
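The `Dataset` contract boils down to two methods: `__getitem__` returns one sample and `__len__` the sample count; everything else (file lists, train/test splits) is bookkeeping inside `__init__`. A minimal map-style dataset with a toy in-memory list standing in for data loaded from disk:

```python
class ToyDataset:
    # Minimal map-style dataset: integer indexing plus a length.
    def __init__(self, samples, train=True):
        self.train = train
        self.ds = samples          # stands in for arrays read from data_dir
        self.num_data = len(samples)

    def __getitem__(self, idx):
        assert idx < self.num_data
        return self.ds[idx]

    def __len__(self):
        return self.num_data

ds = ToyDataset([10, 20, 30])
assert len(ds) == 3 and ds[1] == 20
```

Any object with these two methods can be handed to `DataLoader`, which is why each loading scheme can live in its own small class.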
Weights: save only the best checkpoint of each training run and delete the redundant, unused ones.
Training code: training scripts differ slightly between models, so it is fine to keep several train.py files.
Data code: likewise, write a separate dataset class for each way of loading the data.
Model code: write a separate model class for each architecture.
Loading pretrained weight files should not happen inside the dataset: with pin_memory=True the data pipeline runs on the CPU (which is what makes it fast), while the weights generally need to be on cuda, so put the loading in the model's __init__ instead.
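A sketch of that load-inside-the-model pattern: `torch.load` with `map_location` puts the tensors straight onto the target device from the model's `__init__`. The class and file names below are illustrative, not from the competition code:

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self, pretrain_path="", device="cpu"):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        if pretrain_path:
            # Load the checkpoint directly onto the target device,
            # inside the model rather than inside the dataset.
            state = torch.load(pretrain_path, map_location=device)
            self.load_state_dict(state)
        self.to(device)

# Round-trip: save one model's weights, restore them into a fresh one.
m1 = TinyModel()
torch.save(m1.state_dict(), "ckpt.pth")
m2 = TinyModel(pretrain_path="ckpt.pth")
assert all(torch.equal(p, q) for p, q in
           zip(m1.state_dict().values(), m2.state_dict().values()))
```

Passing `device="cuda:0"` (when available) would land the restored tensors on the GPU in one step, with no CPU detour in the data pipeline.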
Packaging:
zip -r xx.zip folder/
Pack the data first. Small files can be downloaded via right-click, but that is unstable; on autodl you can instead save them to the network disk for a stable download, which also works for transferring between instances.
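The same packaging can be done from Python with the standard library, which is handy when the terminal lacks `zip`; `shutil.make_archive` zips an entire directory. The `results` folder below is created just for the demonstration:

```python
import os
import shutil
import zipfile

# Build a small directory to package (stands in for a results folder).
os.makedirs("results/sub", exist_ok=True)
with open("results/sub/pred.txt", "w") as f:
    f.write("msl predictions")

# Equivalent of: zip -r results.zip results/
archive = shutil.make_archive("results", "zip", root_dir=".", base_dir="results")

# Verify the archive really contains the file, with paths kept relative.
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
assert any(n.endswith("pred.txt") for n in names)
```

`root_dir` controls where paths inside the archive are relative to, so the unpacked layout matches the original folder.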