1. Random seeds
Setting random seeds guarantees that experiment results are reproducible.
cudnn.deterministic
With torch.backends.cudnn.deterministic = True, the convolution algorithm returned each time is deterministic, i.e. the default algorithm.
cudnn.benchmark
Setting torch.backends.cudnn.benchmark = True makes the program spend a little extra time at start-up searching for the fastest convolution implementation for every convolution layer in the network, which then speeds training up. It is suitable when the network structure is fixed (not dynamic) and the input shape (batch size, image size, number of channels) does not change, which covers most common cases. Conversely, if the convolution configuration keeps changing, the program keeps re-tuning and actually wastes more time.
This program uses the following settings (faster, less reproducible):
cudnn.deterministic = False
cudnn.benchmark = True
def init_seeds(seed=0):
    random.seed(seed)
    np.random.seed(seed)
    torch_utils.init_seeds(seed=seed)

def init_seeds(seed=0):  # torch_utils.init_seeds
    # fix the results of torch.rand
    torch.manual_seed(seed)
    # Speed-reproducibility tradeoff https://pytorch.org/docs/stable/notes/randomness.html
    if seed == 0:  # slower, more reproducible
        cudnn.deterministic = True
        cudnn.benchmark = False
    else:  # faster, less reproducible
        cudnn.deterministic = False
        cudnn.benchmark = True
2. Nominal batch size
nominal batch size = 64 is the nominal batch size. For example, with an actual batch size of 16, 64 / 16 = 4, so gradients are accumulated and the optimizer only updates the weights once every 4 iterations, which saves GPU memory; a worked example follows the snippet below.
# define nominal batch size, set weight_decay
nominal_batch_size = 64
accumulate = max(round(nominal_batch_size / total_batch_size), 1)
hyp['weight_decay'] *= total_batch_size * accumulate / nominal_batch_size

# optimizer
if batch_index_now % accumulate == 0:
    optimizer.step()
    optimizer.zero_grad()
    if ema is not None:
        ema.update(model)
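A minimal worked example of the numbers above, assuming an actual batch size of 16 (as in the text) or 24 and the default weight_decay of 5e-4; the variable names mirror the snippet and are for illustration only:

nominal_batch_size = 64
for total_batch_size in (16, 24):
    accumulate = max(round(nominal_batch_size / total_batch_size), 1)   # -> 4, 3
    effective_batch = total_batch_size * accumulate                     # -> 64, 72
    weight_decay = 5e-4 * effective_batch / nominal_batch_size          # -> 5e-4, 5.625e-4
    print(total_batch_size, accumulate, effective_batch, weight_decay)

When the accumulated batch does not exactly equal the nominal 64, weight_decay is scaled accordingly, presumably so that the decay applied per optimizer step stays consistent with the nominal setting.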
3. Optimizers
SGD, Momentum, RMSProp, Adam
SGD is the plainest optimizer, with no acceleration to speak of. Momentum is an improved SGD that adds the momentum principle, RMSProp is in turn an upgrade of Momentum, and Adam an upgrade of RMSProp. Yet in the comparison referenced below, Adam performs slightly worse than RMSProp, so a more advanced optimizer does not necessarily give better results; in your own experiments, try different optimizers and pick the one that best fits your data and network. https://blog.csdn.net/qq_34690929/article/details/79932416
Momentum
See https://blog.csdn.net/weixin_43378396/article/details/90741645
In the traditional update, the parameter W is updated by adding the negative learning rate times the gradient (dx); this path zigzags quite a bit.
With Momentum it is as if the walker is moved from flat ground onto a slope: once he takes a small step downhill, downhill inertia keeps him going, so he takes fewer detours. That is the Momentum parameter update.
The concept of momentum comes from mechanics in physics and describes the cumulative effect of a force over time.
In plain gradient descent x += v, the update v of x at each step is v = -dx * lr, where dx is the first derivative of the objective func(x) with respect to x.
When momentum is used, the update v of x becomes the sum of this step's gradient descent term -dx * lr and the previous update v multiplied by a factor momentum in [0, 1], i.e.
v = -dx * lr + v * momentum
When the direction of this step's gradient term -dx * lr agrees with the direction of the previous update v, the previous update accelerates the current search.
When the two directions are opposite, the previous update decelerates the current search, as the sketch below illustrates.
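A minimal sketch of this update rule on the toy objective func(x) = x * x (the learning rate, momentum value and iteration count are illustrative only):

def grad(x):                 # first derivative of func(x) = x * x
    return 2 * x

x, v = 5.0, 0.0
lr, momentum = 0.1, 0.9
for _ in range(200):
    v = -grad(x) * lr + v * momentum   # v = -dx * lr + v * momentum
    x += v                             # x += v
print(x)                               # oscillates, then converges toward the minimum at 0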
Nesterov Accelerated Gradient (an improved version of Momentum, generally better)
https://zhuanlan.zhihu.com/p/22810533
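A minimal sketch of the Nesterov variant under the same toy setup; the only change (the common formulation, not taken from the original code) is that the gradient is evaluated at the look-ahead point x + momentum * v:

def grad(x):                 # first derivative of func(x) = x * x
    return 2 * x

x, v = 5.0, 0.0
lr, momentum = 0.1, 0.9
for _ in range(200):
    v = momentum * v - lr * grad(x + momentum * v)   # look-ahead gradient
    x += v
print(x)                                             # converges toward 0 with less oscillation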
learning rate
When the learning rate is small, convergence to the extremum is slow.
When the learning rate is large, the search tends to oscillate.
learning rate decay
When using gradient descent to minimize the objective func(x) = x * x, the update rule is x += v, where the update v at each step is v = -dx * lr and dx is the first derivative of func(x) with respect to x. If lr can be made to decay as the iterations proceed, the step taken by the search keeps shrinking and the oscillation is damped. This is where the learning-rate decay factor comes from:
lr_i = lr_start * 1.0 / (1.0 + decay * i)
The smaller decay is, the more slowly the learning rate decays; with decay = 0 the learning rate stays constant.
The larger decay is, the faster the learning rate decays; with decay = 1 it decays fastest.
The section above is reproduced from here.
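A minimal sketch of this decay schedule (the starting rate and decay factor are illustrative):

lr_start, decay = 0.01, 0.1
for i in range(0, 50, 10):
    lr_i = lr_start * 1.0 / (1.0 + decay * i)
    print(i, round(lr_i, 5))   # 0.01, 0.005, 0.00333, 0.0025, 0.002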
Weight decay
L2 regularization, reference: https://blog.csdn.net/program_developer/article/details/80867468
Why L2 regularization is rarely used since the advent of BatchNorm:
Since BatchNorm appeared, L2 regularization is rarely seen in practice. The intuitive explanation is that BatchNorm was introduced precisely to deal with large differences in the data distribution from layer to layer; with BatchNorm, each layer's data is in principle allowed to vary within a certain range. L2 regularization instead keeps the weights small so that large input differences do not accumulate into large errors as they propagate through the layers. In that sense BatchNorm solves the problem at its root, whereas L2 regularization is more of a trick.
https://baijiahao.baidu.com/s?id=1653085297096293714&wfr=spider&for=pc
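For vanilla SGD (no momentum), an L2 penalty and weight decay produce the same update; a minimal sketch of that equivalence with made-up tensors (not taken from the original code):

import torch

w = torch.tensor([1.0, -2.0])
grad = torch.tensor([0.3, 0.1])      # pretend gradient of the data loss
lr, wd = 0.01, 5e-4

# L2 penalty: the gradient of (wd / 2) * ||w||^2 is wd * w, added to the data gradient
w_l2 = w - lr * (grad + wd * w)
# weight decay: shrink w directly, then take the plain gradient step
w_wd = w * (1 - lr * wd) - lr * grad
print(torch.allclose(w_l2, w_wd))    # True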
Why optimizer.step() updates the parameters of Model()
Intuitively, optimizer.step() updates the parameters stored in optimizer.param_groups, and those parameters come from optimizer.add_param_group(), which takes the parameters read from model.named_parameters() and adds them to optimizer.param_groups according to their purpose. So why do the parameters inside the model change once optimizer.step() has updated them? Because everything that gets passed around is a reference to the same tensor data: whether the parameters are stored in a dict or anywhere else, every operation acts on the same underlying tensors.
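A minimal sketch of that shared-reference behavior (the toy linear model is for illustration only):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# the optimizer's param_groups hold the very same tensor objects as the model
print(optimizer.param_groups[0]['params'][0] is model.weight)   # True

model(torch.randn(4, 2)).sum().backward()
before = model.weight.clone()
optimizer.step()                            # mutates model.weight in place
print(torch.equal(before, model.weight))    # False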
Adam:
https://www.cnblogs.com/yifdu25/p/8183587.html
pg0, pg1, pg2 = [], [], []  # optimizer parameter groups
for k, v in model.named_parameters():
    if v.requires_grad:
        if '.bias' in k:
            pg2.append(v)  # biases
            # In general, weight decay could be applied to every learnable parameter.
            # In practice it works better to apply it only to conv and fully connected weights,
            # and not to biases or to the BN gamma/beta parameters.
        elif '.weight' in k and '.bn' not in k:
            pg1.append(v)  # apply weight decay
        else:
            pg0.append(v)  # all else

if hyp['optimizer'] == 'adam':  # https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html#OneCycleLR
    optimizer = optim.Adam(pg0, lr=hyp['lr0'], betas=(hyp['momentum'], 0.999))  # adjust beta1 to momentum
else:
    optimizer = optim.SGD(pg0, lr=hyp['lr0'], momentum=hyp['momentum'], nesterov=True)
optimizer.add_param_group({'params': pg1, 'weight_decay': hyp['weight_decay']})  # add pg1 with weight_decay
optimizer.add_param_group({'params': pg2})  # add pg2 (biases)
print('Optimizer groups: %g .bias, %g conv.weight, %g other' % (len(pg2), len(pg1), len(pg0)))
del pg0, pg1, pg2
4. Exponential moving average
An exponential moving average (also called exponentially weighted moving average) can be used to estimate the local mean of a variable, so that each update of the variable depends on its history over a period of time.
https://www.cnblogs.com/wuliytTaotao/p/9479958.html
# Exponential moving average
ema = torch_utils.ModelEMA(model) if rank in [-1, 0] else None

class ModelEMA:
    """ Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models
    Keep a moving average of everything in the model state_dict (parameters and buffers).
    This is intended to allow functionality like
    https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage
    A smoothed version of the weights is necessary for some training schemes to perform well.
    This class is sensitive to where it is initialized in the sequence of model init,
    GPU assignment and distributed training wrappers.
    """

    def __init__(self, model, decay=0.9999, updates=0):
        self.ema = deepcopy(model.module if is_parallel(model) else model).eval()  # FP32 EMA
        self.decay = lambda x: decay * (1 - math.exp(-x / 2000))  # decay exponential ramp (to help early epochs)
        self.updates = updates  # number of EMA updates
        for p in self.ema.parameters():
            p.requires_grad_(False)

    def update(self, model):
        # Update EMA parameters
        with torch.no_grad():
            self.updates += 1
            beta = self.decay(self.updates)
            model_state_dict = model.module.state_dict() if is_parallel(model) else model.state_dict()
            for k, v in self.ema.state_dict().items():
                if v.dtype.is_floating_point:
                    v *= beta
                    v += (1. - beta) * model_state_dict[k].detach()

    def update_attr(self, model, include=(), exclude=('process_group', 'reducer')):
        # Update EMA attributes
        copy_attr(self.ema, model, include, exclude)
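In effect every call to update() computes ema_param = beta * ema_param + (1 - beta) * model_param, where beta ramps from near 0 toward decay so that the EMA tracks the model closely during early epochs. A minimal sketch of that ramp (the update counts are illustrative):

import math

decay = 0.9999
ramp = lambda x: decay * (1 - math.exp(-x / 2000))
for updates in (1, 100, 1000, 10000):
    print(updates, round(ramp(updates), 4))   # 0.0005, 0.0488, 0.3934, 0.9932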
5. DP and DDP
DP: single machine, multiple GPUs
DDP: multiple machines, multiple GPUs
https://zhuanlan.zhihu.com/p/68717029
# DP mode
if device.type != 'cpu' and rank == -1 and torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)

# DDP mode
if device.type != 'cpu' and rank != -1:
    model = DDP(model, device_ids=[rank], output_device=rank)
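Because the script reads --local_rank (see the argument parser further down) and initializes the process group with the env:// method, DDP runs are typically started through PyTorch's torch.distributed.launch helper; a sketch of such a command, with the GPU count and arguments chosen purely for illustration:

python -m torch.distributed.launch --nproc_per_node 2 train.py --batch-size 16 --data data/dota.yaml --sync-bn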
6. Mixed precision training
To improve training efficiency in PyTorch, NVIDIA provides the mixed-precision training toolkit Apex, which claims to speed up training by 2-4x and roughly halve training memory without hurting accuracy. Mixed precision is implemented here with Apex's amp tool; the code change is to add the wrapper model, optimizer = amp.initialize(model, optimizer, opt_level='O1').
The actual flow is: call amp.initialize to configure the model and optimizer according to the chosen opt_level, then wrap the backward pass of the loss with amp.scale_loss.
if mixed_precision:
    model, optimizer = amp.initialize(model, optimizer, opt_level='O1', verbosity=0)

# Backward
if mixed_precision:
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
else:
    loss.backward()
7. Cosine annealing
# equivalent to cosine annealing (lr_scheduler.CosineAnnealingLR) with the minimum learning-rate factor set to 0.2
lf = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.8 + 0.2 # cosine
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
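A minimal sketch of the resulting multiplier, assuming epochs = 300 as in the argument defaults below; the factor anneals from 1.0 down to 0.2 along a half cosine:

import math

epochs = 300
lf = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.8 + 0.2
for e in (0, 75, 150, 225, 299):
    print(e, round(lf(e), 3))   # 1.0, 0.883, 0.6, 0.317, 0.2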
8. warm-up
I do not quite understand here why the bias learning rate is ramped from 0.1 down to lr0.
number_warmup = max(3 * number_batches, 1e3)  # number of warmup iterations, max(3 epochs, 1k iterations)

# warm up
if batch_index_now <= number_warmup:
    xp = [0, number_warmup]
    accumulate = max(1, np.interp(batch_index_now, xp, [1, nominal_batch_size / total_batch_size]).round())
    for j, param in enumerate(optimizer.param_groups):
        # set bias lr from 0.1 to lr0
        param['lr'] = np.interp(batch_index_now, xp,
                                [0.1 if j == 2 else 0.0, param['initial_lr'] * lr_cosine(epoch)])
        if 'momentum' in param:
            param['momentum'] = np.interp(batch_index_now, xp, [0.9, hyp['momentum']])
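A minimal sketch of the linear ramp np.interp produces during warm-up, assuming number_warmup = 1000 and a target of roughly lr0 = 0.01; the weight/other groups ramp up from 0, while the bias group (j == 2) ramps down from 0.1 toward the same target:

import numpy as np

number_warmup, target_lr = 1000, 0.01
for it in (0, 250, 500, 1000):
    up = np.interp(it, [0, number_warmup], [0.0, target_lr])    # 0.0, 0.0025, 0.005, 0.01
    down = np.interp(it, [0, number_warmup], [0.1, target_lr])  # 0.1, 0.0775, 0.055, 0.01
    print(it, up, down)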
9. Class imbalance
Class weights and image weights
https://blog.csdn.net/Graceying/article/details/120288985
def labels_to_class_weights(labels, number_class=80):
    if labels[0] is None:
        return torch.Tensor()
    labels = np.concatenate(labels, 0)
    class_label = labels[:, 0].astype(np.int)  # labels = [class xywh]
    class_frequency = np.bincount(class_label, minlength=number_class)  # occurrences per class
    class_frequency[class_frequency == 0] = 1
    # use the reciprocal of each class count as the class weight: the rarer the class, the larger the weight
    class_weights = 1 / class_frequency
    # normalize the weights
    class_weights_norm = class_weights / class_weights.sum()
    return torch.from_numpy(class_weights_norm)

def labels_to_image_weights(labels, number_class=80, class_weights=np.ones(80)):
    # labels[i][:, 0] holds all target classes in image i; class_counts is the per-image count of targets for each class
    class_counts = [np.bincount(labels[i][:, 0].astype(np.int), minlength=number_class) for i in range(len(labels))]
    # image weight = per-image target count of each class times the overall class weights, summed over classes
    image_weights = np.array(class_counts * class_weights.reshape(1, number_class)).sum(1)
    return image_weights
model.class_weights = labels_to_class_weights(dataset.labels, number_class).to(device)  # attach class weights

if dataset.image_weights:
    # Generate indices.
    if rank in [-1, 0]:
        # my understanding: this puts more weight on the classes the model currently detects poorly
        class_weights = model.class_weights.cpu().numpy() * (1 - maps) ** 2  # class weights
        image_weights = labels_to_image_weights(dataset.labels, number_class, class_weights)
        # re-sample the image indices according to the image weights
        dataset.indices = random.choices(range(dataset.num_files), weights=image_weights, k=dataset.num_files)
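A minimal toy example of the two helpers above (the label rows are made up; each row is [class, x, y, w, h] as the functions assume, and the calls reuse the definitions above):

import numpy as np

# two images: image 0 has two class-0 objects, image 1 has one class-0 and one class-1 object
labels = [np.array([[0, .5, .5, .2, .2], [0, .1, .1, .1, .1]]),
          np.array([[0, .3, .3, .2, .2], [1, .4, .4, .3, .3]])]

class_weights = labels_to_class_weights(labels, number_class=2)          # tensor([0.25, 0.75])
image_weights = labels_to_image_weights(labels, number_class=2,
                                        class_weights=class_weights.numpy())
print(class_weights, image_weights)   # the image containing the rare class 1 gets the larger weight (1.0 vs 0.5)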
10. Loss computation (see the next chapter)
train.py code
# Hyperparameters
hyp = {'optimizer': 'SGD', # ['adam', 'SGD', None] if none, default is SGD
'lr0': 0.01, # initial learning rate (SGD=1E-2, Adam=1E-3)
'momentum': 0.937, # SGD momentum/Adam beta1
'weight_decay': 5e-4, # optimizer weight decay
'giou': 0.05, # giou loss gain
'cls': 0.5, # cls loss gain
'cls_pw': 1.0, # cls BCELoss positive_weight
'obj': 1.0, # obj loss gain (*=img_size/320 if img_size != 320)
'obj_pw': 1.0, # obj BCELoss positive_weight
'iou_t': 0.20, # iou training threshold
'anchor_t': 4.0, # anchor-multiple threshold
'fl_gamma': 0.0, # focal loss gamma (efficientDet default is gamma=1.5)
'hsv_h': 0.015, # image HSV-Hue augmentation (fraction)
'hsv_s': 0.7, # image HSV-Saturation augmentation (fraction)
'hsv_v': 0.4, # image HSV-Value augmentation (fraction)
'degrees': 0.0, # image rotation (+/- deg)
'translate': 0.0, # image translation (+/- fraction)
'scale': 0.5, # image scale (+/- gain)
'shear': 0.0} # image shear (+/- deg)
def train(hyp, tb_writer, opt, device):
# --------------------------------------------------------------------------------------------------------------
# create results dir
print(f'Hyperparameters {hyp}')
log_dir = tb_writer.log_dir if tb_writer else 'runs/result'
weight_dir = os.path.join(log_dir, 'weights')
os.makedirs(weight_dir, exist_ok=True)
best_pt_dir = os.path.join(weight_dir, 'best.pt')
last_pt_dir = os.path.join(weight_dir, 'last.pt')
results_file = os.path.join(log_dir, 'results.txt')
with open(os.path.join(log_dir, 'hyp.yaml'), 'w') as f:
yaml.dump(hyp, f, sort_keys=False)
with open(os.path.join(log_dir, 'opt.yaml'), 'w') as f:
yaml.dump(vars(opt), f, sort_keys=False)
# TODO: Init DDP logging. Only the first process is allowed to log.
epochs, batch_size, total_batch_size, weights, rank = \
opt.epochs, opt.batch_size, opt.total_batch_size, opt.weights, opt.local_rank
# Remove previous results
if rank in [-1, 0]:
for f in glob.glob(log_dir + os.sep + '*_batch*.jpg') + glob.glob(results_file):
os.remove(f)
# --------------------------------------------------------------------------------------------------------------
# init random seeds
init_seeds(2 + rank)
# --------------------------------------------------------------------------------------------------------------
# create model
with open(opt.data, 'r') as f:
data_dict = yaml.load(f, Loader=yaml.FullLoader)
number_class, names = (1, ['item']) if opt.single_cls else (int(data_dict['nc']), data_dict['names'])
assert len(names) == number_class, '%g names found for nc=%g dataset in %s' % (len(names), number_class, opt.data) # check
model = Model(opt.cfg, number_class=number_class).to(device)
# img_size is multiple of max_stride
max_stride = int(max(model.stride))
img_size, img_size_val = [check_img_size(size, max_stride=max_stride) for size in opt.img_size]
# --------------------------------------------------------------------------------------------------------------
# define nominal batch size, set weight_decay
nominal_batch_size = 64
accumulate = max(round(nominal_batch_size / total_batch_size), 1)
hyp['weight_decay'] *= total_batch_size * accumulate / nominal_batch_size
# create optimizer parameter groups
pg0, pg1, pg2 = [], [], []
for k, v in model.named_parameters():
if v.requires_grad:
if '.bias' in k: #bias
pg2.append(v)
elif '.weight' in k and 'bn' not in k: # weight need weight decay
pg1.append(v)
else:
pg0.append(v)
# --------------------------------------------------------------------------------------------------------------
# create optimizer
if hyp['optimizer'] == 'adam':
optimizer = optim.Adam(pg0, lr=hyp['lr0'], betas=(hyp['momentum'], 0.999))
else:
optimizer = optim.SGD(pg0, lr=hyp['lr0'], momentum=hyp['momentum'], nesterov=True)
optimizer.add_param_group({'params': pg1, 'weight_decay': hyp['weight_decay']}) # add pg1 with weight_decay
optimizer.add_param_group({'params': pg2}) # biases
print('Optimizer groups: %g .bias, %g conv.weight, %g other' % (len(pg2), len(pg1), len(pg0)))
del pg0, pg1, pg2
# --------------------------------------------------------------------------------------------------------------
# load model
if weights == True:
weights = '' # train from scratch
start_epoch, best_fitness = 0, 0.0
if weights.endswith('.pt'):
ckpt = torch.load(weights, map_location=device)
try:
exclude = ['anchor']
ckpt['model'] = {k: v for k, v in ckpt['model'].float().state_dict().items()
if k in model.state_dict() and not any([x in k for x in exclude])
and model.state_dict()[k].shape == v.shape}
model.load_state_dict(ckpt['model'], strict=False)
print('Transferred %g/%g items from %s' % (len(ckpt['model']), len(model.state_dict()), weights))
except KeyError as e:
s = "%s is not compatible with %s. This may be due to model differences or %s may be out of date. " \
"Please delete or update %s and try again, or use --weights '' to train from scratch." \
% (weights, opt.cfg, weights, weights)
raise KeyError(s) from e
# load optimizer
if ckpt['optimizer'] is not None:
optimizer.load_state_dict(ckpt['optimizer'])
best_fitness = ckpt['best_fitness']
# load results
if ckpt['training_results'] is not None:
with open(results_file, 'w') as f:
f.write(ckpt['training_results']) # write results.txt
# epochs
start_epoch = ckpt['epoch'] + 1
if epochs < start_epoch:
print('%s has been trained for %g epochs. Fine-tuning for %g additional epochs.' %
(weights, ckpt['epoch'], epochs))
epochs += ckpt['epoch'] # finetune additional epochs
del ckpt
# --------------------------------------------------------------------------------------------------------------
# Mixed precision training https://github.com/NVIDIA/apex
if mixed_precision:
model, optimizer = amp.initialize(model, optimizer, opt_level="O1", verbosity=0)
# --------------------------------------------------------------------------------------------------------------
# Scheduler(learning rate decay) https://arxiv.org/pdf/1812.01187.pdf
# lr_scheduler.CosineAnnealingLR
lr_cosine = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.8 + 0.2 # cosine
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_cosine)
# --------------------------------------------------------------------------------------------------------------
# DataParallel
if device.type != 'cpu' and rank == -1 and torch.cuda.device_count() > 1:
model = torch.nn.DataParallel(model)
# --------------------------------------------------------------------------------------------------------------
# Exponential moving average
ema = torch_utils.ModelEMA(model) if rank in [-1, 0] else None
# --------------------------------------------------------------------------------------------------------------
# DistributedDataParallel
if device.type != 'cpu' and rank != -1:
if opt.sync_bn:
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = DDP(model, device_ids=[rank], output_device=rank)
# --------------------------------------------------------------------------------------------------------------
# prepare data
# trainloader
train_data_dir = data_dict['train']
dataloader, dataset = create_dataloader(train_data_dir, img_size, batch_size, max_stride, opt, hyp,
augment=True, cache=opt.cache_images, rect=opt.rect,
local_rank=rank, world_size=opt.world_size)
max_label_class = np.concatenate(dataset.labels, 0)[:, 0].max()
number_batches = len(dataloader)
assert max_label_class < number_class, 'Label class %g exceeds nc=%g in %s. Possible class labels are 0-%g' % \
(max_label_class, number_class, opt.data, number_class - 1)
# testloader
val_data_dir = data_dict['val']
if rank in [-1, 0]:
valloader = create_dataloader(val_data_dir, img_size_val, total_batch_size, max_stride, opt, hyp,
augment=False, cache=opt.cache_images, rect=True, local_rank=-1,
world_size=opt.world_size)[0]
# --------------------------------------------------------------------------------------------------------------
# Model parameters
hyp['cls'] *= number_class / 80. # scale coco-tuned hyp['cls'] to current dataset
model.number_class = number_class
model.hyp = hyp
model.iou_ratio = 1.0
model.class_weights = labels_to_class_weights(dataset.labels, number_class).to(device) # attach class weights
model.names = names
# --------------------------------------------------------------------------------------------------------------
# plot labels class frequency
if rank in [-1, 0]:
labels = np.concatenate(dataset.labels, 0)
c = torch.tensor(labels[:, 0]) # class
plot_labels(labels, save_dir=log_dir)
if tb_writer:
tb_writer.add_histogram('classes', c, 0)
# --------------------------------------------------------------------------------------------------------------
# start training
t0 = time.time()
number_warmup = max(3 * number_batches, 1e3) # number of warmup iterations, max(3 epochs, 1k iterations)
maps = np.zeros(number_class) # mAP per class
results = (0, 0, 0, 0, 0, 0, 0) # 'P', 'R', 'mAP', 'F1', 'val GIoU', 'val Objectness', 'val Classification'
scheduler.last_epoch = start_epoch - 1
if rank in [-1, 0]:
print('Image sizes %g train, %g test' % (img_size, img_size_val))
print('Using %g dataloader workers' % dataloader.num_workers)
print('Starting training for %g epochs...' % epochs)
for epoch in range(start_epoch, epochs):
model.train()
# Update image weights (optional)
# When in DDP mode, the generated indices will be broadcasted to synchronize dataset.
if dataset.image_weights:
# Generate indices.
if rank in [-1, 0]:
class_weights = model.class_weights.cpu().numpy() * (1 - maps) ** 2 # class weights
image_weights = labels_to_image_weights(dataset.labels, number_class, class_weights)
dataset.indices = random.choices(range(dataset.num_files), weights=image_weights, k=dataset.num_files)
# Broadcast if DDP
if rank != -1:
indices = torch.zeros([dataset.num_files], dtype=torch.int)
if rank == 0:
indices[:] = torch.tensor(dataset.indices, dtype=torch.int)
dist.broadcast(indices, 0)
if rank != 0:
dataset.indices = indices.cpu().numpy()
mloss = torch.zeros(4, device=device) # mean losses
# DDP random indices
if rank != -1:
dataloader.sampler.set_epoch(epoch)
pbar = enumerate(dataloader)
if rank in [-1, 0]:
print(('\n' + '%10s' * 8) % ('Epoch', 'gpu_mem', 'CIoU_add', 'obj', 'cls', 'total', 'targets', 'img_size'))
pbar = tqdm(pbar, total=number_batches) # progress bar
optimizer.zero_grad()
for i, (img_batch, label_batch, paths, _) in pbar:
batch_index_now = i + number_batches * epoch # number integrated batches (since train start)
img_batch = img_batch.to(device, non_blocking=True).float() / 255.0 # uint8 to float32, 0 - 255 to 0.0 - 1.0
# ----------------------------------------------------------------------------------------------------------
# warm up
if batch_index_now <= number_warmup:
xp = [0, number_warmup]
accumulate = max(1, np.interp(batch_index_now, xp, [1, nominal_batch_size / total_batch_size]).round())
for j, param in enumerate(optimizer.param_groups):
# set bias lr from 0.1 to lr0,
param['lr'] = np.interp(batch_index_now, xp,
[0.1 if j == 2 else 0.0, param['initial_lr'] * lr_cosine(epoch)])
if 'momentum' in param:
param['momentum'] = np.interp(batch_index_now, xp, [0.9, hyp['momentum']])
# ----------------------------------------------------------------------------------------------------------
# Multi-scale
if opt.multi_scale:
scale_size = random.randrange(img_size * 0.5, img_size * 1.5 + max_stride) // max_stride * max_stride
ratio = scale_size / max(img_batch.shape[2:])
if ratio != 1:
new_shape = [math.ceil(x * ratio / max_stride) * max_stride for x in img_batch.shape[2:]]
img_batch = F.interpolate(img_batch, size=new_shape, mode='bilinear', align_corners=False)
# ----------------------------------------------------------------------------------------------------------
# forward
pred = model(img_batch)
# ----------------------------------------------------------------------------------------------------------
# compute loss
loss, loss_items = compute_loss(pred, label_batch.to(device), model)
if rank != -1:
loss *= opt.world_size # gradient averaged between devices in DDP mode
if not torch.isfinite(loss):
print('WARNING: non-finite loss, ending training ', loss_items)
return results
# ----------------------------------------------------------------------------------------------------------
# loss back propagation
if mixed_precision:
with amp.scale_loss(loss, optimizer) as scaled_loss:
scaled_loss.backward()
else:
loss.backward()
# ----------------------------------------------------------------------------------------------------------
# optimizer
if batch_index_now % accumulate == 0:
optimizer.step()
optimizer.zero_grad()
if ema is not None:
ema.update(model)
# ----------------------------------------------------------------------------------------------------------
# print
if rank in [-1, 0]:
mloss = (mloss * i + loss_items) / (i + 1) # update mean losses
mem = '%.3gG' % (torch.cuda.memory_cached() / 1E9 if torch.cuda.is_available() else 0) # (GB)
s = ('%10s' * 2 + '%10.4g' * 6) % (
'%g/%g' % (epoch, epochs - 1), mem, *mloss, label_batch.shape[0], img_batch.shape[-1])
pbar.set_description(s)
# ----------------------------------------------------------------------------------------------------------
# Plot
if batch_index_now < 3:
f = os.path.join(log_dir, 'train_batch%g.jpg' % batch_index_now) # filename
result = plot_images(images=img_batch, targets=label_batch, paths=paths, fname=f)
if tb_writer and result is not None:
tb_writer.add_image(f, result, dataformats='HWC', global_step=epoch)
# end batch ------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------------------
# Scheduler
scheduler.step()
# --------------------------------------------------------------------------------------------------------------
# validation
# Only the first process in DDP mode is allowed to log or save checkpoints.
if rank in [-1, 0]:
if ema is not None:
ema.update_attr(model, include=['model_yaml', 'number_class', 'hyp', 'giou_ratio', 'names', 'stride'])
final_epoch = epoch + 1 == epochs
if not opt.notest or final_epoch: # Calculate mAP
results, maps, times = test.test()
# write
with open(results_file, 'a') as f:
f.write(s + '%10.4g' * 8 % results + '\n') # P, R, mAP, F1, test_losses=(GIoU, obj, cls)
# tensorboard
if tb_writer:
tags = ['train/giou_loss', 'train/obj_loss', 'train/cls_loss',
'metrics/precision', 'metrics/recall', 'metrics/mAP_0.5', 'metrics/mAP_0.75',
'metrics/mAP_0.5:0.95', 'val/giou_loss', 'val/obj_loss', 'val/cls_loss']
for x, tag in zip(list(mloss[:-1]) + list(results), tags):
tb_writer.add_scalar(tag, x, epoch)
# Update best mAP
fi = results[4]
if fi > best_fitness:
best_fitness = fi
# ----------------------------------------------------------------------------------------------------------
# save model
save = (not opt.nosave) or final_epoch
if save:
with open(results_file, 'r') as f: # create checkpoint
ckpt = {'epoch': epoch,
'best_fitness': best_fitness,
'training_results': f.read(),
'model': ema.ema.module if hasattr(ema, 'module') else ema.ema,
'optimizer': optimizer.state_dict() if not final_epoch else None}
# Save last, best and delete
torch.save(ckpt, last_pt_dir)
if (best_fitness == fi) and not final_epoch:
torch.save(ckpt, best_pt_dir)
del ckpt
# end epoch ----------------------------------------------------------------------------------------------------
# end training
if rank in [-1, 0]:
# Strip optimizers
# isnumeric() checks whether the string consists only of digits; it is defined only for unicode string objects
n = ('_' if len(opt.name) and not opt.name.isnumeric() else '') + opt.name
fresults, flast, fbest = 'results%s.txt' % n, os.path.join(weight_dir, 'last%s.pt' % n), os.path.join(weight_dir, 'best%s.pt' % n)
for f1, f2 in zip([last_pt_dir, best_pt_dir, results_file], [flast, fbest, fresults]):
if os.path.exists(f1):
os.rename(f1, f2) # rename
ispt = f2.endswith('.pt') # is *.pt
strip_optimizer(f2) if ispt else None # strip optimizer
# Finish
plot_results(save_dir=log_dir) # save as results.png
print('%g epochs completed in %.3f hours.\n' % (epoch - start_epoch + 1, (time.time() - t0) / 3600))
dist.destroy_process_group() if rank not in [-1, 0] else None
torch.cuda.empty_cache()
return results
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--cfg', type=str, default='models/yolov4l-mish.yaml', help='model.yaml path')
parser.add_argument('--data', type=str, default='data/dota.yaml', help='data.yaml path')
parser.add_argument('--hyp', type=str, default='', help='hyp.yaml path (optional)')
parser.add_argument('--epochs', type=int, default=300)
parser.add_argument('--batch-size', type=int, default=8, help="Total batch size for all gpus.") # 16
parser.add_argument('--img-size', nargs='+', type=int, default=[1024, 1024], help='train,test sizes')
parser.add_argument('--resume', nargs='?', const='get_last', default=False,
help='resume from given path/to/last.pt, or most recent run if blank.')
parser.add_argument('--weights', type=str, default='weights/yolov4s-mish.pt', help='initial weights path')
parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
parser.add_argument('--rect', action='store_true', help='rectangular training')
parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
parser.add_argument('--notest', action='store_true', help='only test final epoch')
parser.add_argument('--noautoanchor', action='store_true', help='disable autoanchor check')
parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
parser.add_argument('--cache-images', action='store_true', help='cache images for faster training')
parser.add_argument('--name', default='', help='renames results.txt to results_name.txt if supplied')
parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset')
parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
parser.add_argument('--local_rank', type=int, default=-1, help='DistributedDataParallel parameter, do not modify')
opt = parser.parse_args()
# --------------------------------------------------------------------------------------------------------------
# resume from most recent run
last = get_latest_run() if opt.resume == 'get_last' and not opt.weights else opt.resume
if last and not opt.weights:
print(f'resume from {last}')
opt.weights = last if opt.resume and not opt.weights else opt.resume
# --------------------------------------------------------------------------------------------------------------
# check_git_status check file
if opt.local_rank in [0, -1]:
check_git_status()
opt.cfg = check_file(opt.cfg)
opt.data = check_file(opt.data)
# --------------------------------------------------------------------------------------------------------------
# update hyps
if opt.hyp:
opt.hyp = check_file(opt.hyp)
with open(opt.hyp) as f:
hyp.update(yaml.load(f, Loader=yaml.FullLoader))
# --------------------------------------------------------------------------------------------------------------
# extend img_size to 2 sizes (train, test)
opt.img_size.extend([opt.img_size[-1]] * (2 - len(opt.img_size)))
# --------------------------------------------------------------------------------------------------------------
# device, total_batch_size, DDP mode
device = select_device(opt.device, apex=mixed_precision, batch_size=opt.batch_size)
opt.total_batch_size = opt.batch_size
opt.world_size = 1
if device.type == 'cpu':
mixed_precision = False
elif opt.local_rank != -1:
# DDP mode
assert torch.cuda.device_count() > opt.local_rank
torch.cuda.set_device(opt.local_rank)
device = torch.device('cuda', opt.local_rank)
# distributed backend
dist.init_process_group(backend='nccl', init_method='env://')
print(opt)
# --------------------------------------------------------------------------------------------------------------
# tensorboard
if opt.local_rank in [-1, 0]:
tb_writer = SummaryWriter(log_dir=increment_dir('runs/exp', opt.name))
else:
tb_writer = None
# --------------------------------------------------------------------------------------------------------------
# train
train(hyp, tb_writer, opt, device)
# --------------------------------------------------------------------------------------------------------------