PyTorch Lightning: notes on a few pitfalls

1. The API changes a lot between versions, so when consulting the docs, make sure you are reading the latest version.

2. In my runs, 'ddp' was not faster than the default parallel strategy. The speed was already good enough for me, so I did not dig into DDP optimization.

3. Adding an image to the TensorBoard logger through self.log raises an error. Call the underlying TensorBoard method directly instead:

self.logger.experiment.add_image("target image", target_img_plot, self.global_step, dataformats='NCHW')
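Writing an image at every step can make the event file grow quickly, so it may be worth wrapping the call above in a small helper. The sketch below is only an illustration: `log_image_batch`, `every_n_steps` and the `RecordingWriter` stand-in are hypothetical names, not part of Lightning or TensorBoard. The only real API it assumes is `add_image(tag, img_tensor, global_step, dataformats=...)`, exactly as it is called above.

```python
class RecordingWriter:
    """Hypothetical stand-in for self.logger.experiment; it only records calls."""

    def __init__(self):
        self.calls = []

    def add_image(self, tag, img_tensor, global_step=None, dataformats="CHW"):
        self.calls.append((tag, global_step, dataformats))


def log_image_batch(writer, tag, image_batch, global_step, every_n_steps=100):
    """Write an NCHW image batch only when global_step is a multiple of every_n_steps.

    Returns True if the image was written, False if the step was skipped.
    """
    if global_step % every_n_steps != 0:
        return False
    writer.add_image(tag, image_batch, global_step, dataformats="NCHW")
    return True


# Usage with the stand-in writer; in a LightningModule one would pass
# self.logger.experiment and self.global_step instead.
writer = RecordingWriter()
written = [log_image_batch(writer, "target image", None, step) for step in range(0, 300, 50)]
```

Inside training_step, the equivalent call would be log_image_batch(self.logger.experiment, "target image", target_img_plot, self.global_step).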

4. global_step counts optimizer updates, not plain iterations, so with n optimizers its value is n times the iteration count.
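A quick way to keep the two counters apart is to write the relationship down as arithmetic. This is a minimal sketch, assuming every optimizer steps exactly once per batch and there is no gradient accumulation; both function names are hypothetical helpers, not Lightning APIs. The second helper additionally assumes that step-based settings such as every_n_train_steps are compared against global_step, which is worth verifying for your installed version.

```python
def expected_global_step(num_batches, num_optimizers=1):
    """global_step after num_batches iterations, if each optimizer steps once per batch."""
    return num_batches * num_optimizers


def steps_for_batches(every_n_batches, num_optimizers=1):
    """Convert an interval in batches into an interval in global_step units.

    Assumption: the step-based setting is checked against global_step.
    """
    return every_n_batches * num_optimizers


# Example: a GAN-style setup with two optimizers.
gan_steps = expected_global_step(num_batches=1000, num_optimizers=2)
gan_interval = steps_for_batches(every_n_batches=50000, num_optimizers=2)
```

With two optimizers, 1000 batches correspond to a global_step of 2000, so step-based intervals need to be doubled if they are meant in batches.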

5. To save the model every N iterations:

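import os
from pytorch_lightning.callbacks import ModelCheckpoint

# `opt` is assumed to be the script's parsed options object
checkpoint_callback = ModelCheckpoint(
    every_n_train_steps=50000,  # save every 50000 training steps
    every_n_epochs=0,  # disable epoch-based saving
    auto_insert_metric_name=False,  # keep the filename exactly as formatted below
    dirpath=os.path.join(opt.checkpoints_dir, opt.name),
    save_top_k=-1,  # keep all checkpoints
    filename="step_{step}",
)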

trainer = Trainer(
    ...
    callbacks=[checkpoint_callback],
    ...
)
