The best explanation of transfer learning I've seen so far

While looking up material on GLUE today, I came across an excellent explanation of transfer learning, which I'm sharing here. Original link: https://mccormickml.com/2019/11/05/GLUE/

Unlike single task models that are designed for and trained end-to-end on a specific task, transfer learning models extensively train their large network on a generalized language understanding task. The idea is that with some extensive generalized pretraining the model gains a good “understanding” of language in general. Then, this generalized “understanding” gives us an advantage when it comes time to adjust our model to tackle a specific task. By:

  • Removing the input and output layers we used for generalized pretraining.
  • Replacing them with the task-specific input/output layers we’re interested in.
  • Continuing to train this new network for a few epochs, we leverage the “middle” part of the model that “understands” language to help give us a leg up on our specific task.

Figure 2: Initial Pretraining Architecture (Untrained), Trained Language Network, Fine-Tuning Architecture

The part highlighted in red is the essence of transfer learning, and the figure illustrates it beautifully. In short: pretrain the model on a more general task, keep the learned parameters, swap the model's input and output layers for those of the downstream task, and a small amount of additional training is all it takes.
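To make this concrete, here is a minimal sketch of that recipe using the HuggingFace transformers library. This is not from the original article; the model name, label count, toy batch, and hyperparameters are placeholders purely for illustration. The generically pretrained encoder is kept, the pretraining head is replaced by a fresh task-specific classification head, and the whole network is then trained for a few epochs.

```python
# Minimal fine-tuning sketch (illustrative only): keep the pretrained BERT
# encoder, attach a new task-specific head, train briefly on the downstream task.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# BertForSequenceClassification drops the pretraining (masked-LM) head and
# attaches a randomly initialized linear classifier with `num_labels` outputs.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy batch standing in for a real downstream dataset.
batch = tokenizer(["a sentence", "another sentence"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([0, 1])

model.train()
for epoch in range(3):          # "a few epochs" of task-specific training
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()     # gradients also update the pretrained "middle" layers
    optimizer.step()
```

In practice you would replace the toy batch with a real DataLoader over the downstream dataset, but the structure is the same: only the head is new; everything else starts from the pretrained weights.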

As for the benefits of transfer learning: on most tasks it brings a clear performance boost (for example, when BERT first appeared it swept the state of the art across eleven NLP tasks), but not every task benefits; object detection is one counterexample (see this article [1]: https://zhuanlan.zhihu.com/p/50808893). One thing is certain, though: to reach the same performance, pretraining plus fine-tuning takes less training time than training from scratch. The figure below is the training comparison taken from [1]:
[Figure: training-curve comparison from [1], pre-training vs. training from scratch]
The gray curve is the pretrained model and the red curve is training from scratch; clearly, given enough training time, training from scratch can eventually match the pretrained model. So if your compute is limited, starting from a pretrained model is never a bad choice; with ample compute, building from the ground up is also perfectly viable for some tasks.
