u013250861

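NLP pre-trained models (2019, NLU + NLG): BART [a generalized Seq2Seq model combining BERT and GPT] [noise-corrupted text is fed to the encoder; the decoder outputs the original text] [noising scheme: text infilling (each text span is replaced with a single mask token)]

Original paper: BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension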

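Everything starts with the Transformer. Its left half is the Encoder and its right half is the Decoder. We call the sentence fed to the Encoder the source and the sentence fed to the Decoder the target.

The Encoder applies self-attention to the source and produces a representation for every token in the sentence. The best-known Encoder architecture is BERT, which learns the relationships between tokens through a Masked Language Model objective; others include XLNet, RoBERTa, ALBERT, and DistilBERT.

An Encoder-only architecture on its own is not suited to generation tasks.

The Decoder is shown in the figure below. Its input and output are offset by one position, which mimics inference, where the model must not see future tokens. This scheme is called autoregressive. Decoder-based models such as GPT and CTRL are typically used for sequence generation. However, a Decoder-only architecture predicts each token from its left context only and cannot learn bidirectional interactions.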
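The one-position offset between decoder input and output can be illustrated with a minimal sketch. The token ids and the `shift_right` helper are hypothetical and only serve the illustration; they are not taken from any library.

```python
def shift_right(target_ids, start_id):
    """Build decoder inputs by prepending a start token and dropping the last target token."""
    return [start_id] + target_ids[:-1]

# Hypothetical token ids; 2 plays the role of the end-of-sequence token.
target = [11, 12, 13, 2]
decoder_input = shift_right(target, start_id=0)

# At step t the decoder only sees decoder_input[:t + 1] and is trained to predict target[t].
for t in range(len(target)):
    print(decoder_input[: t + 1], "->", target[t])
```

Because the input at step t never contains target[t] or anything after it, training matches the conditions the model faces at inference time.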

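Combined, the two form a Seq2Seq model that can be used for tasks such as translation.

The figure below shows the overall structure of BART. It looks much like a standard Transformer; the main difference lies in what is used as source and target.

During pre-training, the Encoder encodes the corrupted text bidirectionally, and the Decoder then reconstructs the original input autoregressively.
During testing or fine-tuning, both the Encoder and the Decoder receive uncorrupted text as input.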

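I. Abstract

BART stands for Bidirectional and Auto-Regressive Transformers.

BART training consists of two main steps:

Corrupt the text with an arbitrary noising function;
Train the model to reconstruct the original text.

BART uses a standard Transformer-based neural machine translation architecture and can be seen as generalizing recent pre-trained models such as BERT (bidirectional encoder) and GPT (left-to-right decoder).

The paper evaluates a number of noising approaches and finds that the best performance comes from combining the following two:

Randomly shuffling the order of the original sentences;
Using a novel text infilling scheme, in which each text span is replaced with a single mask token. In other words, however many tokens are masked, a single mask token marks the position where tokens were removed.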

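BART is particularly effective for text generation, but it also performs well on natural language understanding tasks.

With comparable training resources, BART matches RoBERTa on GLUE and SQuAD and sets new state-of-the-art results on dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE on XSum.

On machine translation, BART achieves a 1.1 BLEU improvement over a back-translation system with only target-language pre-training.

The paper also reports ablation experiments that replicate other pre-training schemes within the BART framework, to better measure which factors most influence downstream performance.

PS: these records refer to the time of BART's release. For example, at the time of writing the best model on CNN / Daily Mail is BigBird-Pegasus, built on Big Bird.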

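BART pre-trains a model that combines a bidirectional (BERT-style) and an autoregressive (GPT-style) Transformer.

A key advantage of BART is its noising flexibility: arbitrary transformations can be applied to the original text, including ones that change its length.

This forces the model to reason more about overall sentence length and to make longer-range transformations to the input, which generalizes the MLM and NSP objectives of BERT.

BART also opens up a new way of thinking about fine-tuning. For machine translation, BART is stacked on top of a few additional Transformer layers. These layers essentially translate the foreign language into noised English, which is then passed through BART, so that BART serves as a pre-trained target-side language model. On the WMT Romanian-English dataset this approach improves over a back-translation system by 1.1 BLEU.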

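I. Comparing BART, BERT, GPT, and the Transformer

Following GPT, BART also changes the activation function from ReLU to GeLU. Figure 1 compares BART, BERT, and GPT.

(a) BERT: random tokens are replaced with masks and the document is encoded bidirectionally. Because missing tokens are predicted independently, BERT is hard to use for generation.
(b) GPT: tokens are predicted autoregressively, so GPT can be used for generation. However, each token is predicted from its left context only, so the model cannot learn bidirectional interactions.
(c) BART: the encoder input does not need to be aligned with the decoder output, which allows arbitrary noise transformations. Here the document is corrupted by replacing spans of text with mask symbols. The corrupted document (left) is encoded with a bidirectional model, and the likelihood of the original document (right) is then computed with an autoregressive decoder. For fine-tuning, an uncorrupted document is fed to both the encoder and the decoder, and the representations from the decoder's final hidden state are used.

How BART differs from the standard Transformer and from BERT:

Each decoder layer additionally performs cross-attention over the final hidden layer of the encoder (as in the Transformer sequence-to-sequence model).
BERT uses an additional feed-forward network before token prediction, whereas BART does not.
As in GPT, the ReLU activation is replaced with GeLU, and parameters are initialized from the normal distribution $N(0, 0.02)$;
The BART base model has 6 layers in each of the Encoder and Decoder; the large model has 12 each.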
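The activation and initialization choices listed above can be sketched in plain Python. This is only an illustration: the helper names are hypothetical, the exact (erf-based) form of GeLU is used, and 0.02 is assumed to be the standard deviation of the normal distribution.

```python
import math
import random

def relu(x):
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def gelu(x):
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def init_weights(n, std=0.02, seed=0):
    """Draw n weights from a zero-mean normal distribution (std assumed to be 0.02)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(n)]

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}")
```

Unlike ReLU, GeLU is smooth and lets small negative values through, which is the practical difference between the two activations.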

II. BART Pre-training

The BART loss is the cross-entropy between the decoder's output and the original text.
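As a minimal sketch of this reconstruction loss, the cross-entropy can be computed as the mean negative log-probability that the decoder assigns to each original token. The function name, the toy two-token vocabulary, and the probabilities below are hypothetical.

```python
import math

def reconstruction_loss(step_probs, target_ids):
    """Mean negative log-likelihood of the original tokens.

    step_probs[t] is the decoder's probability distribution over the vocabulary at step t;
    target_ids[t] is the id of the original (uncorrupted) token at that step.
    """
    nll = [-math.log(probs[tok]) for probs, tok in zip(step_probs, target_ids)]
    return sum(nll) / len(nll)

# Toy example: vocabulary of size 2, original text has two tokens.
step_probs = [[0.5, 0.5], [0.25, 0.75]]
target_ids = [0, 1]
print(reconstruction_loss(step_probs, target_ids))
```

The loss is zero only when the decoder puts probability 1 on every original token, so minimizing it pushes the model to undo the corruption.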

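Unlike other denoising autoencoders, which are usually tailored to a specific noising scheme, BART can use any type of document corruption. In the extreme case where all information about the source is lost, BART is equivalent to a language model.

1. Noising schemes (how the original text is corrupted)

1.1 Token Masking

As in BERT, random tokens are sampled and replaced with the predefined special token [MASK].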
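A minimal sketch of token masking follows. The `token_masking` helper and the masking probability of 0.15 are assumptions made for illustration; the text above does not specify a rate for this scheme.

```python
import random

MASK = "[MASK]"

def token_masking(tokens, mask_prob=0.15, seed=0):
    """Replace each token with [MASK] independently with probability mask_prob."""
    rng = random.Random(seed)
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]

words = "the quick brown fox jumps over the lazy dog".split()
print(token_masking(words, mask_prob=0.3))
```

The output always has the same length as the input, so the model knows exactly where tokens are missing.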

1.2 Token Deletion

Random tokens are deleted from the input. Unlike Token Masking, the model must also decide which positions are missing input.
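Token deletion can be sketched in the same way. The `token_deletion` helper and the deletion probability are hypothetical choices for illustration.

```python
import random

def token_deletion(tokens, delete_prob=0.15, seed=0):
    """Drop each token independently with probability delete_prob."""
    rng = random.Random(seed)
    return [tok for tok in tokens if rng.random() >= delete_prob]

words = "the quick brown fox jumps over the lazy dog".split()
print(token_deletion(words, delete_prob=0.3))
```

Since nothing marks the deleted positions, the corrupted sequence is shorter than the original and gives no hint about where the gaps are.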

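1.3 Text Infilling

A number of text spans are sampled, with span lengths drawn from a Poisson distribution (λ = 3).

Each span is replaced with a single [MASK] token.

A sampled span of length 0 corresponds to inserting a [MASK] token.

Text infilling is inspired by SpanBERT, but SpanBERT draws span lengths from a geometric distribution and replaces each span with a sequence of [MASK] tokens of the same length.

Text infilling therefore forces the model to learn how many tokens are missing from a span.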
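The following is a minimal sketch of text infilling under stated assumptions, not the paper's implementation. Span lengths are drawn from a Poisson distribution with λ = 3 as described above; the left-to-right scan, the `start_prob` parameter that decides where spans begin, and the helper names are hypothetical simplifications.

```python
import math
import random

MASK = "[MASK]"

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def text_infilling(tokens, start_prob=0.1, lam=3.0, seed=0):
    """Replace randomly chosen spans with a single [MASK] each."""
    rng = random.Random(seed)
    out, i = [], 0
    while i < len(tokens):
        if rng.random() < start_prob:
            span = sample_poisson(lam, rng)
            out.append(MASK)  # one mask, whatever the span length
            if span == 0:
                out.append(tokens[i])  # length-0 span: pure insertion, token kept
                i += 1
            else:
                i += span  # the whole span collapses into the mask above
        else:
            out.append(tokens[i])
            i += 1
    return out

words = "the quick brown fox jumps over the lazy dog today".split()
print(text_infilling(words, start_prob=0.3))
```

Because a single [MASK] may stand for zero, one, or several tokens, the corrupted sequence does not reveal how much text is missing, which is exactly what the model has to infer.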

1.4 Sentence Permutation

The document is split into sentences at full stops, and these sentences are shuffled in random order.
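A minimal sketch of sentence permutation, assuming sentences end with a plain "." character. The helper name and the example document are hypothetical.

```python
import random

def sentence_permutation(document, seed=0):
    """Split a document at full stops and shuffle the resulting sentences."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    rng = random.Random(seed)
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."

doc = "BART corrupts the text. The encoder reads it. The decoder rebuilds it."
print(sentence_permutation(doc))
```

A real tokenizer-aware implementation would need to handle abbreviations and other punctuation; this sketch only shows the shuffle itself.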

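1.5 Document Rotation

A token is chosen uniformly at random, and the document is rotated so that it begins with that token. This task trains the model to identify the start of the document.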
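Document rotation reduces to a list rotation around a uniformly chosen index. The sketch below assumes a non-empty token list; the helper name is hypothetical.

```python
import random

def document_rotation(tokens, seed=0):
    """Rotate the token list so that it starts at a uniformly chosen token."""
    rng = random.Random(seed)
    start = rng.randrange(len(tokens))  # assumes tokens is non-empty
    return tokens[start:] + tokens[:start]

tokens = "once upon a time there was a model".split()
print(document_rotation(tokens, seed=3))
```

To reconstruct the original, the model has to work out which token was the true beginning of the document.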

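2. Comparison of pre-training objectives

The paper also compares the effect of different pre-training objectives in detail, using two setups:

Treat every task as a standard sequence-to-sequence problem: the source is fed to the encoder, and the decoder output is the target.
Add the source as a prefix to the target in the decoder, with the loss computed only on the target part of the sequence.

The experiments find that the former works better for BART models and the latter for the other models.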
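The two setups differ only in how a (source, target) pair is laid out for the model. The sketch below illustrates that layout with hypothetical helper names, field names, and token ids.

```python
def seq2seq_example(source_ids, target_ids):
    """Setup 1: the source goes to the encoder, the decoder is trained on the target."""
    return {"encoder_input": source_ids, "labels": target_ids}

def prefix_example(source_ids, target_ids):
    """Setup 2: the source is a decoder prefix; the loss covers the target part only."""
    sequence = source_ids + target_ids
    loss_mask = [0] * len(source_ids) + [1] * len(target_ids)
    return {"decoder_sequence": sequence, "loss_mask": loss_mask}

source, target = [5, 6, 7], [8, 9]
print(seq2seq_example(source, target))
print(prefix_example(source, target))
```

In the second setup the positions with `loss_mask == 0` still provide context, but they contribute nothing to the training loss.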

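The figure above compares the pre-training objectives, which are derived from BERT, MASS, GPT, XLNet, and UniLM. All compared models are of similar size, are trained for 1M steps, and use the same pre-training data.

BART with Text Infilling performs strongly throughout. The following conclusions can be drawn:

The performance of pre-training methods varies significantly across tasks. In other words, the effectiveness of a pre-training method depends heavily on the task. For example, a simple language model achieves the best result on ELI5 but the worst on SQuAD.
Token masking is crucial. Pre-training objectives based only on rotating documents or permuting sentences perform poorly; the successful methods use token deletion or masking. In addition, deletion appears to outperform masking on generation tasks.
Left-to-right pre-training improves generation. The masked language model and the permuted language model perform worse than the others on generation, and neither of them includes left-to-right autoregressive language modelling during pre-training.
Bidirectional encoders are crucial for SQuAD, because future context is crucial in classification decisions. BART reaches performance similar to BERT with only half the number of bidirectional layers.
The pre-training objective is not the only important factor. The permuted language model here performs less well than XLNet. Some of this difference is likely due to not including other architectural improvements in XLNet, such as relative-position embeddings and segment-level recurrence.
Pure language models perform best on ELI5, with perplexities far better than the other models. This suggests that BART is less effective when the output is only loosely constrained by the input.

In summary, BART with the text-infilling pre-training objective performs well on all tasks except ELI5.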

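2.1 Language Model

As with GPT, a left-to-right Transformer language model is trained. This model is equivalent to the BART decoder without cross-attention.

2.2 Permuted Language Model

Based on XLNet, 1/6 of the tokens are sampled and generated autoregressively in a random order. For consistency with the other models, the relative positional embeddings and segment-level recurrence of XLNet are not implemented.

2.3 Masked Language Model

Following BERT, 15% of the tokens are replaced with [MASK] tokens, and the model is trained to reconstruct the masked tokens.

2.4 Multitask Masked Language Model

As in UniLM, a masked language model is trained with additional self-attention masks. The self-attention mask is chosen randomly in the following proportions: 1/6 left-to-right; 1/6 right-to-left; 1/3 unmasked; and the remaining 1/3 with the first 50% of tokens unmasked and a left-to-right mask for the rest.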
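The four self-attention mask types and their sampling proportions can be sketched as follows. This is one possible reading of the description above, not the original implementation: the function names and mask labels are hypothetical, and in the "prefix" case the first half is assumed to attend only within itself.

```python
import random

def attention_mask(n, kind):
    """Return an n x n matrix where mask[i][j] == 1 means position i may attend to j."""
    if kind == "left_to_right":
        return [[int(j <= i) for j in range(n)] for i in range(n)]
    if kind == "right_to_left":
        return [[int(j >= i) for j in range(n)] for i in range(n)]
    if kind == "unmasked":
        return [[1] * n for _ in range(n)]
    if kind == "prefix":
        half = n // 2  # first 50% unmasked, remainder left-to-right
        return [[int(j < half or j <= i) for j in range(n)] for i in range(n)]
    raise ValueError(f"unknown mask kind: {kind}")

def sample_mask_kind(rng):
    """Sample a mask type with proportions 1/6, 1/6, 1/3, 1/3."""
    kinds = ["left_to_right", "right_to_left", "unmasked", "prefix"]
    return rng.choices(kinds, weights=[1, 1, 2, 2])[0]

rng = random.Random(0)
kind = sample_mask_kind(rng)
for row in attention_mask(4, kind):
    print(kind, row)
```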

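2.5 Masked Seq-to-Seq

Inspired by MASS, a span containing 50% of the tokens is masked, and a sequence-to-sequence model is trained to predict the masked tokens.

III. Downstream Tasks (Fine-tuning)

Figure 3 shows how BART is fine-tuned for classification and translation. The fine-tuning setup for each downstream task is described below.

a: To use BART for classification, the same input text is fed to the encoder and the decoder, and the representation from the final output is used.
b: For machine translation, a small additional encoder is trained to replace the word embeddings in BART. The new encoder can use a different vocabulary.

1. Sequence classification

The same input is fed to both the encoder and the decoder, and the final hidden state of the final decoder token is fed into a new multi-class linear classifier. This is similar to the CLS token in BERT, except that BART appends the additional token at the end, so that its decoder representation can attend to decoder states from the complete input (see Figure 3a).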
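As a minimal sketch of this classification head, the hidden state of the final decoder token is passed through a linear layer. The hidden states, weights, and bias below are made-up numbers, and `classification_logits` is a hypothetical helper.

```python
def classification_logits(decoder_states, weights, bias):
    """Apply a linear classifier to the hidden state of the final decoder token."""
    final_state = decoder_states[-1]  # state of the token appended at the end
    return [
        sum(w * h for w, h in zip(row, final_state)) + b
        for row, b in zip(weights, bias)
    ]

# Hypothetical decoder states for 3 tokens (hidden size 2) and a 2-class classifier.
states = [[0.1, 0.2], [0.3, 0.4], [1.0, -1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.5]

logits = classification_logits(states, weights, bias)
predicted_class = max(range(len(logits)), key=logits.__getitem__)
print(logits, predicted_class)
```

Only the last position is used because, in a left-to-right decoder, it is the only one whose state has seen the entire input.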

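2. Token classification

For token classification tasks, such as answer endpoint classification on SQuAD, the complete document is fed to the encoder and decoder, and the top hidden state of the decoder is used as the representation of each token to classify it, for example as an answer endpoint or not.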

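3. Sequence generation

Because BART has an autoregressive decoder, it can be fine-tuned directly for sequence generation tasks such as abstractive question answering and summarization. In both tasks, information is copied from the input but manipulated, which is closely related to the denoising pre-training objective. The encoder input is the input sequence, and the decoder generates the output autoregressively.

4. Machine translation

For machine translation, the entire BART model (encoder and decoder) is used as a single pre-trained decoder, and a new set of encoder parameters learned from bitext is added, as shown in Figure 3b. Concretely, BART's encoder embedding layer is replaced with a new, randomly initialized encoder. The model is trained end to end, which trains the new encoder to map foreign words into an input that BART can denoise into English. The new encoder can use a vocabulary separate from that of the original BART model.

The source encoder is trained in two steps, in both cases backpropagating the cross-entropy loss from the output of the BART model.

(1) Freeze most of the BART parameters and update only the randomly initialized source encoder, the BART positional embeddings, and the self-attention input projection matrix of the first layer of BART's encoder.
(2) Train all model parameters for a small number of iterations.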
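The two-step schedule above amounts to choosing which parameters receive gradient updates in each stage. The sketch below shows that selection logic only; every parameter name in it is hypothetical and would have to be replaced by the names used in an actual model.

```python
# Hypothetical parameter-name prefixes that stay trainable in stage 1.
STAGE_ONE_TRAINABLE = (
    "source_encoder.",                          # new, randomly initialized encoder
    "bart.encoder.embed_positions.",            # BART positional embeddings
    "bart.decoder.embed_positions.",
    "bart.encoder.layers.0.self_attn.in_proj",  # input projection of the first encoder layer
)

def is_trainable(param_name, stage):
    """Stage 1 updates only the selected parameters; stage 2 updates everything."""
    if stage == 2:
        return True
    return param_name.startswith(STAGE_ONE_TRAINABLE)

names = [
    "source_encoder.layers.0.fc1.weight",
    "bart.encoder.layers.0.self_attn.in_proj_weight",
    "bart.encoder.layers.1.self_attn.in_proj_weight",
    "bart.decoder.layers.3.fc1.weight",
]
for name in names:
    print(name, is_trainable(name, stage=1), is_trainable(name, stage=2))
```

In a framework with per-parameter gradient flags, the result of `is_trainable` would decide whether each parameter is frozen during a given stage.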

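IV. Experimental Results

1. Natural language understanding tasks (Discriminative Tasks)

Since larger models and larger batch sizes help downstream performance, the paper also compares large versions of the models.

The large BART model has 12 layers in each of the encoder and decoder, with a hidden size of 1024. Following RoBERTa, it uses a batch size of 8000 and is pre-trained for 500,000 steps. Documents are tokenized with the same byte-pair encoding (BPE) as GPT-2. Table 2 compares the models on SQuAD and GLUE.

Table 2: Results of large models on SQuAD and GLUE. BART performs comparably to RoBERTa and XLNet, suggesting that BART's uni-directional decoder layers do not reduce performance on discriminative tasks.

Overall, BART is on par with other state-of-the-art models on natural language understanding tasks. This suggests that its improvements on generation tasks do not come at the expense of understanding performance.

2. Natural language generation tasks

2.1 Summarization

The generation experiments cover summarization (CNN/DailyMail and XSum), dialogue (ConvAI2), and abstractive question answering (ELI5, a long-form question answering dataset). The results are shown in Table 3.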

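The results show that BART outperforms previous models on all metrics for both summarization tasks. On the more abstractive XSum dataset, BART outperforms the best previous work, which leverages BERT, by roughly 6 points on all ROUGE metrics.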

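Human evaluation also shows BART substantially outperforming previous models, although a gap to human-written summaries remains.

2.2 Dialogue

Table 4 shows the results on ConvAI2.

BART outperforms previous work on dialogue response generation. Perplexities are renormalized based on the official ConvAI2 tokenizer.

2.3 Abstractive QA

BART achieves state-of-the-art results on the challenging ELI5 long-form question answering dataset.

BART outperforms the best previous work (Seq2Seq Multi-task) by 1.2 ROUGE-L. The dataset remains challenging, because the answers are only weakly specified by the question.

3. Machine translation

Table 8 compares BART with other models on WMT16 Romanian-English, showing the performance of BART and the baseline (Transformer) on machine translation.

The compared models use WMT16 RO-EN augmented with back-translation data. BART, which uses monolingual English pre-training, outperforms the baseline.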

V. Using a pre-trained BART model for summarization

1. Approach 1: `from transformers import AutoTokenizer, AutoModelForSeq2SeqLM`

# https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/modeling_bart.py
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained(r'D:\Pretrained_Model\bart-base')
model = AutoModelForSeq2SeqLM.from_pretrained(r'D:\Pretrained_Model\bart-base')

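# The "summarize:" prefix is a T5 convention for summarization; BART does not need a task prefix.
# It is kept here because the output printed below was generated with it.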
text = """
        summarize: (CNN)For the second time during his papacy, Pope Francis has announced a new group of bishops and archbishops set to become cardinals -- and they come from all over the world.
        Pope Francis said Sunday that he would hold a meeting of cardinals on February 14 "during which I will name 15 new Cardinals who, coming from 13 countries from every continent, manifest the indissoluble links between the Church of Rome and the particular Churches present in the world," according to Vatican Radio.
        New cardinals are always important because they set the tone in the church and also elect the next pope, CNN Senior Vatican Analyst John L. Allen said. They are sometimes referred to as the princes of the Catholic Church.
        The new cardinals come from countries such as Ethiopia, New Zealand and Myanmar.
        "This is a pope who very much wants to reach out to people on the margins, and you clearly see that in this set," Allen said. "You're talking about cardinals from typically overlooked places, like Cape Verde, the Pacific island of Tonga, Panama, Thailand, Uruguay."
        But for the second time since Francis' election, no Americans made the list.
        "Francis' pattern is very clear: He wants to go to the geographical peripheries rather than places that are already top-heavy with cardinals," Allen said.
        Christopher Bellitto, a professor of church history at Kean University in New Jersey, noted that Francis announced his new slate of cardinals on the Catholic Feast of the Epiphany, which commemorates the visit of the Magi to Jesus' birthplace in Bethlehem.
        "On feast of three wise men from far away, the Pope's choices for cardinal say that every local church deserves a place at the big table."
        In other words, Francis wants a more decentralized church and wants to hear reform ideas from small communities that sit far from Catholicism's power centers, Bellitto said.
        That doesn't mean Francis is the first pontiff to appoint cardinals from the developing world, though. Beginning in the 1920s, an increasing number of Latin American churchmen were named cardinals, and in the 1960s, St. John XXIII, whom Francis canonized last year, appointed the first cardinals from Japan, the Philippines and Africa.
        In addition to the 15 new cardinals Francis named on Sunday, five retired archbishops and bishops will also be honored as cardinals.
        Last year, Pope Francis appointed 19 new cardinals, including bishops from Haiti and Burkina Faso.
        CNN's Daniel Burke and Christabelle Fombu contributed to this report.
"""
# CNN/DM reference summary:
# @highlight
# The 15 new cardinals will be installed on February 14
# @highlight
# They come from countries such as Myanmar and Tonga
# @highlight
# No Americans made the list this time or the previous time in Francis' papacy

inputs = tokenizer(text, max_length=1024, truncation=True, return_tensors='pt')

print('inputs = ', inputs)

summary_ids = model.generate(inputs['input_ids'])

print('\nsummary_ids = ', summary_ids)

print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False))

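Output. Note that bart-base is only pre-trained, not fine-tuned for summarization, so the model merely echoes the beginning of the input, truncated to 20 tokens: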

Ignored unknown kwarg option direction
inputs =  {'input_ids': tensor([[    0, 50118,  1437,  1437,  1437,  1437,  1437,  1437,  1437, 40402,
            35,    36, 16256,    43,  2709,     5,   200,    86,   148,    39,
         13102,  5073,     6,  8509,  5075,    34,   585,    10,    92,   333,
             9, 19929,     8,  9599,   428, 32084,   278,     7,   555,  1886,
         17552,   480,     8,    51,   283,    31,    70,    81,     5,   232,
             4, 50118,  1437,  1437,  1437,  1437,  1437,  1437,  1437,  8509,
          5075,    26,   395,    14,    37,    74,   946,    10,   529,     9,
          1886, 17552,    15,   902,   501,    22, 37460,    61,    38,    40,
           766,   379,    92,  6293,    54,     6,   567,    31,   508,   749,
            31,   358,  9183,     6, 19318,     5,  9473,  3006,  1168, 26633,
          5678,   227,     5,  2197,     9,  8947,     8,     5,  1989, 37391,
          1455,    11,     5,   232,    60,   309,     7, 11484,  4611,     4,
         50118,  1437,  1437,  1437,  1437,  1437,  1437,  1437,   188,  1886,
         17552,    32,   460,   505,   142,    51,   278,     5,  6328,    11,
             5,  2352,     8,    67, 10371,     5,   220, 16627,     6,  3480,
          3596, 11484,  9821,   610,   226,     4,  3823,    26,     4,   252,
            32,  2128,  4997,     7,    25,     5, 39978,     9,     5,  4019,
          2197,     4, 50118,  1437,  1437,  1437,  1437,  1437,  1437,  1437,
            20,    92,  1886, 17552,   283,    31,   749,   215,    25, 13934,
             6,   188,  3324,     8,  7095,     4, 50118,  1437,  1437,  1437,
          1437,  1437,  1437,  1437,    22,   713,    16,    10, 16627,    54,
           182,   203,  1072,     7,  1338,    66,     7,    82,    15,     5,
          5510,     6,     8,    47,  2563,   192,    14,    11,    42,   278,
            60,  3823,    26,     4,    22,  1185,   214,  1686,    59,  1886,
         17552,    31,  3700, 16042,  2127,     6,   101,  6268,  3060,  2794,
             6,     5,  3073,  2946,     9, 17922,   102,     6, 12276,     6,
          6547,     6, 17609,    72, 50118,  1437,  1437,  1437,  1437,  1437,
          1437,  1437,   125,    13,     5,   200,    86,   187,  5075,   108,
           729,     6,   117,  1791,   156,     5,   889,     4, 50118,  1437,
          1437,  1437,  1437,  1437,  1437,  1437,    22, 38461,   354,   108,
          6184,    16,   182,   699,    35,    91,  1072,     7,   213,     7,
             5, 20456, 41464,   918,  1195,    87,  2127,    14,    32,   416,
           299,    12, 18888,    19,  1886, 17552,    60,  3823,    26,     4,
         50118,  1437,  1437,  1437,  1437,  1437,  1437,  1437,  5469,  3043,
         29765,     6,    10,  3097,     9,  2352,   750,    23,  3350,   260,
           589,    11,   188,  3123,     6,  1581,    14,  5075,   585,    39,
            92, 15777,     9,  1886, 17552,    15,     5,  4019, 37072,     9,
             5, 14230, 39880,     6,    61, 16293,  1626,     5,   825,     9,
             5,  3771,   118,     7,  5772,   108, 32357,    11, 26557,     4,
         50118,  1437,  1437,  1437,  1437,  1437,  1437,  1437,    22,  4148,
         23220,     9,   130, 11036,   604,    31,   444,   409,     6,     5,
          8509,    18,  5717,    13, 32533,   224,    14,   358,   400,  2352,
          8613,    10,   317,    23,     5,   380,  2103,    72, 50118,  1437,
          1437,  1437,  1437,  1437,  1437,  1437,    96,    97,  1617,     6,
          5075,  1072,    10,    55, 34930,  2352,     8,  1072,     7,  1798,
          3114,  2956,    31,   650,  1822,    14,  2662,   444,    31, 42580,
            18,   476,  5228,     6,  3043, 29765,    26,     4, 50118,  1437,
          1437,  1437,  1437,  1437,  1437,  1437,   280,   630,    75,  1266,
          5075,    16,     5,    78, 29476,  4822,     7,  9653,  1886, 17552,
            31,     5,  2623,   232,     6,   600,     4, 22856,    11,     5,
         18283,    29,     6,    41,  2284,   346,     9,  5862,   470,  2352,
          2262,    58,  1440,  1886, 17552,     6,     8,    11,     5,  7571,
            29,     6,   312,     4,   610, 26166, 24457,     6,  2661,  5075,
         32839,  1538,    94,    76,     6,  3873,     5,    78,  1886, 17552,
            31,  1429,     6,     5,  5639,     8,  1327,     4, 50118,  1437,
          1437,  1437,  1437,  1437,  1437,  1437,    96,  1285,     7,     5,
           379,    92,  1886, 17552,  5075,  1440,    15,   395,     6,   292,
          3562,  9599,   428, 32084,     8, 19929,    40,    67,    28,  7809,
            25,  1886, 17552,     4, 50118,  1437,  1437,  1437,  1437,  1437,
          1437,  1437,  1426,    76,     6,  8509,  5075,  3873,   753,    92,
          1886, 17552,     6,   217, 19929,    31, 17009,     8, 18294,  1243,
         21433,   139,     4, 50118,  1437,  1437,  1437,  1437,  1437,  1437,
          1437,  3480,    18,  3028, 12032,     8,  4845, 14286,   459,   274,
          5223,   257,  3162,     7,    42,   266,     4, 50118,     2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

summary_ids =  tensor([[    2,     0,  1640, 16256,    43,  1437,  1437,  1437,  2537,  1437,
          1437,    36, 16256,    43,  2709,     5,   200,    86,   148,     2]])

['(CNN)   Â   (CNN)For the second time during']
['(CNN)   Â   (CNN)For the second time during']


2. Approach 2: `from transformers import BartTokenizer, BartForConditionalGeneration`

# https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/modeling_bart.py
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained(r'D:\Pretrained_Model\bart-base')
model = BartForConditionalGeneration.from_pretrained(r'D:\Pretrained_Model\bart-base')

text = """
         (CNN)For the second time during his papacy, Pope Francis has announced a new group of bishops and archbishops set to become cardinals -- and they come from all over the world.
        Pope Francis said Sunday that he would hold a meeting of cardinals on February 14 "during which I will name 15 new Cardinals who, coming from 13 countries from every continent, manifest the indissoluble links between the Church of Rome and the particular Churches present in the world," according to Vatican Radio.
        New cardinals are always important because they set the tone in the church and also elect the next pope, CNN Senior Vatican Analyst John L. Allen said. They are sometimes referred to as the princes of the Catholic Church.
        The new cardinals come from countries such as Ethiopia, New Zealand and Myanmar.
        "This is a pope who very much wants to reach out to people on the margins, and you clearly see that in this set," Allen said. "You're talking about cardinals from typically overlooked places, like Cape Verde, the Pacific island of Tonga, Panama, Thailand, Uruguay."
        But for the second time since Francis' election, no Americans made the list.
        "Francis' pattern is very clear: He wants to go to the geographical peripheries rather than places that are already top-heavy with cardinals," Allen said.
        Christopher Bellitto, a professor of church history at Kean University in New Jersey, noted that Francis announced his new slate of cardinals on the Catholic Feast of the Epiphany, which commemorates the visit of the Magi to Jesus' birthplace in Bethlehem.
        "On feast of three wise men from far away, the Pope's choices for cardinal say that every local church deserves a place at the big table."
        In other words, Francis wants a more decentralized church and wants to hear reform ideas from small communities that sit far from Catholicism's power centers, Bellitto said.
        That doesn't mean Francis is the first pontiff to appoint cardinals from the developing world, though. Beginning in the 1920s, an increasing number of Latin American churchmen were named cardinals, and in the 1960s, St. John XXIII, whom Francis canonized last year, appointed the first cardinals from Japan, the Philippines and Africa.
        In addition to the 15 new cardinals Francis named on Sunday, five retired archbishops and bishops will also be honored as cardinals.
        Last year, Pope Francis appointed 19 new cardinals, including bishops from Haiti and Burkina Faso.
        CNN's Daniel Burke and Christabelle Fombu contributed to this report.
"""
# CNN/DM reference summary:
# @highlight
# The 15 new cardinals will be installed on February 14
# @highlight
# They come from countries such as Myanmar and Tonga
# @highlight
# No Americans made the list this time or the previous time in Francis' papacy

inputs = tokenizer(text, max_length=1024, truncation=True, return_tensors='pt')

print('inputs = ', inputs)

summary_ids = model.generate(inputs['input_ids'])

print('\nsummary_ids = ', summary_ids)

print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False))