BART Text Summarization Example

This example uses BART for abstractive text summarization.

First, import BartTokenizer from the transformers library for tokenization, and BartForConditionalGeneration for summarization:

from transformers import BartTokenizer, BartForConditionalGeneration 

Here, facebook/bart-large-cnn is a pretrained BART model fine-tuned on the CNN/DailyMail summarization dataset. The checkpoint is about 1.6 GB, so the download may be slow; be patient while the progress bar completes.

model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')


text = """We have created a large diverse set of cars from overhead
images, which are useful for training a deep
learner to binary classify, detect and count them. The dataset and all
related material will be made publicly available. The set contains contextual
matter to aid in identification of difficult targets. We demonstrate
classification and detection on this dataset using a neural network we call
ResCeption. This network combines residual learning with Inception-style
layers and is used to count cars in one look. This is a new way
to count objects rather than by localization or density estimation. It is
fairly accurate, fast and easy to implement. Additionally, the counting
method is not car or scene specific. It would be easy to train this method
to count other kinds of objects and counting over new scenes requires no
extra set up or assumptions about object locations"""
Note that this second assignment overwrites the English text above. Also, facebook/bart-large-cnn is trained on English data, so it will not produce a meaningful summary for this Chinese passage; it is shown here only as an alternative input.

text = '''
2008年凭借歌曲《青花瓷》获得第19届金曲奖最佳作曲人奖。2009年入选美国CNN评出的“25位亚洲最具影响力人物” ,
同年凭借专辑《魔杰座》获得第20届金曲奖最佳国语男歌手奖。2010年入选美国《Fast Company》评出的“全球百大创意人物” 。
2011年凭借专辑《跨时代》再度获得金曲奖最佳国语男歌手奖,并且第四次获得金曲奖最佳国语专辑奖;同年主演好莱坞电影《青蜂侠》。
2012年登福布斯中国名人榜榜首。2014年发行华语乐坛首张数字音乐专辑《哎呦,不错哦》。2016年发行专辑《周杰伦的床边故事》
'''

Tokenize the long text:

inputs = tokenizer([text], max_length=1024, truncation=True, return_tensors='pt')
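Under the hood, the tokenizer maps subwords to integer ids and returns them as PyTorch tensors. A minimal sketch of the idea, using a tiny hypothetical vocabulary (the real BART tokenizer uses byte-level BPE with a vocabulary of roughly 50k entries, and the ids below are made up):

```python
# Toy vocabulary - hypothetical ids, for illustration only.
vocab = {"<s>": 0, "</s>": 2, "we": 10, "count": 11, "cars": 12}

def toy_encode(words, max_length=8):
    # Wrap the sequence in start/end special tokens, look each word up,
    # and truncate to max_length, like max_length=1024 does above.
    ids = [vocab["<s>"]] + [vocab[w] for w in words] + [vocab["</s>"]]
    return ids[:max_length]

print(toy_encode(["we", "count", "cars"]))  # -> [0, 10, 11, 12, 2]
```

The real call additionally pads batches and returns an attention mask alongside the ids.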

Use the pretrained BART model to generate the token ids of the summary:

summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=100, early_stopping=True)

Look up the generated ids in the vocabulary and convert them back to words:

summary = [tokenizer.decode(i, skip_special_tokens=True, clean_up_tokenization_spaces=False) for i in summary_ids]
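decode reverses the encoding step: each id is looked up in the vocabulary, special tokens are dropped, and the tokens are joined back into text. A minimal sketch with the same hypothetical ids as before (the real BART decode also merges subword pieces):

```python
# Toy reverse vocabulary - hypothetical ids, for illustration only.
id_to_token = {0: "<s>", 2: "</s>", 10: "we", 11: "count", 12: "cars"}
special = {"<s>", "</s>"}

def toy_decode(ids, skip_special_tokens=True):
    # Map each id back to its token, optionally dropping special tokens,
    # then join into a plain string.
    tokens = [id_to_token[i] for i in ids]
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in special]
    return " ".join(tokens)

print(toy_decode([0, 10, 11, 12, 2]))  # -> "we count cars"
```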

Print the resulting summary:

print(summary)

