[NLP - Topic Sentence Extraction] Useful for Reading Comprehension

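While studying, we often work through English reading-comprehension exercises. So how can we quickly find the topic sentence of an article? The gensim package provides a convenient interface for this: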

```python
# Topic sentence extraction
# gensim.summarization was removed in gensim 4.0, so this needs gensim 3.x
# (for example: pip install "gensim<4.0")
from gensim.summarization import summarize

text1 = 'With recent speeches in both Silicon Valley and China, Jeff Dean, one of Google’s leading engineers, spotlighted a Google project called AutoML. ML is short for machine learning, referring to computer algorithms that can learn to perform particular tasks on their own by analyzing data. AutoML, in turn, is a machine-learning algorithm that learns to build other machine-learning algorithms. With it, Google may soon find a way to create A.I. technology that can partly take the humans out of building the A.I. systems that many believe are the future of the technology industry. The project is part of a much larger effort to bring the latest and greatest A.I. techniques to a wider collection of companies and software developers.'

# By default summarize() keeps roughly 20% of the sentences (ratio=0.2)
summarize(text1)
# Output:
# 'With recent speeches in both Silicon Valley and China, Jeff Dean, one of Google’s leading engineers, spotlighted a Google project called AutoML.'
```

This brings us to the TextRank algorithm, which is adapted from the well-known PageRank algorithm.
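For reference, the original TextRank paper (Mihalcea and Tarau, 2004) scores each graph node $V_i$ with a weighted form of the PageRank recurrence. In it, $d$ is a damping factor, usually set to 0.85, and $w_{ji}$ is the weight of the edge from $V_j$ to $V_i$. On an undirected graph such as a word graph, $In(V_i)$ and $Out(V_j)$ are simply each node's neighbours. The scores are obtained by iterating the formula until they stop changing:

$$
WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)
$$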
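The exact formulas and derivations are easy to find online, so here is a plain-language explanation. We start by treating related words like web pages that link to each other. In TextRank's keyword-extraction variant, "related" means the two words occur within a small window of each other. A word that is connected to many other words, and especially to other high-weight words, therefore ends up with a higher weight. We then score each sentence by adding up the weights of the keywords it contains and take the highest-scoring sentence as the topic sentence of the paragraph.

Note that this keyword-summing scheme is a simplification. gensim's summarize() is based on TextRank's sentence-extraction variant, which works differently: it makes each sentence a node, weights the edges by sentence similarity (BM25 in gensim's implementation), and ranks the sentences directly.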
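The word-graph scoring described above can be sketched in Python with only the standard library. This is a minimal sketch, not gensim's implementation. Several of its choices are assumptions that do not come from the original text:

- Words are linked when they appear within `window` words of each other in the same sentence.
- The stopword list is a tiny hypothetical one.
- Sentences are split naively on punctuation.
- Only the `top_k` highest-ranked words count as keywords.

All function names are hypothetical helpers.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (illustration only)
STOPWORDS = {"the", "and", "that", "are", "for", "with", "can", "its", "their",
             "own", "other", "many", "may", "both", "which", "this", "from", "into"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.
    Abbreviations such as 'A.I.' will be split incorrectly."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic words, minus stopwords and words of 1-2 letters."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if len(w) > 2 and w not in STOPWORDS]


def build_graph(sentence_tokens, window=2):
    """Undirected word graph (assumption: co-occurrence defines 'related').
    Each word is linked to the next `window` words in the same sentence;
    the edge weight counts how often the pair co-occurs."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in sentence_tokens:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + 1 + window]:
                if other != word:
                    graph[word][other] += 1.0
                    graph[other][word] += 1.0
    return graph


def textrank(graph, d=0.85, iterations=50):
    """Weighted PageRank: WS(i) = (1-d) + d * sum_j w_ji / out_j * WS(j).
    The graph is undirected, so a node's in-neighbours are its neighbours."""
    out_weight = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # The comprehension reads the previous iteration's scores throughout
        scores = {
            node: (1 - d) + d * sum(w / out_weight[nbr] * scores[nbr]
                                    for nbr, w in graph[node].items())
            for node in graph
        }
    return scores


def topic_sentence(text, window=2, top_k=10):
    """Score each sentence by the summed TextRank weight of the top-k
    keywords it contains (each counted once); return the best sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    tokens = [tokenize(s) for s in sentences]
    scores = textrank(build_graph(tokens, window))
    keywords = set(sorted(scores, key=scores.get, reverse=True)[:top_k])
    sentence_scores = [sum(scores[w] for w in set(t) if w in keywords)
                       for t in tokens]
    best = max(range(len(sentences)), key=sentence_scores.__getitem__)
    return sentences[best]
```

Calling topic_sentence(text1) applies the same idea to the article above. However, the naive splitter will cut "A.I." apart. Also, because word weights are summed, longer sentences tend to win; dividing each score by the sentence length is one possible adjustment.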

//TODO
1. Study the TextRank and PageRank algorithms in detail
