How to build a wordcloud in Python

In this tutorial I will show you how to build a word cloud of a text in Python, using the wordcloud package.

In the example, I will build the wordcloud of Saint Augustine's Confessions, which can be downloaded from the Project Gutenberg page. The masterpiece is split into 13 books. We have stored each book in a separate file named after its number (e.g. 1.txt and 2.txt). Each line of every file contains exactly one sentence.

The source code of this tutorial can be downloaded from my GitHub repository.

Getting started

Install and get familiar with the wordcloud package

The first step is to install the wordcloud package with the command pip install wordcloud. Then you can import the WordCloud class as well as STOPWORDS, the package's built-in set of English stopwords.

from wordcloud import WordCloud, STOPWORDS

If you want, you can add other stopwords, such as the archaic pronoun "thou", to your own copy of the set.

# Copy the built-in set, so that STOPWORDS itself is left unchanged.
stopwords = set(STOPWORDS)
stopwords.add('thou')
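To see why stopwords matter, here is a simplified, standard-library sketch of the word counting a word cloud is based on. It is an approximation, not the wordcloud package's own tokenizer: the regular expression and the lower-casing are assumptions, and the helper name count_words is hypothetical.

```python
import re
from collections import Counter


def count_words(text, stopwords):
    """Count lower-cased words in text, skipping any word in stopwords.

    Simplified approximation of the counting a word cloud relies on;
    not the wordcloud package's actual tokenizer.
    """
    stop = {w.lower() for w in stopwords}
    # Assumed tokenization: runs of lowercase letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop)


sample = "Great art thou, O Lord, and greatly to be praised; thou art great."
counts = count_words(sample, {"thou", "and", "to", "be", "o"})
print(counts.most_common(3))  # [('great', 2), ('art', 2), ('lord', 1)]
```

Without "thou" in the stopwords it would be counted twice in this short sample and compete with the content words for the largest fonts.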

The generate() method of WordCloud takes as input a single string containing all the text for which the word cloud should be computed. In our case, we therefore store the whole text of the masterpiece in one variable: we open the file of each book, read its lines and append them to a global variable called all_text.

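all_text = ""
# Read the books from sources/1.txt to sources/13.txt (UTF-8 encoding
# assumed) and append every line to all_text.
for book in range(1, 14):
    with open('sources/' + str(book) + '.txt', encoding='utf-8') as file:
        for line in file:
            all_text += " " + line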
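The same loading step can also be written with pathlib, reading each book in one call and joining the books at the end. This is a sketch under assumptions: the files are UTF-8 encoded and named 1.txt to 13.txt inside a sources directory, and the function name load_books is hypothetical.

```python
from pathlib import Path


def load_books(directory, n_books=13, encoding="utf-8"):
    """Concatenate <directory>/1.txt to <directory>/<n_books>.txt.

    Assumes one text file per book, named by its number.
    """
    parts = []
    for book in range(1, n_books + 1):
        path = Path(directory) / f"{book}.txt"
        parts.append(path.read_text(encoding=encoding))
    # Join with a space so the last word of one book and the first word
    # of the next one are never glued together.
    return " ".join(parts)


# Usage, with the folder layout described above:
# all_text = load_books("sources")
```

The resulting string differs from line-by-line concatenation only in whitespace, which does not change the words counted for the cloud.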

Build the wordcloud

Create the word cloud for your text

Now we are ready to build the wordcloud. We create a WordCloud object, passing it the size of the image in pixels (width and height), the set of stopwords, the background color and the minimum font size.

wordcloud = WordCloud(width=800, height=300, stopwords=stopwords,
                      background_color='white', min_font_size=10)

Once the WordCloud object is built, we call its generate() method, which computes the word cloud of the text passed as argument.

wordcloud.generate(all_text)
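Conceptually, generate() turns the text into word frequencies and gives more frequent words larger fonts. The sketch below illustrates only that scaling idea, using a plain linear mapping: each count is divided by the largest one and then mapped into a range from min_font_size to max_font_size. The wordcloud package's real layout is more involved (it also has a relative_scaling parameter and shrinks fonts so that words fit), so the function font_sizes, its max_font_size=100 default and the counts below are all hypothetical.

```python
def font_sizes(counts, min_font_size=10, max_font_size=100):
    """Map word counts to font sizes with a simple linear scale.

    Illustrative only: this is not the wordcloud package's layout algorithm.
    """
    if not counts:
        return {}
    top = max(counts.values())
    sizes = {}
    for word, count in counts.items():
        relative = count / top  # 1.0 for the most frequent word
        sizes[word] = round(min_font_size + relative * (max_font_size - min_font_size))
    return sizes


# Made-up counts, just to show the scaling.
print(font_sizes({"god": 40, "lord": 20, "soul": 4}))
# {'god': 100, 'lord': 55, 'soul': 19}
```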

Plot results

Save the wordcloud into a picture

Finally, we are ready to plot the results with the imshow() function provided by the matplotlib library, and to save the figure to a PNG file with savefig().

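import os
import matplotlib.pyplot as plt

# Create an 8 x 3 inch figure and draw the word cloud image into it.
plt.figure(figsize=(8, 3), facecolor=None)
plt.imshow(wordcloud)
plt.axis("off")                      # hide axis ticks and labels
plt.tight_layout(pad=0)              # remove the padding around the image
os.makedirs('plots', exist_ok=True)  # savefig fails if the folder is missing
plt.savefig('plots/word_cloud.png')
plt.show()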
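The figsize (8, 3) matches the size of the WordCloud object: matplotlib measures figures in inches, and at its default resolution of 100 dots per inch an 8 × 3 inch figure is 800 × 300 pixels. If you change the cloud's width and height, a small helper (the name figsize_for is hypothetical) keeps the two in sync; the dpi of 100 is matplotlib's default and is an assumption if your configuration overrides it.

```python
def figsize_for(width_px, height_px, dpi=100):
    """Return a matplotlib figsize (in inches) for a pixel size at dpi.

    dpi=100 assumes matplotlib's default figure resolution.
    """
    return (width_px / dpi, height_px / dpi)


print(figsize_for(800, 300))  # (8.0, 3.0)
```

For example, a 1600 × 600 cloud would use plt.figure(figsize=figsize_for(1600, 600)).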

Lessons learned

Building a wordcloud of a text in Python is very simple with the wordcloud package:

  • The first step involves the definition of a string variable, which contains all the text for which the wordcloud should be calculated.

  • Then, a WordCloud object is created by specifying some parameters as arguments, and the text is passed to its generate() method.

  • Finally, the wordcloud can be plotted and saved with the matplotlib package.

Translated from: https://towardsdatascience.com/how-to-build-a-wordcloud-in-python-2f9222414fc6