import jieba
# Segment a Chinese test sentence with jieba and print the tokens separated by spaces.
print(" ".join(jieba.cut('我是来自中国北京某某大学的一名硕士研究生,这是我的测试语句,下面测试北京大学生和北京大学学生。')))
from os import path
import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
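# Phrases to visualise; a phrase listed more than once carries more weight.
mylist = [u'投诉分布', u'业务处理规范', u'刷卡赠好礼', u'天天民生日', u'惠吃惠生活', u'懂你的信用卡',
          u'精细化经营', u'诚于民,道相生', u'差异化分析', u'预测', u'客户行为', u'精准营销',
          u'以客户为中心', u'客户工单分析', u'投诉分布', u'业务处理规范', u'刷卡增好礼',
          u'以市场为导向', u'以创新为导向']
# Segment each phrase with jieba, then join all tokens into one space-separated string.
word_list = [" ".join(jieba.cut(sentence)) for sentence in mylist]
new_text = ' '.join(word_list)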
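# Configure the cloud; font_path must point to a font with Chinese glyphs (here Microsoft YaHei Bold on Windows).
wc = WordCloud(font_path="C:/windows/fonts/MSYHBD.TTC", background_color="black", max_words=2000,
               height=300, width=600, prefer_horizontal=0.75, min_font_size=5, max_font_size=50, margin=0)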
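The script above configures a `WordCloud` object but stops before anything is rendered. A minimal sketch of the remaining steps might look like the following. `pick_font` and `render_cloud` are hypothetical helper names (not part of wordcloud or jieba), and `wordcloud` is imported lazily so the helpers can be defined even where the package is missing.

```python
import os


def pick_font(candidates):
    """Return the first path in `candidates` that exists as a file, else None."""
    for font_file in candidates:
        if os.path.isfile(font_file):
            return font_file
    return None


def render_cloud(text, font_path, out_path="wordcloud.png"):
    """Lay out `text` (space-separated tokens) and save the image to `out_path`."""
    # Imported here so that defining the helper does not require the package.
    from wordcloud import WordCloud

    wc = WordCloud(font_path=font_path, background_color="black",
                   max_words=2000, width=600, height=300,
                   prefer_horizontal=0.75, min_font_size=5,
                   max_font_size=50, margin=0)
    wc.generate(text)     # count the tokens and compute the layout
    wc.to_file(out_path)  # write the rendered image to disk
    return wc
```

With `new_text` from the script above, a call such as `render_cloud(new_text, pick_font(["C:/Windows/Fonts/msyhbd.ttc"]))` would write `wordcloud.png`; the font location is an assumption about a typical Windows install. If no Chinese-capable font is found, the default font will most likely draw the words as empty boxes.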
Word cloud object for generating and drawing.
font_path : string, path to the font file
Font path to the font that will be used (OTF or TTF).
Defaults to DroidSansMono path on a Linux machine. If you are on
another OS or don't have this font, you need to adjust this path.
width : int (default=400), canvas width
Width of the canvas.
height : int (default=200), canvas height
Height of the canvas.
prefer_horizontal : float (default=0.90), fraction of words laid out horizontally
The ratio of times to try horizontal fitting as opposed to vertical.
If prefer_horizontal < 1, the algorithm will try rotating the word
if it doesn't fit. (There is currently no built-in way to get only vertical
words.)
mask : nd-array or None (default=None), shape of the word cloud; rectangular by default, it follows the shape of the mask image when one is supplied
If not None, gives a binary mask on where to draw words. If mask is not
None, width and height will be ignored and the shape of mask will be
used instead. All white (#FF or #FFFFFF) entries will be considered
"masked out" while other entries will be free to draw on. [This
changed in the most recent version!]
scale : float (default=1)
Scaling between computation and drawing. For large word-cloud images,
using scale instead of larger canvas size is significantly faster, but
might lead to a coarser fit for the words.
min_font_size : int (default=4), smallest font size
Smallest font size to use. Will stop when there is no more room in this
size.
font_step : int (default=1), increment between font sizes
Step size for the font. font_step > 1 might speed up computation but
give a worse fit.
max_words : number (default=200), maximum number of words
The maximum number of words.
stopwords : set of strings or None, words to exclude
The words that will be eliminated. If None, the built-in STOPWORDS
list will be used.
background_color : color value (default="black"), background color
Background color for the word cloud image.
max_font_size : int or None (default=None), largest font size
Maximum font size for the largest word. If None, the height of the image is
used.
mode : string (default="RGB"), image color mode
Transparent background will be generated when mode is "RGBA" and
background_color is None.
relative_scaling : float (default=.5)
Importance of relative word frequencies for font-size. With
relative_scaling=0, only word-ranks are considered. With
relative_scaling=1, a word that is twice as frequent will have twice
the size. If you want to consider the word frequencies and not only
their rank, relative_scaling around .5 often looks good.
.. versionchanged: 2.0
Default is now 0.5.
color_func : callable, default=None
Callable with parameters word, font_size, position, orientation,
font_path, random_state that returns a PIL color for each word.
Overwrites "colormap".
See colormap for specifying a matplotlib colormap instead.
regexp : string or None (optional)
Regular expression to split the input text into tokens in process_text.
If None is specified, ``r"\w[\w']+"`` is used.
collocations : bool, default=True
Whether to include collocations (bigrams) of two words.
.. versionadded: 2.0
colormap : string or matplotlib colormap, default="viridis"
Matplotlib colormap to randomly draw colors from for each word.
Ignored if "color_func" is specified.
.. versionadded: 2.0
normalize_plurals : bool, default=True
Whether to remove trailing 's' from words. If True and a word
appears with and without a trailing 's', the one with trailing 's'
is removed and its counts are added to the version without
trailing 's' -- unless the word ends with 'ss'.
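The text-processing parameters above (`regexp`, `stopwords`, `normalize_plurals`) can be illustrated with a small standard-library sketch. It is not the library's implementation: `tokenize`, `drop_stopwords` and `merge_plurals` are hypothetical names that only mirror the behaviour described in the parameter list, and the lower-casing in `drop_stopwords` is an assumption.

```python
import re
from collections import Counter

DEFAULT_PATTERN = r"\w[\w']+"  # the pattern quoted in the regexp entry above


def tokenize(text, pattern=DEFAULT_PATTERN):
    """Split text into tokens with a regular expression."""
    return re.findall(pattern, text)


def drop_stopwords(tokens, stopwords):
    """Remove tokens whose lower-cased form is in `stopwords` (assumed lower-case)."""
    return [t for t in tokens if t.lower() not in stopwords]


def merge_plurals(counts):
    """Fold 'cats' into 'cat' when both occur; words ending in 'ss' are left alone."""
    merged = Counter(counts)
    for word in list(counts):
        if word.endswith("s") and not word.endswith("ss"):
            singular = word[:-1]
            if singular in counts:
                merged[singular] += merged.pop(word)
    return dict(merged)


# Space-separated text, as produced by joining jieba tokens.
tokens = tokenize("我 是 来自 中国 的 cats and a cat")
kept = drop_stopwords(tokens, {"and"})
counts = merge_plurals(Counter(kept))
```

Because the quoted pattern requires at least two characters per token, the single-character words 我, 是 and 的 (and the English "a") never reach the counting step. If single-character Chinese words matter, a custom `regexp` is one option to consider.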
``words_`` : dict of string to float
Word tokens with associated frequency.
.. versionchanged: 2.0
``words_`` is now a dictionary
``layout_`` : list of tuples (string, int, (int, int), int, color)
Encodes the fitted word cloud. Encodes for each word the string, font
size, position, orientation and color.
Larger canvases will make the code significantly slower. If you need a
large word cloud, try a smaller canvas size, and set the scale parameter.
The algorithm might give more weight to the ranking of the words
than their actual frequencies, depending on the ``max_font_size`` and the
scaling heuristic.
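The two notes above can be made concrete with a little arithmetic. The sketch below rests on two assumptions: that the rendered image is the canvas size multiplied by `scale`, and that each word's font size is derived from the previous word's size by interpolating between rank-only sizing (`relative_scaling=0`) and frequency-proportional sizing (`relative_scaling=1`). `output_size` and `next_font_size` are hypothetical helpers, not part of the wordcloud API, and the library's exact heuristic may differ.

```python
def output_size(width, height, scale=1):
    """Pixel size of the rendered image for a given canvas and scale."""
    return int(width * scale), int(height * scale)


def next_font_size(prev_size, prev_freq, freq, relative_scaling=0.5):
    """Font size for the next word, given the previous word's size and frequency."""
    ratio = freq / float(prev_freq)
    # relative_scaling=1 follows the frequency ratio; 0 ignores it entirely.
    factor = relative_scaling * ratio + (1 - relative_scaling)
    return int(round(factor * prev_size))


# A 600x300 canvas laid out once, then drawn at twice the resolution.
print(output_size(600, 300, scale=2))
# A word half as frequent as its predecessor, under three settings.
for rs in (0.0, 0.5, 1.0):
    print(rs, next_font_size(100, 10, 5, relative_scaling=rs))
```

Under these assumptions, `relative_scaling=0` never shrinks a word because of its frequency, so sizes then change only when a word no longer fits. That would be one way for rank to outweigh the actual frequencies, as the note describes.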