Python Word Clouds with WordCloud

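Two libraries are needed. The first is jieba, a Chinese word-segmentation library that splits a sentence into words; the second is wordcloud, which is used further down. jieba is simple to use: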

import jieba
# jieba.cut yields the segmented tokens; join them with spaces to display the result.
print(" ".join(jieba.cut('我是来自中国北京某某大学的一名硕士研究生,这是我的测试语句,下面测试北京大学生和北京大学学生。')))

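Here is the word cloud code, implemented in Python. Each string in mylist can be either a full text or a short phrase; a full text has to be segmented with jieba first.
The requirement here was fairly simple, so feel free to adapt the code if you are interested.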

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Each entry may be a short phrase or a longer text.
mylist = [u'投诉分布',u'业务处理规范',u'刷卡赠好礼',u'天天民生日',u'惠吃惠生活',u'懂你的信用卡',u'精细化经营',u'诚于民,道相生',u'差异化分析',u'预测',u'客户行为',u'精准营销',u'以客户为中心',u'客户工单分析',u'投诉分布',u'业务处理规范',u'刷卡增好礼',u'以市场为导向',u'以创新为导向']
# Segment every entry with jieba, then also keep the unsegmented phrases.
word_list = [" ".join(jieba.cut(sentence)) for sentence in mylist]
word_list.extend(mylist)
new_text = ' '.join(word_list)
# The font must contain Chinese glyphs; adjust the path for your own system.
wc = WordCloud(font_path="C:/windows/fonts/MSYHBD.TTC", background_color="black",
               max_words=2000, height=300, width=600, prefer_horizontal=0.75,
               min_font_size=5, max_font_size=50, margin=0)
wc.generate(new_text)
# Display the rendered cloud without axes.
plt.imshow(wc)
plt.axis("off")
plt.show()
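
If the text is already segmented, the word frequencies can also be counted by hand and inspected before anything is drawn. The following is a minimal sketch using only the standard library. The `tokens` list is hypothetical sample input (the kind of output jieba.cut might produce), and the `count_words` helper with its `min_length` filter is an assumption added for illustration, not part of the original code. The resulting dict has the word-to-count shape that WordCloud.generate_from_frequencies accepts.

```python
from collections import Counter


def count_words(tokens, min_length=2):
    """Count tokens, skipping blanks and tokens shorter than min_length."""
    cleaned = (t.strip() for t in tokens)
    return dict(Counter(t for t in cleaned if len(t) >= min_length))


# Hypothetical, already-segmented input.
tokens = [u'客户', u'行为', u'客户', u'工单', u'分析', u'的', u'客户']
frequencies = count_words(tokens)
print(frequencies)
```

Dropping one-character tokens is only one possible choice; set min_length=1 to keep them.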

The parameters of the WordCloud class are documented as follows:

Word cloud object for generating and drawing.

    Parameters
    ----------
    font_path : string
        Font path to the font that will be used (OTF or TTF).
        Defaults to DroidSansMono path on a Linux machine. If you are on
        another OS or don't have this font, you need to adjust this path.

    width : int (default=400)
        Width of the canvas.

    height : int (default=200)
        Height of the canvas.

    prefer_horizontal : float (default=0.90), share of words laid out horizontally
        The ratio of times to try horizontal fitting as opposed to vertical.
        If prefer_horizontal < 1, the algorithm will try rotating the word
        if it doesn't fit. (There is currently no built-in way to get only vertical
        words.)

    mask : nd-array or None (default=None), shape of the word cloud; rectangular by default, or the shape of the image passed in
        If not None, gives a binary mask on where to draw words. If mask is not
        None, width and height will be ignored and the shape of mask will be
        used instead. All white (#FF or #FFFFFF) entries will be considered
        "masked out" while other entries will be free to draw on. [This
        changed in the most recent version!]

    scale : float (default=1)
        Scaling between computation and drawing. For large word-cloud images,
        using scale instead of larger canvas size is significantly faster, but
        might lead to a coarser fit for the words.

    min_font_size : int (default=4)
        Smallest font size to use. Will stop when there is no more room in this
        size.

    font_step : int (default=1), step between successive font sizes
        Step size for the font. font_step > 1 might speed up computation but
        give a worse fit.

    max_words : number (default=200)
        The maximum number of words.

    stopwords : set of strings or None
        The words that will be eliminated. If None, the built-in STOPWORDS
        list will be used.

    background_color : color value (default="black")
        Background color for the word cloud image.

    max_font_size : int or None (default=None)
        Maximum font size for the largest word. If None, height of the image is
        used.

    mode : string (default="RGB")
        Transparent background will be generated when mode is "RGBA" and
        background_color is None.

    relative_scaling : float (default=.5)
        Importance of relative word frequencies for font-size.  With
        relative_scaling=0, only word-ranks are considered.  With
        relative_scaling=1, a word that is twice as frequent will have twice
        the size.  If you want to consider the word frequencies and not only
        their rank, relative_scaling around .5 often looks good.

        .. versionchanged: 2.0
            Default is now 0.5.

    color_func : callable, default=None
        Callable with parameters word, font_size, position, orientation,
        font_path, random_state that returns a PIL color for each word.
        Overwrites "colormap".
        See colormap for specifying a matplotlib colormap instead.

    regexp : string or None (optional)
        Regular expression to split the input text into tokens in process_text.
        If None is specified, ``r"\w[\w']+"`` is used.

    collocations : bool, default=True
        Whether to include collocations (bigrams) of two words.

        .. versionadded: 2.0

    colormap : string or matplotlib colormap, default="viridis"
        Matplotlib colormap to randomly draw colors from for each word.
        Ignored if "color_func" is specified.

        .. versionadded: 2.0

    normalize_plurals : bool, default=True
        Whether to remove trailing 's' from words. If True and a word
        appears with and without a trailing 's', the one with trailing 's'
        is removed and its counts are added to the version without
        trailing 's' -- unless the word ends with 'ss'.

    Attributes
    ----------
    ``words_`` : dict of string to float
        Word tokens with associated frequency.

        .. versionchanged: 2.0
            ``words_`` is now a dictionary

    ``layout_`` : list of tuples (string, int, (int, int), int, color)
        Encodes the fitted word cloud. Encodes for each word the string, font
        size, position, orientation and color.

    Notes
    -----
    Larger canvases will make the code significantly slower. If you need a
    large word cloud, try a lower canvas size, and set the scale parameter.

    The algorithm might give more weight to the ranking of the words
    than their actual frequencies, depending on the ``max_font_size`` and the
    scaling heuristic.

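The default regexp quoted above, ``r"\w[\w']+"``, only matches tokens of at least two characters, which matters for segmented Chinese text because one-character words never match. The sketch below illustrates this by applying the pattern with re.findall. It is only an illustration of the pattern itself, not the library's full text processing, and the `tokenize` helper and sample sentences are assumptions introduced here.

```python
import re

# Default pattern quoted in the WordCloud docstring.
DEFAULT_REGEXP = r"\w[\w']+"


def tokenize(text, regexp=DEFAULT_REGEXP):
    """Return every non-overlapping match of regexp in text."""
    return re.findall(regexp, text)


# One-character tokens such as "I" and "a" do not match the default pattern.
english = tokenize("I don't like a cat")
# The same holds for one-character Chinese words after segmentation.
chinese = tokenize(u"我 是 北京 大学 的 学生")
# A looser, hypothetical pattern that keeps one-character tokens.
chinese_all = tokenize(u"我 是 北京 大学 的 学生", regexp=r"\w+")

print(english)
print(chinese)
print(chinese_all)
```

If one-character words should appear in the cloud, passing a pattern such as r"\w+" through the regexp parameter is one option to try.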
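As the mask parameter description says, white entries are masked out and all other entries can be drawn on. The following sketch builds a circular mask in plain Python to make that convention concrete. The `circular_mask` helper and the grid size are hypothetical; in practice the mask would usually be loaded from an image file instead.

```python
def circular_mask(size):
    """Build a size x size grid: 0 inside the circle (drawable), 255 outside (masked out)."""
    center = (size - 1) / 2.0
    radius = size / 2.0
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 255
         for x in range(size)]
        for y in range(size)
    ]


mask_rows = circular_mask(9)
for row in mask_rows:
    # Print '#' for drawable cells and '.' for masked-out (white) cells.
    print("".join("#" if value == 0 else "." for value in row))
```

To use such a grid, it would first be converted to an array, for example numpy.array(mask_rows, dtype="uint8"), and then passed as WordCloud(mask=...). In that case width and height are ignored, as the description above notes.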