Getting Started with NLTK: Common Functions

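1. text.concordance(word)
This function finds every occurrence of the word word in text and prints each one together with the line it appears in, so the emphasis is on the surrounding context. For example: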

>>> text1.concordance("monstrous")
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
>>>
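To make the idea concrete, here is a minimal pure-Python sketch of a keyword-in-context search. It is not NLTK's implementation: the function name, the width parameter (counted in tokens rather than characters), and the toy sentence are all assumptions made for illustration.

```python
def concordance(tokens, word, width=5):
    """Return each occurrence of `word` with `width` tokens of context on both sides."""
    target = word.lower()
    hits = []
    for i, tok in enumerate(tokens):
        # Case-insensitive match (a choice made for this sketch).
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Hypothetical toy input, not taken from the NLTK corpora.
tokens = "the whale was of a most monstrous size , a monstrous bulk indeed".split()
for left, hit, right in concordance(tokens, "monstrous", width=3):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>20} {hit} {right}")
```

Aligning the keyword in a single column is what makes this kind of output easy to scan, which is also how the NLTK output above is laid out.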
2. text.similar(word)
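This function uses the words surrounding word to find other words that occur in similar contexts. In other words, similar() searches the text for other words used in the same kind of construction. As far as I can tell, the similarity measure is a simple one: it is based on how often two words share exactly the same neighboring words, not on their meaning. See the examples below: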

```python
>>> text1.similar("monstrous")
mean part maddens doleful gamesome subtly uncommon careful untoward
exasperate loving passing mouldy christian few true mystifying
imperial modifies contemptible
>>> text2.similar("monstrous")
very heartily so exceedingly remarkably as vast a great amazingly
extremely good sweet
>>>
```

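This shows that text1 and text2 use the same word, monstrous, in quite different ways: in text1 its neighbors are mostly adjectives, while in text2 it behaves like an intensifier such as very.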
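The context-matching idea can be sketched in a few lines of plain Python. This is a simplified illustration under my own assumptions, not NLTK's code: a context is taken to be the pair (previous word, next word), words are lowercased, and candidates are ranked by the number of distinct contexts they share with the query. The toy sentence is made up.

```python
from collections import Counter, defaultdict


def similar(tokens, word, num=5):
    """Rank other words by how many (previous, next) contexts they share with `word`."""
    tokens = [t.lower() for t in tokens]
    contexts = defaultdict(set)  # word -> set of (previous word, next word)
    for prev, cur, nxt in zip(tokens, tokens[1:], tokens[2:]):
        contexts[cur].add((prev, nxt))

    target = contexts.get(word.lower(), set())
    scores = Counter()
    for other, ctxs in contexts.items():
        if other != word.lower():
            shared = len(ctxs & target)
            if shared:
                scores[other] = shared
    return [w for w, _ in scores.most_common(num)]


# Hypothetical toy input, not taken from the NLTK corpora.
tokens = ("a very pretty day and a monstrous pretty wife "
          "is very glad but is monstrous glad too").split()
print(similar(tokens, "monstrous"))
```

Because only literal neighbor matches count, a word can be "similar" without meaning anything close to the query, which is consistent with the two very different lists above.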

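3. text.common_contexts([word1, word2, …])

This function is related to similar(): it also searches by context.
The difference is that it returns the contexts shared by all of the words in the list passed as the argument, that is, the contexts that word1 and word2 have in common. For example: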

>>> text2.common_contexts(["monstrous", "very"])
a_pretty is_pretty am_glad be_glad a_lucky
>>>
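Each item in the output reads as previous-word_next-word, so a_pretty means both words occur between a and pretty. A minimal sketch of that intersection, again as an illustration under assumptions rather than NLTK's implementation (the function, the lowercasing, and the toy sentence are mine; the word list is assumed to be non-empty):

```python
from collections import defaultdict


def common_contexts(tokens, words):
    """Return the (previous, next) contexts shared by every word in `words`."""
    tokens = [t.lower() for t in tokens]
    contexts = defaultdict(set)  # word -> set of (previous word, next word)
    for prev, cur, nxt in zip(tokens, tokens[1:], tokens[2:]):
        contexts[cur].add((prev, nxt))

    # Keep only the contexts that appear for every queried word.
    shared = set.intersection(*(contexts.get(w.lower(), set()) for w in words))
    return sorted(f"{prev}_{nxt}" for prev, nxt in shared)


# Hypothetical toy input, not taken from the NLTK corpora.
tokens = ("a very pretty day and a monstrous pretty wife "
          "is very glad but is monstrous glad too").split()
print(common_contexts(tokens, ["monstrous", "very"]))
```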

4. text.dispersion_plot([word1, word2, …])
This function draws a dispersion plot showing the positions at which each word occurs in the corpus.

>>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])

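[Image 1: dispersion plot for the call above]

The horizontal axis is the word offset within the text and the vertical axis lists the queried words; each mark shows a position where that word occurs. In other words, the plot shows how each word is distributed across the text.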
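The data behind such a plot is just a list of token offsets per word. The sketch below computes those offsets and renders a rough text version of the plot, so it runs without any plotting library. The helper names and the toy sentence are hypothetical, and matching is exact (case-sensitive) by choice.

```python
def dispersion(tokens, words):
    """Map each queried word to the list of token offsets where it occurs."""
    offsets = {w: [] for w in words}
    for i, tok in enumerate(tokens):
        if tok in offsets:
            offsets[tok].append(i)
    return offsets


def ascii_plot(tokens, words):
    """Render one row per word, with '|' at each offset where the word occurs."""
    offsets = dispersion(tokens, words)
    label_width = max(len(w) for w in words)
    rows = []
    for w in words:
        marks = ["|" if i in offsets[w] else "." for i in range(len(tokens))]
        rows.append(f"{w:>{label_width}} {''.join(marks)}")
    return "\n".join(rows)


# Hypothetical toy input, not taken from the NLTK corpora.
tokens = "freedom and duties of citizens : citizens value freedom".split()
print(ascii_plot(tokens, ["citizens", "freedom", "duties"]))
```

Each row corresponds to one word on the vertical axis, and each column to one token position on the horizontal axis.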

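5. text.generate()
This function generates random text in the various styles seen above. Although the text is random, it reuses words and phrases that are common in the source text, which gives a feel for its style and content.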

>>> text3.generate()
In the beginning of his brother is a hairy man , whose top may reach
unto heaven ; and ye shall sow the land of Egypt there was no bread in
all that he was taken out of the month , upon the earth . So shall thy
wages be ? And they made their father ; and Isaac was old , and kissed
him : and Laban with his cattle in the midst of the hands of Esau thy
first born , and Phichol the chief butler unto his son Isaac , she
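To see why random output still sounds like its source, here is a small sketch that swaps in a plain bigram random walk: every next word is drawn from the words that actually followed the current word in the source. This is a deliberately simpler technique than whatever NLTK uses internally, and the function names, the seed parameter, and the toy source are assumptions for illustration.

```python
import random
from collections import defaultdict


def build_bigram_model(tokens):
    """Map each token to the list of tokens that follow it in the source."""
    followers = defaultdict(list)
    for cur, nxt in zip(tokens, tokens[1:]):
        followers[cur].append(nxt)
    return followers


def generate(tokens, length=15, seed=None):
    """Random walk over the bigram model, starting from the first source token."""
    rng = random.Random(seed)
    followers = build_bigram_model(tokens)
    word = tokens[0]
    out = [word]
    while len(out) < length:
        choices = followers.get(word)
        if not choices:  # dead end: this word is never followed by anything
            break
        # Followers repeated in the source are proportionally more likely.
        word = rng.choice(choices)
        out.append(word)
    return " ".join(out)


# Hypothetical toy source; a real corpus would give far more varied output.
source = ("in the beginning god created the heaven and the earth "
          "and the earth was without form").split()
print(generate(source, length=12, seed=0))
```

Every adjacent pair in the output occurred in the source, so common phrases survive even though the overall sentence is random.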
