Natural Language Processing with Python, Chapter 1

from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

Note: if the import raises an error, place the NLTK data in one of the directories listed in the error message.

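concordance() looks up a word and shows every occurrence together with its surrounding context. In these notes it is given a single word.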
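To make the idea concrete, here is a minimal pure-Python sketch of a concordance lookup. It is not NLTK's implementation: the function name, the `width` parameter, and the case-insensitive matching are assumptions made for illustration.

```python
def concordance(tokens, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens on either side."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == word.lower():  # case-insensitive match (a choice of this sketch)
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [token] + right))
    return lines

tokens = "Call me Ishmael . Some years ago I went to sea . Call it fate .".split()
for line in concordance(tokens, "call"):
    print(line)
```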

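similar() finds the words that appear in the same contexts as its argument, which is a single word. In NLP, words that share contexts are called similar words.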
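One way to picture "same context" is to treat a word's context as the pair (previous word, next word). The sketch below ranks other words by how many such pairs they share with the target. It is a simplified illustration under that assumption, not NLTK's actual scoring, and the helper names are hypothetical.

```python
from collections import defaultdict

def contexts_by_word(tokens):
    """Map each word to the set of (previous, next) word pairs it occurs between."""
    contexts = defaultdict(set)
    for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
        contexts[word].add((prev, nxt))
    return contexts

def similar(tokens, word):
    """Return other words sharing at least one context with `word`, most shared first."""
    contexts = contexts_by_word(tokens)
    target = contexts.get(word, set())
    scores = {w: len(c & target) for w, c in contexts.items() if w != word}
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return [w for w in ranked if scores[w] > 0]

tokens = "the big dog ran . the small dog ran . the big cat sat .".split()
print(similar(tokens, "big"))
```

Here "small" is reported as similar to "big" because both occur between "the" and "dog".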

common_contexts() finds the contexts shared by two or more words; the argument is a list of words.
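The same (previous word, next word) view of context gives a short sketch of the shared-context lookup: collect the contexts of each word and intersect them. This is an illustrative approximation with a hypothetical function, not the NLTK source.

```python
def common_contexts(tokens, words):
    """Return the (previous, next) contexts in which every word in `words` occurs."""
    shared = None
    for target in words:
        found = {
            (prev, nxt)
            for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:])
            if word == target
        }
        # Keep only the contexts seen for every word so far
        shared = found if shared is None else shared & found
    return sorted(shared or [])

tokens = "a very good day . a pretty good day . a very bad day .".split()
print(common_contexts(tokens, ["very", "pretty"]))
```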

generate() produces a passage of random text built from the words of the text.
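How generate() chooses its words depends on the NLTK version, so the following is only a rough sketch of one possible approach: a random walk over adjacent word pairs. It is not NLTK's implementation, and the function name, parameters, and fixed seed are assumptions for illustration.

```python
import random
from collections import defaultdict

def generate(tokens, length=10, seed=0):
    """Random walk: repeatedly pick a word that followed the current word in `tokens`."""
    followers = defaultdict(list)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].append(nxt)
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    word = rng.choice(tokens)
    output = [word]
    while len(output) < length:
        # Fall back to any token if the current word was never followed by anything
        word = rng.choice(followers[word] or tokens)
        output.append(word)
    return output

tokens = "Call me Ishmael . Some years ago I went to sea .".split()
print(" ".join(generate(tokens, length=8)))
```

Every generated word comes from the source tokens, which matches the description above: the text is random, but its vocabulary is that of the original.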

text1.concordance("Dick")
Displaying 25 of 84 matches:
                                     Dick by Herman Melville 1851 ] ETYMOLOGY 
must be the same that some call Moby Dick ." " Moby Dick ?" shouted Ahab . " D
e that some call Moby Dick ." " Moby Dick ?" shouted Ahab . " Do ye know the w
 Death and devils ! men , it is Moby Dick ye have seen -- Moby Dick -- Moby Di
it is Moby Dick ye have seen -- Moby Dick -- Moby Dick !" " Captain Ahab ," sa
ck ye have seen -- Moby Dick -- Moby Dick !" " Captain Ahab ," said Starbuck ,
 Captain Ahab , I have heard of Moby Dick -- but it was not Moby Dick that too
 of Moby Dick -- but it was not Moby Dick that took off thy leg ?" " Who told 
 my hearties all round ; it was Moby Dick that dismasted me ; Moby Dick that b
s Moby Dick that dismasted me ; Moby Dick that brought me to this dead stump I
white whale ; a sharp lance for Moby Dick !" " God bless ye ," he seemed to ha
 white whale ? art not game for Moby Dick ?" " I am game for his crooked jaw ,
l whaleboat ' s bow -- Death to Moby Dick ! God hunt us all , if we do not hun
hunt us all , if we do not hunt Moby Dick to his death !" The long , barbed st
owels to feel fear ! CHAPTER 41 Moby Dick . I , Ishmael , was one of that crew
ividualizing tidings concerning Moby Dick . It was hardly to be doubted , that
on must have been no other than Moby Dick . Yet as of late the Sperm Whale fis
ident ignorantly gave battle to Moby Dick ; such hunters , perhaps , for the m
g and piling their terrors upon Moby Dick ; those things had gone far to shake
ies , which eventually invested Moby Dick with new terrors unborrowed from any
rmen recalled , in reference to Moby Dick , the earlier days of the Sperm Whal
ngs were ready to give chase to Moby Dick ; and a still greater number who , c
 was the unearthly conceit that Moby Dick was ubiquitous ; that he had actuall
their superstitions ; declaring Moby Dick not only ubiquitous , but immortal (
 shaped lower jaw beneath him , Moby Dick had reaped away Ahab ' s leg , as a 

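len(text) returns the length of text in tokens, punctuation included (the size of the corpus).

set(text) returns the distinct tokens in text (the vocabulary); len(set(text)) gives how many there are.

sorted(set(text)) sorts the distinct tokens. Sorting follows character order, so punctuation marks generally come first, followed by words in alphabetical order with uppercase before lowercase.

text.count("word") counts how many times word occurs in text.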
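These four operations work on any Python list of tokens, so they can be tried without loading a corpus. The token list below is made up for illustration.

```python
tokens = ["Call", "me", "Ishmael", ".", "Call", "me", "maybe", "."]

print(len(tokens))           # total number of tokens, punctuation included
print(len(set(tokens)))      # number of distinct tokens
print(sorted(set(tokens)))   # distinct tokens in sorted order
print(tokens.count("Call"))  # occurrences of a single token
```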

Lists:

1. Concatenate lists with +

2. Add an element to a list with append()

3. Indexing and slicing
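The three list operations can be tried on a small example. The variable names and the second list are invented for illustration.

```python
sent = ["Call", "me", "Ishmael", "."]
extra = ["Some", "years", "ago"]

combined = sent + extra   # concatenation builds a new list
combined.append("never")  # append adds one element in place
print(combined[0])        # indexing: first element
print(combined[1:3])      # slicing: elements at positions 1 and 2
print(combined[-1])       # negative index: last element
```

Note that + leaves both original lists unchanged, while append() modifies the list it is called on.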

sent1
['Call', 'me', 'Ishmael', '.']

Converting between strings and lists: join() and split()

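# Join a list of words into one string, separated by spaces
sentence = ' '.join(['OH', 'MY', 'GOD'])
print(sentence)
OH MY GOD
# Split the string back into a list of words
split_sentence = sentence.split()
print(split_sentence)
['OH', 'MY', 'GOD']
saying = ['After', 'all', 'is', 'said', 'and', 'done', 'more', 'is', 'said', 'than', 'done']
# Take the last two distinct words in sorted order
tokens = sorted(set(saying))[-2:]
print(tokens)
['said', 'than']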
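fdist1 = FreqDist(text1)
print(fdist1)

# Print every distinct token with its count (key order varies with the Python and NLTK version)
for key in fdist1:
    print(key, fdist1[key])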
funereal 1
unscientific 1
divinely 2
foul 11
four 74
gag 2
prefix 1
……
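The counting itself can be tried without NLTK: the standard library's collections.Counter offers the same kind of word-to-count mapping. The sketch below reuses the saying list from earlier as an assumed input and lists the most frequent words.

```python
from collections import Counter

saying = ["After", "all", "is", "said", "and", "done",
          "more", "is", "said", "than", "done"]
counts = Counter(saying)

# Look up a single word, then list the three most frequent words
print(counts["is"])
for word, count in counts.most_common(3):
    print(word, count)
```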
