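1. Download the official package (http://www.nltk.org/nltk_data/).
2. Choose one of the paths that NLTK searches and place the extracted files under it.
Then run the following code: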
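To confirm that the extracted files ended up where step 2 intends, a small check like the one below may help. It is only a sketch: `find_corpus_dirs` is a hypothetical helper (not part of NLTK), and it assumes the usual layout in which corpora sit in a `corpora` subdirectory of the chosen path.

```python
import os


def find_corpus_dirs(search_paths, corpus="brown"):
    """Return the directories under `search_paths` that hold an extracted `corpus`.

    Hypothetical helper; assumes the layout <path>/corpora/<corpus>.
    """
    found = []
    for base in search_paths:
        candidate = os.path.join(base, "corpora", corpus)
        # Only extracted directories are counted, not .zip archives
        if os.path.isdir(candidate):
            found.append(candidate)
    return found
```

With NLTK installed, calling `find_corpus_dirs(nltk.data.path)` after `import nltk` should list every search path that already contains the Brown corpus; an empty list suggests the files were placed somewhere NLTK does not look.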
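# The nltk library provides a direct way to generate N-grams.
# Try it out using the words (tokens) of the Brown corpus as an example.
from nltk.corpus import brown
from nltk.util import ngrams

token = []
for each in brown.categories():
    # Each appended element is the word list of one whole category
    token.append(brown.words(categories=each))
unigram = ngrams(token, 1)
bigram = ngrams(token, 2)
for i in unigram:
    print(i)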
If output like the following appears, the download succeeded:
(['Dan', 'Morgan', 'told', 'himself', 'he', 'would', ...],)
(['Northern', 'liberals', 'are', 'the', 'chief', ...],)
(['Assembly', 'session', 'brought', 'much', 'good', ...],)
(['Thirty-three', 'Scotty', 'did', 'not', 'go', 'back', ...],)
(['The', 'Office', 'of', 'Business', 'Economics', '(', ...],)
(['Too', 'often', 'a', 'beginning', 'bodybuilder', ...],)
(['It', 'was', 'among', 'these', 'that', 'Hinkle', ...],)
(['1', '.', 'Introduction', 'It', 'has', 'recently', ...],)
(['In', 'American', 'romance', ',', 'almost', 'nothing', ...],)
(['There', 'were', 'thirty-eight', 'patients', 'on', ...],)
(['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...],)
(['As', 'a', 'result', ',', 'although', 'we', 'still', ...],)
(['It', 'is', 'not', 'news', 'that', 'Nathan', ...],)
(['They', 'neither', 'liked', 'nor', 'disliked', 'the', ...],)
(['Now', 'that', 'he', 'knew', 'himself', 'to', 'be', ...],)
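Each tuple above wraps the word list of an entire category, because `token` holds one list per category rather than individual words. For word-level N-grams the window slides over a flat list of words instead. The sketch below illustrates that idea in plain Python; `simple_ngrams` is a hypothetical stand-in written for illustration, not NLTK's implementation, and the sample words are copied from the output above.

```python
def simple_ngrams(sequence, n):
    """Return all runs of n consecutive items in `sequence` as tuples."""
    items = list(sequence)
    # A window of size n fits at len(items) - n + 1 start positions
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


words = ["The", "Fulton", "County", "Grand", "Jury", "said"]
bigrams = simple_ngrams(words, 2)
print(bigrams[:2])  # [('The', 'Fulton'), ('Fulton', 'County')]
```

With the corpus installed, passing a flat word list such as `brown.words()` to `ngrams` should produce word-level tuples in the same way.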