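(1) Find the data source on GitHub: https://github.com/nltk/nltk_data
(2) Download the packages folder from that repository, as shown in the figure.
(3) Open Jupyter and enter the following two lines of code: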
import nltk
nltk.data.find(".")
This displays the directory where NLTK stores its data:
FileSystemPathPointer('C:\\ProgramData\\Anaconda3\\nltk_data')
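Copy the downloaded packages folder to that location and rename it from packages to nltk_data, so that it becomes the nltk_data directory shown above.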
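The copy-and-rename step can also be scripted. The following is only a minimal sketch using the standard library; the function name install_nltk_data and the paths in the usage comment are hypothetical, chosen for illustration, and should be replaced with your own locations.

```python
import shutil
from pathlib import Path


def install_nltk_data(packages_dir, target_parent):
    """Copy a downloaded packages folder to <target_parent>/nltk_data."""
    src = Path(packages_dir)
    dest = Path(target_parent) / "nltk_data"
    if not src.is_dir():
        raise FileNotFoundError(f"packages folder not found: {src}")
    # dirs_exist_ok=True (Python 3.8+) merges into an existing nltk_data folder
    # instead of failing when the destination is already there.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest


# Hypothetical usage; both paths are assumptions:
# install_nltk_data(r"C:\Users\me\Downloads\nltk_data\packages",
#                   r"C:\ProgramData\Anaconda3")
```

Copying under the new name in one step avoids a separate rename, and the original download stays untouched if something goes wrong.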
Then run an example to check. It may still fail at this point with a "file not found" error, for example when running:
sents = nltk.corpus.treebank_raw.sents()
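If you open the data directory, however, there is no treebank_raw folder. There is a treebank folder with a raw folder inside it, so this appears to be a version difference between the program and the data layout: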
C:\ProgramData\Anaconda3\nltk_data\corpora\treebank\raw
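To see which folder names actually exist on disk without opening a file browser, something like the following can help. It is a sketch that uses only the standard library; list_corpus_entries is a hypothetical helper name, and the path in the usage comment is the one from this post.

```python
from pathlib import Path


def list_corpus_entries(nltk_data_dir, corpus=None):
    """Return sorted entry names under corpora/ or under corpora/<corpus>."""
    base = Path(nltk_data_dir) / "corpora"
    if corpus is not None:
        base = base / corpus
    if not base.is_dir():
        # Nothing installed at this location (or the name does not exist).
        return []
    return sorted(p.name for p in base.iterdir())


# Hypothetical usage with the directory from this post:
# list_corpus_entries(r"C:\ProgramData\Anaconda3\nltk_data")
# list_corpus_entries(r"C:\ProgramData\Anaconda3\nltk_data", "treebank")
```

An empty list for a name such as treebank_raw, alongside a non-empty list for treebank, matches the layout described above.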
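For more thorough ways to check that the NLTK data files are installed correctly, see what others have written. This post gives only the simplest test: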
from nltk.book import *
output:
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
sents = nltk.corpus.treebank.sents()
sents
output:
[['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.'], ['Mr.', 'Vinken', 'is', 'chairman', 'of', 'Elsevier', 'N.V.', ',', 'the', 'Dutch', 'publishing', 'group', '.'], ...]
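If you are installing offline on Windows, you can also point NLTK at the data directory with an environment variable (NLTK reads NLTK_DATA).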
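As a rough illustration of the environment-variable approach, the sketch below sets NLTK_DATA from Python before NLTK is imported, and also shows the alternative of appending to nltk.data.path after import. DATA_DIR is an assumed location, taken from the path shown earlier in this post. The import is guarded only so that the snippet still runs where NLTK is not installed.

```python
import os

# Assumed location; replace with wherever nltk_data was copied.
DATA_DIR = r"C:\ProgramData\Anaconda3\nltk_data"

# NLTK reads NLTK_DATA when it is first imported, so set it before the import.
os.environ["NLTK_DATA"] = DATA_DIR

try:
    import nltk

    # Alternative that also works after import: extend the search path directly.
    if DATA_DIR not in nltk.data.path:
        nltk.data.path.append(DATA_DIR)
except ImportError:
    nltk = None  # NLTK is not installed in this environment
```

Setting the variable in the Windows system settings instead makes it apply to every session, whereas the code above affects only the current Python process.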