NLTK: Installing punkt Offline

NLTK 3.5 documentation

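The official documentation describes several ways to install NLTK data, including the following instructions for command-line installation: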

Command line installation

The downloader will search for an existing nltk_data directory to install NLTK data. If one does not exist it will attempt to create one in a central location (when using an administrator account) or otherwise in the user’s filespace. If necessary, run the download command from an administrator account, or using sudo. The recommended system location is C:\nltk_data (Windows); /usr/local/share/nltk_data (Mac); and /usr/share/nltk_data (Unix). You can use the -d flag to specify a different location (but if you do this, be sure to set the NLTK_DATA environment variable accordingly).

Run the command python -m nltk.downloader all. To ensure central installation, run the command sudo python -m nltk.downloader -d /usr/local/share/nltk_data all.

Windows: Use the “Run…” option on the Start menu. Windows Vista users need to first turn on this option, using Start -> Properties -> Customize to check the box to activate the “Run…” option.

Test the installation: Check that the user environment and privileges are set correctly by logging in to a user account, starting the Python interpreter, and accessing the Brown Corpus (see the previous section).

On Windows, you can use python -m nltk.downloader -d C:\Users\Cui\AppData\Roaming\nltk_data to install the data into a specific directory.
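To see which data directory would actually be picked up, it can help to walk the candidate locations by hand. The following is a minimal standard-library sketch, not NLTK's own lookup code: it assumes that NLTK_DATA may hold several directories separated by the platform path separator, and it checks those first, followed by the recommended system locations quoted above. The names candidate_data_dirs and locate_resource are hypothetical helpers introduced only for this illustration.

```python
import os
from pathlib import Path

# Recommended system locations quoted from the NLTK documentation above.
RECOMMENDED_DIRS = [
    r"C:\nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/share/nltk_data",
]


def candidate_data_dirs():
    """Return directories to check: NLTK_DATA entries first, then the recommended ones."""
    env_value = os.environ.get("NLTK_DATA", "")
    # Assumption: several directories may be listed, separated by os.pathsep.
    env_dirs = [d for d in env_value.split(os.pathsep) if d]
    return [Path(d) for d in env_dirs + RECOMMENDED_DIRS]


def locate_resource(relative_path, data_dirs=None):
    """Return the first existing <data_dir>/<relative_path>, or None if absent."""
    dirs = candidate_data_dirs() if data_dirs is None else data_dirs
    for data_dir in dirs:
        candidate = Path(data_dir) / relative_path
        if candidate.exists():
            return candidate
    return None
```

Calling locate_resource("tokenizers/punkt") returns the matching folder or None. When NLTK itself is installed, nltk.data.path lists the directories it really searches, and nltk.data.find("tokenizers/punkt") performs the actual lookup, raising LookupError when the resource is missing.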

I. Problem

However, installing punkt ran into a problem:

>>> import nltk
>>> nltk.download('punkt')
[nltk_data] Error loading punkt: 
False

This post describes how to install punkt offline.

II. Solution

1. Manually download the NLTK data

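For the download itself, see another author's blog post, 《解决nltk download(‘punkt‘) 连接尝试失败》 ("Fixing the connection failure in nltk download('punkt')").

The data can also be downloaded from the official site: NLTK Corpora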

2. Install punkt

Extract the downloaded package punkt.zip into nltk_data/tokenizers/.

Note: punkt belongs to the tokenizers category, so create the tokenizers folder first if it does not already exist.
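The extraction step can also be scripted. The sketch below uses only the Python standard library and assumes the archive contains a top-level punkt/ folder, so that extracting it into tokenizers/ produces tokenizers/punkt/. The function name install_punkt_offline and both of its parameters are hypothetical, chosen just for this example.

```python
import zipfile
from pathlib import Path


def install_punkt_offline(zip_path, nltk_data_dir):
    """Extract punkt.zip into <nltk_data_dir>/tokenizers and return the punkt folder."""
    tokenizers_dir = Path(nltk_data_dir) / "tokenizers"
    # Create nltk_data/tokenizers if it does not exist yet.
    tokenizers_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(tokenizers_dir)

    # Assumption: the archive unpacks to a top-level "punkt" folder.
    punkt_dir = tokenizers_dir / "punkt"
    if not punkt_dir.is_dir():
        raise FileNotFoundError(f"expected {punkt_dir} after extraction")
    return punkt_dir
```

For example, install_punkt_offline("punkt.zip", r"C:\Users\Cui\AppData\Roaming\nltk_data") would unpack the archive into the directory used earlier in this post. Afterwards, nltk.data.find("tokenizers/punkt") in a Python session should succeed if that directory is on NLTK's search path.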
