NLTK cannot find resources after downloading them

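Today I was reading the book 《Python网络数据采集》 (Web Scraping with Python) and tried the NLTK examples myself. However, at run time the required resources could not be found.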

from nltk import word_tokenize
from nltk import Text
tokens = word_tokenize("Here is some not very interesting text")
text = Text(tokens)

Running it produced:

D:\Python27\python.exe H:/temp/python-scraping/chapter8/5-NltkTokenize.py
Traceback (most recent call last):
  File "H:/temp/python-scraping/chapter8/5-NltkTokenize.py", line 7, in <module>
    tokens = word_tokenize("Here is some not very interesting text")
  File "D:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 106, in word_tokenize
    return [token for sent in sent_tokenize(text, language)
  File "D:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 90, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "D:\Python27\lib\site-packages\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "D:\Python27\lib\site-packages\nltk\data.py", line 919, in _open
    return find(path_, path + ['']).open()
  File "D:\Python27\lib\site-packages\nltk\data.py", line 641, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource u'tokenizers/punkt/english.pickle' not found.  Please
  use the NLTK Downloader to obtain the resource:  >>>
  nltk.download()
  Searched in:
    - 'C:\\Users\\Administrator/nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'D:\\Python27\\nltk_data'
    - 'D:\\Python27\\lib\\nltk_data'
    - 'C:\\Users\\Administrator\\AppData\\Roaming\\nltk_data'
    - u''
**********************************************************************
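The "Searched in" list is the useful part of this message. As a minimal sketch, the helper below checks which of those directories actually contain a resource. The name `dirs_containing` is made up for illustration, and it only looks for unpacked folders, which is a simplification (NLTK can also read zipped packages).

```python
import os


def dirs_containing(resource, search_dirs):
    """Return the entries of search_dirs that contain the given resource.

    resource uses forward slashes, e.g. "tokenizers/punkt".
    Simplification: only unpacked directories and files are checked.
    """
    found = []
    for base in search_dirs:
        # Skip empty entries such as the trailing u'' in the error message.
        if base and os.path.exists(os.path.join(base, *resource.split("/"))):
            found.append(base)
    return found


# Directories copied from the "Searched in" list above.
searched = [
    "C:\\Users\\Administrator/nltk_data",
    "C:\\nltk_data",
    "D:\\nltk_data",
    "E:\\nltk_data",
    "D:\\Python27\\nltk_data",
    "D:\\Python27\\lib\\nltk_data",
    "C:\\Users\\Administrator\\AppData\\Roaming\\nltk_data",
    "",
]
print(dirs_containing("tokenizers/punkt", searched))
```

With NLTK installed, passing `nltk.data.path` instead of the hard-coded list checks the directories NLTK is really using.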

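I searched online for a long time, and then took a closer look at the message. It turned out that the directories it searches do not include the one I downloaded to, because I had changed the download path.
So its search path needs to be modified: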

from nltk import data
data.path.append(u"G:\\nltk_data")

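With this change the program runs. Alternatively, you can set the NLTK_DATA environment variable to change where NLTK looks.
Tip: put the data.path.append call before the other NLTK imports, so the path is in place before any resource is loaded.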
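The NLTK_DATA route can also be done from inside the script. Below is a minimal sketch, assuming NLTK reads NLTK_DATA once when it is first imported and splits it on the platform's path separator. The helper name `prepend_nltk_data_dir` is hypothetical, and G:\nltk_data is the download directory from this post.

```python
import os


def prepend_nltk_data_dir(directory, environ=None):
    """Put directory at the front of the NLTK_DATA list and return the new value."""
    if environ is None:
        environ = os.environ
    existing = environ.get("NLTK_DATA", "")
    # NLTK_DATA may hold several directories joined by os.pathsep.
    parts = [p for p in existing.split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    environ["NLTK_DATA"] = os.pathsep.join(parts)
    return environ["NLTK_DATA"]


# Call this before the first "import nltk" in the program.
prepend_nltk_data_dir("G:\\nltk_data")
# from nltk import word_tokenize   # import NLTK only after this point
```

Setting the variable in the operating system instead has the same effect and keeps the path out of the code, which is handy when several scripts share one data directory.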
