nltk - problems solved

1. Installation
http://www.nltk.org/download
numpy and yaml need to be installed first. I installed all three modules from source.
Installing them is easy: download the package, unzip it, and run the command: sudo python setup.py install.
Usually, everything should be ready at this point.

However, numpy needs to compile some files with gcc during installation, and that requires the Python dynamic library.
There are two Python versions on my system: the one in /usr/bin/ is 2.4, and the one I actually use is 2.5, located in /usr/self/bin/.
When compiling, numpy correctly detects version 2.5, but gcc can't find libpython2.5.so, which is at /usr/self/libpython2.5.so.

My solution was to create a symbolic link to /usr/self/libpython2.5.so in a directory that gcc searches.
After that, everything worked.
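
To see where a given interpreter expects its shared library to be, something like the following may help. It is only a sketch: it uses the sysconfig module, which exists in Python 2.7 and 3.x but not in the 2.5 interpreter described above, and the helper name libpython_info is made up for illustration. On some platforms the two config variables are simply not set.

```python
import os
import sysconfig


def libpython_info():
    """Report where this interpreter expects its shared library (hypothetical helper)."""
    libdir = sysconfig.get_config_var("LIBDIR")      # may be None on some platforms
    libname = sysconfig.get_config_var("LDLIBRARY")  # may be None on some platforms
    path = os.path.join(libdir, libname) if libdir and libname else None
    return {
        "libdir": libdir,
        "library": libname,
        "path": path,
        "exists": bool(path) and os.path.exists(path),
    }


if __name__ == "__main__":
    print(libpython_info())
```

If "exists" is False, or the reported directory is not one the linker searches, that matches the situation where gcc cannot find libpython.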


2. Download data package
The first time I downloaded the data package, I followed the instructions: I ran nltk.download() and then entered "d book".

While downloading the first data package, I got this error message:
"Unzipping corpora/brown.zip"
"Error with downloaded zip file"
Confused, I tried again and got the same error.

Then I found the downloaded zip file and tried to unzip it myself, but unzip complained that the zip file was corrupted.
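
The same check can be done from Python with the standard zipfile module. This is a minimal sketch, not something taken from nltk's downloader; the function name zip_is_intact is hypothetical.

```python
import zipfile
import zlib


def zip_is_intact(path):
    """Return True if the archive opens and every member passes its CRC check."""
    if not zipfile.is_zipfile(path):
        # No valid end-of-archive record: typical for a truncated download.
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first bad member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error, OSError):
        return False
```

A file that fails this check was most likely cut off or damaged during the download, so fetching it again is the only fix.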

So I checked the source code of download.py. Based on what I found there, I tried downloading the data another way: nltk.download('book').
This time it worked at first, then failed on another package with the same error message.
After that, I tried both ways a few more times, but they all failed.

So I kept reading the source code. After going through the whole download flow, I found nothing special, so I tried the second way again, and a few more packages were downloaded.
Then it failed with a new error message: "Error downloading 'state_union' from <http://nltk.googlecode.com/svn/trunk/nltk_data/packages/corpora/state_union.zip>: <urlopen error(-3, 'Temporary failure in name resolution')>"

This error message is much clearer: the problems had been download-related all along.
Maybe the network was unstable at the time, or the server was, or ...

After that, I kept trying for a while, and all the data has now been downloaded. :)
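
Retrying by hand can be automated with a small loop. The sketch below is generic and the name retry is made up; it assumes the wrapped call returns a truthy value on success, which is how I understand nltk.download() to behave, so treat that as an assumption.

```python
import time


def retry(action, attempts=5, delay=2.0, sleep=time.sleep):
    """Call action() until it returns a truthy value or the attempts run out.

    Returns the number of the successful attempt, or None if all failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            if action():
                return attempt
        except (IOError, OSError) as exc:
            # Network errors such as a failed name lookup end up here.
            print("attempt %d failed: %s" % (attempt, exc))
        if attempt < attempts:
            sleep(delay)
    return None


# Possible usage (assumes nltk is installed and returns True on success):
#   import nltk
#   retry(lambda: nltk.download('book'), attempts=10, delay=30)
```

A longer delay gives a flaky network or server some time to recover between attempts.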

3. Import nltk module
Error message: TypeError: walk() got an unexpected keyword argument 'followlinks'.
When importing the nltk book module, nltk calls os.walk() with the keyword argument 'followlinks', which was added in Python 2.6. However, my Python version was 2.5.
Solution: install a newer version of Python (2.7.2 in my case).
yaml, numpy and nltk then have to be reinstalled for the new Python.
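
A quick way to tell in advance whether an interpreter is new enough is to compare its version against 2.6. This is just an illustrative sketch; the names MIN_VERSION and supports_followlinks are hypothetical.

```python
import sys

# os.walk() accepts the 'followlinks' keyword starting with Python 2.6.
MIN_VERSION = (2, 6)


def supports_followlinks(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version has os.walk(followlinks=...)."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    print(supports_followlinks())
```

Run it with each Python installed on the system to see which one nltk can use.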

4. Tkinter
When I tried the command text4.dispersion_plot(["python"]), it failed with the complaint "nltk.draw package not loaded (please install Tkinter library)".
The cause was that my system is Red Hat, and tcl/tk had been installed without the devel package. So install the corresponding devel package, then rebuild and reinstall Python. Done!
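
To confirm that the rebuilt Python really picked up Tk, try importing the module directly. A minimal sketch (the function name tkinter_available is made up; the module is called Tkinter in Python 2 and tkinter in Python 3):

```python
def tkinter_available():
    """Return True if this interpreter was built with Tk support."""
    try:
        import tkinter  # Python 3 name
    except ImportError:
        try:
            import Tkinter  # Python 2 name
        except ImportError:
            return False
    return True


if __name__ == "__main__":
    print(tkinter_available())
```

If this prints False after the rebuild, the tcl/tk headers were probably still missing when Python was configured.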

5. Matplotlib
Running dispersion_plot() will complain "ValueError: The plot function requires the matplotlib package (aka pylab)" if you have not installed matplotlib.
While installing matplotlib, I hit this error:
"File "/build/buildd/matplotlib-1.0.1/setupext.py", line 832, in check_for_tk (Tkinter.__version__.split()[-2], Tkinter.TkVersion, Tkinter.TclVersion)) IndexError: list index out of range"

This is a known error; you can find the patch here:
https://trac.macports.org/ticket/29893
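
The traceback points at the expression Tkinter.__version__.split()[-2], which only works when the version string contains at least two whitespace-separated tokens. The sketch below reproduces that failure mode. The sample strings and both function names are assumptions for illustration, and the patch in the linked ticket may handle it differently.

```python
def revision_token(version_string):
    """Mimic the failing expression: take the second-to-last token."""
    return version_string.split()[-2]


def safe_revision_token(version_string, default="unknown"):
    """Same idea, but fall back to a default when there are too few tokens."""
    parts = version_string.split()
    return parts[-2] if len(parts) >= 2 else default


if __name__ == "__main__":
    print(revision_token("$Revision: 81008 $"))   # three tokens, works
    print(safe_revision_token("$Revision$"))      # one token, falls back
    try:
        revision_token("$Revision$")
    except IndexError as exc:
        print("IndexError:", exc)
```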
