# Stanford Chinese Word Segmentation

References:
https://stackoverflow.com/questions/45663121/about-stanford-word-segmenter/45668849
https://cloud.tencent.com/developer/article/1346917

Main solution: https://github.com/nltk/nltk/pull/1735

Command line (download CoreNLP, the Chinese models, and the Chinese properties file):

    wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
    unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31
    wget http://nlp.stanford.edu/software/stanford-chinese-corenlp-2016-10-31-models.jar
    wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties

Start the server:

    java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
    -serverProperties StanfordCoreNLP-chinese.properties \
    -preload tokenize,ssplit,pos,lemma,ner,parse \
    -status_port 9001 -port 9001 -timeout 15000
This can fail with a "Could not find or load main class" error (see https://stackoverflow.com/questions/27955569/stanford-nlp-could-not-find-main-class-error). The fix is to put the CoreNLP jar and the Chinese models jar on the classpath explicitly:
    java -Xmx4g -cp stanford-corenlp-full-2016-10-31/stanford-corenlp-3.7.0.jar:stanford-chinese-corenlp-2016-10-31-models.jar \
    edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
    -serverProperties StanfordCoreNLP-chinese.properties \
    -preload tokenize,ssplit,pos,lemma,ner,parse \
    -status_port 9001 -port 9001 -timeout 15000
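Before wiring up NLTK, it is worth confirming that the server answers at all. A minimal smoke test (a sketch; it assumes Python 3 and the server listening on port 9001 as configured above, and uses CoreNLP's HTTP API, where per-request annotator settings are passed as a JSON-encoded `properties` query parameter):

    import json
    import urllib.parse
    import urllib.request

    # Smoke test (sketch): POST one sentence to the CoreNLP server on port 9001
    # and print the tokens it returns.
    props = urllib.parse.quote(json.dumps({'annotators': 'tokenize,ssplit', 'outputFormat': 'json'}))
    url = 'http://localhost:9001/?properties=' + props
    with urllib.request.urlopen(url, data=u'我家没有电脑。'.encode('utf-8')) as resp:
        doc = json.loads(resp.read().decode('utf-8'))
    print([tok['word'] for sent in doc['sentences'] for tok in sent['tokens']])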
 
*Such a good tool, and it's this painful to use. Unbelievable!*

### Python usage (reference: https://www.e-learn.cn/content/wangluowenzhang/563460)

Run in Python (NOTE: this only works with nltk==3.2.5):
    from nltk.tag.stanford import CoreNLPPOSTagger, CoreNLPNERTagger
    from nltk.tokenize.stanford import CoreNLPTokenizer

    # Point the nltk 3.2.5 wrappers at the CoreNLP server started above.
    stpos, stner = CoreNLPPOSTagger('http://localhost:9001'), CoreNLPNERTagger('http://localhost:9001')
    sttok = CoreNLPTokenizer('http://localhost:9001')

    sttok.tokenize(u'我家没有电脑。')
    # ['我家', '没有', '电脑', '。']
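The tagger objects defined above can be used the same way. A small usage sketch (the return values described in the comments are what the standard NLTK tagger interface yields; verify against your server):

    # Feed the segmented tokens to the POS and NER wrappers defined above.
    tokens = sttok.tokenize(u'我家没有电脑。')
    stpos.tag(tokens)   # a list of (token, POS-tag) pairs
    stner.tag(tokens)   # a list of (token, NER-label) pairs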

Another solution:
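The original note ends here, but given the NLTK pull request linked above (nltk/nltk#1735), the other approach is presumably the newer API that replaced these classes in nltk >= 3.3. A minimal sketch under that assumption, against the same server on port 9001:

    # Sketch: the CoreNLPParser-based API (nltk >= 3.3); the old
    # CoreNLPTokenizer/CoreNLPPOSTagger classes were removed in its favor.
    from nltk.parse.corenlp import CoreNLPParser

    parser = CoreNLPParser('http://localhost:9001')
    list(parser.tokenize(u'我家没有电脑。'))
    # expected to give the same segmentation: ['我家', '没有', '电脑', '。']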
