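For details, see https://blog.csdn.net/qq_35203425/article/details/80451243
The main point is that the Stanford CoreNLP toolkit handles word segmentation, part-of-speech tagging, syntactic parsing, and more in a single package,
so there is no need to download the pile of separate jar packages, such as those listed in https://blog.csdn.net/zkq_1986/article/details/81583725
Whether you need to set environment variables depends on which interface you use to call Stanford CoreNLP. For example, the link above uses the stanfordcorenlp
interface, which does not require any environment variables.
If you use NLTK, you do need to set environment variables.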
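For the NLTK route, a minimal sketch of setting such a variable from Python could look like the following. It assumes the variable in question is CLASSPATH, which NLTK's Stanford wrappers have commonly searched for jar files; which variables are actually read depends on the NLTK version and wrapper class, so treat that as an assumption. The helper names build_classpath and export_classpath are hypothetical.

```python
import os
from pathlib import Path


def build_classpath(corenlp_dir):
    """Join every .jar file found directly in corenlp_dir into one classpath string."""
    jars = sorted(str(p) for p in Path(corenlp_dir).glob("*.jar"))
    return os.pathsep.join(jars)


def export_classpath(corenlp_dir, env=os.environ):
    """Store the classpath in CLASSPATH (assumed variable name) for this process only."""
    env["CLASSPATH"] = build_classpath(corenlp_dir)
    return env["CLASSPATH"]
```

Calling export_classpath(r'D:\jar\stanford-corenlp-full-2018-10-05') before importing the NLTK wrapper affects only the current Python process; a permanent setting still has to be made in the operating system.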
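Download: https://stanfordnlp.github.io/CoreNLP/. English is handled by default, and for that you only need the main download (the red button on the page). For other languages, pick the matching language pack from the list further down the page and place it in the folder where you unzipped CoreNLP.
In my folder, stanford-chinese-corenlp-2018-10-05-models.jar is the extra package added for Chinese, i.e. the Chinese download from the list on the download page.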
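Before starting the wrapper, it can help to confirm that the language pack really sits in the CoreNLP folder. The sketch below is a hypothetical helper (find_model_jars is not part of any library). It assumes other language packs follow the same naming pattern as the Chinese jar named above, which may not hold for every release.

```python
from pathlib import Path


def find_model_jars(corenlp_dir, language="chinese"):
    """Return the names of model jars for `language` found directly in corenlp_dir."""
    # Pattern inferred from stanford-chinese-corenlp-2018-10-05-models.jar (assumption).
    pattern = f"stanford-{language}-corenlp-*-models.jar"
    return sorted(p.name for p in Path(corenlp_dir).glob(pattern))
```

An empty list from find_model_jars(r'D:\jar\stanford-corenlp-full-2018-10-05') would suggest the Chinese jar was not copied into the folder.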
Test code, using the stanfordcorenlp interface:
from stanfordcorenlp import StanfordCoreNLP

# Path to the unzipped CoreNLP folder; lang='zh' selects the Chinese models.
nlp = StanfordCoreNLP(r'D:\jar\stanford-corenlp-full-2018-10-05', lang='zh')
sentence = '可引起心肌缺血的原因很多,血压降低、主动脉供血减少、冠状动脉阻塞,\
可直接导致心脏供血减少;心瓣膜病、血黏度变化、心肌本身病变也会使心脏供血减少。'
try:
    print('Tokenize:', nlp.word_tokenize(sentence))
    print('Part of Speech:', nlp.pos_tag(sentence))
    print('Named Entities:', nlp.ner(sentence))
    print('Constituency Parsing:', nlp.parse(sentence))
    print('Dependency Parsing:', nlp.dependency_parse(sentence))
finally:
    # Release the CoreNLP backend even if one of the calls above fails.
    nlp.close()
Output:
Tokenize: ['可', '引起', '心肌', '缺血', '的', '原因', '很多', ',', '血压', '降低', '、', '主动脉', '供', '血', '减少', '、', '冠状', '动脉', '阻塞', ',', '可', '直接', '导致', '心脏', '供血', '减少', ';', '心瓣', '膜', '病', '、', '血', '黏度', '变化', '、', '心肌', '本身', '病变', '也', '会', '使', '心脏', '供血', '减少', '。']
Part of Speech: [('可', 'AD'), ('引起', 'VV'), ('心肌', 'NN'), ('缺血', 'NN'), ('的', 'DEC'), ('原因', 'NN'), ('很多', 'CD'), (',', 'PU'), ('血压', 'NN'), ('降低', 'VV'), ('、', 'PU'), ('主动脉', 'NN'), ('供', 'VV'), ('血', 'NN'), ('减少', 'VV'), ('、', 'PU'), ('冠状', 'NN'), ('动脉', 'NN'), ('阻塞', 'VV'), (',', 'PU'), ('可', 'VV'), ('直接', 'AD'), ('导致', 'VV'), ('心脏', 'NN'), ('供血', 'NN'), ('减少', 'VV'), (';', 'PU'), ('心瓣', 'NN'), ('膜', 'NN'), ('病', 'NN'), ('、', 'PU'), ('血', 'NN'), ('黏度', 'NN'), ('变化', 'NN'), ('、', 'PU'), ('心肌', 'NN'), ('本身', 'PN'), ('病变', 'NN'), ('也', 'AD'), ('会', 'VV'), ('使', 'VV'), ('心脏', 'NN'), ('供血', 'NN'), ('减少', 'VV'), ('。', 'PU')]
Named Entities: [('可', 'O'), ('引起', 'O'), ('心肌', 'O'), ('缺血', 'O'), ('的', 'O'), ('原因', 'O'), ('很多', 'NUMBER'), (',', 'O'), ('血压', 'O'), ('降低', 'O'), ('、', 'O'), ('主动脉', 'O'), ('供', 'O'), ('血', 'O'), ('减少', 'O'), ('、', 'O'), ('冠状', 'O'), ('动脉', 'O'), ('阻塞', 'O'), (',', 'O'), ('可', 'O'), ('直接', 'O'), ('导致', 'O'), ('心脏', 'O'), ('供血', 'O'), ('减少', 'O'), (';', 'O'), ('心瓣', 'O'), ('膜', 'O'), ('病', 'O'), ('、', 'O'), ('血', 'O'), ('黏度', 'O'), ('变化', 'O'), ('、', 'O'), ('心肌', 'O'), ('本身', 'O'), ('病变', 'O'), ('也', 'O'), ('会', 'O'), ('使', 'O'), ('心脏', 'O'), ('供血', 'O'), ('减少', 'O'), ('。', 'O')]
Constituency Parsing: (ROOT
  (IP
    (IP
      (CP
        (IP
          (VP (AD 可)
            (VP (VV 引起)
              (NP (NN 心肌) (NN 缺血)))))
        (DEC 的))
      (IP
        (NP (NN 原因))
        (QP (CD 很多))
        (PU ,)
        (NP (NN 血压))
        (VP
          (VP
            (VP
              (VP (VV 降低))
              (PU 、)
              (NP (NN 主动脉)))
            (VP (VV 供)
              (NP (NN 血))
              (IP
                (IP
                  (VP (VV 减少)))
                (PU 、)
                (IP
                  (NP (NN 冠状) (NN 动脉))
                  (VP (VV 阻塞))))))
          (PU ,)
          (VP (VV 可)
            (VP
              (ADVP (AD 直接))
              (VP (VV 导致)
                (IP
                  (NP (NN 心脏) (NN 供血))
                  (VP (VV 减少)))))))))
    (PU ;)
    (IP
      (NP
        (NP (NN 心瓣) (NN 膜) (NN 病) (PU 、) (NN 血) (NN 黏度) (NN 变化))
        (PU 、)
        (NP
          (NP
            (NP (NN 心肌))
            (NP (PN 本身)))
          (NP (NN 病变))))
      (VP
        (ADVP (AD 也))
        (VP (VV 会)
          (VP (VV 使)
            (NP (NN 心脏) (NN 供血))
            (IP
              (VP (VV 减少)))))))
    (PU 。)))
Dependency Parsing: [('ROOT', 0, 10), ('aux:modal', 2, 1), ('acl', 6, 2), ('compound:nn', 4, 3), ('dobj', 2, 4), ('mark', 2, 5), ('nsubj', 10, 6), ('dep', 10, 7), ('punct', 10, 8), ('nsubj', 10, 9), ('punct', 10, 11), ('nsubj', 13, 12), ('conj', 10, 13), ('dobj', 13, 14), ('ccomp', 13, 15), ('punct', 15, 16), ('compound:nn', 18, 17), ('nsubj', 19, 18), ('dep', 15, 19), ('punct', 10, 20), ('aux:modal', 23, 21), ('advmod', 23, 22), ('conj', 10, 23), ('compound:nn', 25, 24), ('nsubj', 26, 25), ('ccomp', 23, 26), ('punct', 10, 27), ('compound:nn', 30, 28), ('compound:nn', 30, 29), ('conj', 38, 30), ('punct', 38, 31), ('compound:nn', 33, 32), ('compound:nn', 34, 33), ('conj', 38, 34), ('punct', 38, 35), ('compound:nn', 37, 36), ('compound:nn', 38, 37), ('nsubj', 41, 38), ('advmod', 41, 39), ('aux:modal', 41, 40), ('conj', 10, 41), ('compound:nn', 43, 42), ('dobj', 41, 43), ('ccomp', 41, 44), ('punct', 10, 45)]
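The dependency triples are easier to read once the indices are mapped back to words. The sketch below assumes each triple is (relation, head index, dependent index), with 1-based positions into the Tokenize list and 0 standing for the artificial root. That reading is inferred from the ('ROOT', 0, 10) entry in the output above rather than taken from documentation, and describe_dependencies is a hypothetical helper.

```python
def describe_dependencies(tokens, dependencies):
    """Replace 1-based token indices with the words they point to."""
    described = []
    for relation, head, dependent in dependencies:
        # Index 0 is assumed to mean the artificial ROOT node, not a real token.
        head_word = "ROOT" if head == 0 else tokens[head - 1]
        described.append((relation, head_word, tokens[dependent - 1]))
    return described


# First ten tokens and first five triples copied from the output above.
tokens = ['可', '引起', '心肌', '缺血', '的', '原因', '很多', ',', '血压', '降低']
dependencies = [('ROOT', 0, 10), ('aux:modal', 2, 1), ('acl', 6, 2),
                ('compound:nn', 4, 3), ('dobj', 2, 4)]

for row in describe_dependencies(tokens, dependencies):
    print(row)
```

Under that assumption, the root of the sentence is 降低, and 缺血 is the direct object of 引起.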