Python named entity recognition tools: getting Stanford Named Entity Recognizer (NER) functionality with NLTK

Is it possible to get functionality similar to the Stanford Named Entity Recognizer using just NLTK?

Is there an example?

In particular, I am interested in extracting the LOCATION part of a text. For example, from the text

The meeting will be held at 22 West Westin st., South Carolina, 12345 on Nov.-18

ideally I would like to get something like

(S
22/LOCATION
(LOCATION West/LOCATION Westin/LOCATION)
st./LOCATION
,/,
(South/LOCATION Carolina/LOCATION)
,/,
12345/LOCATION
.....

or simply

22 West Westin st., South Carolina, 12345

Instead, I am only able to get

(S
The/DT
meeting/NN
will/MD
be/VB
held/VBN
at/IN
22/CD
(LOCATION West/NNP Westin/NNP)
st./NNP
,/,
(GPE South/NNP Carolina/NNP)
,/,
12345/CD
on/IN
Nov.-18/-NONE-)

Note that if I enter my text into http://nlp.stanford.edu:8080/ner/process, the results are far from perfect (the street number and zip code are still missing), but at least "st." is part of the LOCATION, and South Carolina is a LOCATION rather than some "GPE / NNP".

What am I doing wrong? How can I use NLTK to extract the location from a piece of text?

Many thanks in advance!

Solution

NLTK does have an interface for Stanford NER; see nltk.tag.stanford.NERTagger (in newer NLTK releases the class is named StanfordNERTagger).

from nltk.tag.stanford import NERTagger

st = NERTagger('/usr/share/stanford-ner/classifiers/all.3class.distsim.crf.ser.gz',
               '/usr/share/stanford-ner/stanford-ner.jar')
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())

Output:

[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
 ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
 ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]
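The tagger labels each token separately, so getting a whole phrase such as the LOCATION piece means merging neighbouring tokens that share a label. Here is a minimal sketch of that step in plain Python; `group_entities` is a hypothetical helper name, not part of NLTK, and the input is the tagger output shown above.

```python
from itertools import groupby


def group_entities(tagged_tokens, outside_label="O"):
    """Merge consecutive tokens that share a label into (phrase, label) pairs.

    Hypothetical helper, not an NLTK function. Tokens tagged with
    `outside_label` are treated as non-entities and dropped.
    """
    entities = []
    for label, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label == outside_label:
            continue
        phrase = " ".join(token for token, _ in run)
        entities.append((phrase, label))
    return entities


# Tagger output copied from the example above.
tagged = [('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
          ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
          ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]

entities = group_entities(tagged)
locations = [phrase for phrase, label in entities if label == "LOCATION"]

print(entities)
# [('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]
print(locations)
# ['NY']
```

One caveat of this simple grouping: two different entities of the same type that sit directly next to each other, with no other token between them, are merged into one phrase.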

However, every time you call tag, NLTK writes the target sentence to a file, runs the Stanford NER command-line tool on that file, and parses the output back into Python. The overhead of loading the classifiers (around 1 minute for me each time) is therefore unavoidable.

If that's a problem, use Pyner.

First, run Stanford NER as a server:

java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer \
    -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 9191

Then go to the pyner folder and run:

import ner

tagger = ner.SocketNER(host='localhost', port=9191)
tagger.get_entities("University of California is located in California, United States")
# {'LOCATION': ['California', 'United States'],
#  'ORGANIZATION': ['University of California']}

tagger.json_entities("Alice went to the Museum of Natural History.")
# '{"ORGANIZATION": ["Museum of Natural History"], "PERSON": ["Alice"]}'
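If installing pyner is not an option, the server on port 9191 can probably be queried over a plain socket as well. The sketch below rests on two assumptions that should be checked against your Stanford NER version: that the server reads one newline-terminated line per connection and then closes it, and that it replies with slash-tagged tokens such as `Alice/PERSON went/O` (the reply format depends on how the server is configured). Both `query_ner_server` and `parse_slash_tags` are hypothetical helper names, not part of pyner or NLTK.

```python
import socket


def query_ner_server(text, host="localhost", port=9191, timeout=10.0):
    """Send one line of text to the NER server and return its raw reply.

    Assumption: the server handles a single newline-terminated line per
    connection and closes the connection after replying.
    """
    # Collapse internal whitespace so the request is exactly one line.
    line = " ".join(text.split()) + "\n"
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("utf-8"))
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def parse_slash_tags(reply):
    """Turn 'Alice/PERSON went/O' into [('Alice', 'PERSON'), ('went', 'O')].

    Assumption: the reply uses word/TAG pairs separated by whitespace.
    Splitting on the last slash keeps tokens such as '1/2' intact.
    """
    pairs = []
    for item in reply.split():
        word, sep, tag = item.rpartition("/")
        if sep:  # skip anything that is not in word/TAG form
            pairs.append((word, tag))
    return pairs
```

With the server running, `parse_slash_tags(query_ner_server("Alice went to the Museum of Natural History."))` would yield a token/label list in the same shape as the NLTK output earlier in this answer, so the model is loaded only once, at server start.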

Hope this helps.
