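WordNet is a semantically oriented dictionary of English. Like a traditional dictionary it records what words mean, but it groups words by sense and links those senses through semantic relations.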
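WordNet is organized hierarchically. Its basic unit is the synset (synonym set: a set of lemmas that share one meaning), which is named by a triple of the form word.pos.nn (word, part of speech, sense number). For example, 'dog.n.01' denotes the first noun sense of dog, and 'chase.v.01' the first verb sense of chase.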
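The naming scheme can be illustrated with a small parser. This is a minimal sketch: `parse_synset_name`, `SynsetName` and `POS_NAMES` are hypothetical helpers, not NLTK APIs. The parser only splits the string and checks the part-of-speech letter; it does not look anything up in WordNet.

```python
from typing import NamedTuple

# Letters that can appear in the part-of-speech slot of a synset name
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


class SynsetName(NamedTuple):
    word: str   # lemma used in the name, e.g. "dog"
    pos: str    # one-letter part-of-speech tag, e.g. "n"
    sense: int  # sense number, e.g. 1 for "01"


def parse_synset_name(name: str) -> SynsetName:
    """Split a 'word.pos.nn' synset name into its three parts."""
    # Split from the right so that a word which itself contains '.' stays intact
    word, pos, sense = name.rsplit(".", 2)
    if pos not in POS_NAMES:
        raise ValueError(f"unknown part-of-speech tag {pos!r} in {name!r}")
    return SynsetName(word, pos, int(sense))


for name in ["dog.n.01", "chase.v.01"]:
    parsed = parse_synset_name(name)
    print(f"{name}: sense {parsed.sense} of {parsed.word!r} as a {POS_NAMES[parsed.pos]}")
```

Splitting from the right (`rsplit`) is a deliberate choice: the part of speech and sense number are always the last two fields, so anything before them is treated as the word.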
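| Method | Description |
|---|---|
| `wn.synsets(word)` | Look up all synsets that contain a word |
| `wn.synset('apple.n.01').definition()` | Get the definition (gloss) of a synset |
| `wn.synset('dog.n.01').examples()` | Get the example sentences of a synset |
| `wn.synsets('beau', pos=wn.NOUN)` | Look up a word's synsets restricted to one part of speech |
| `wn.synset('dog.n.01').hypernyms()` | Get the hypernyms (more general synsets) of a synset |
| `wn.synset('dog.n.01').hyponyms()` | Get the hyponyms (more specific synsets) of a synset |
| `wn.synset('dog.n.01').member_holonyms()` | Get the member holonyms: the wholes the synset is a member of (e.g. a pack of dogs) |
| `wn.lemma('dog.n.01.dog').synset()` | Get the synset a lemma belongs to |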
| Method | Description |
|---|---|
| `wn.synset('dog.n.01').lemmas()` | Get all lemmas of a synset |
| Method | Description |
|---|---|
| `wn.synset('dog.n.01').lemma_names()` | Get the names (words) of all lemmas in a synset |
| `wn.lemma('dog.n.01.dog').name()` | Get the word of a lemma |
```python
from nltk.corpus import wordnet as wn

synsets = wn.synsets('published')
print(synsets)
```

Output:

```text
[Synset('print.v.01'), Synset('publish.v.02'), Synset('publish.v.03'), Synset('published.a.01'), Synset('promulgated.s.01')]
```
```python
definition = wn.synset('apple.n.01').definition()
print(definition)
```

Output:

```text
fruit with red or yellow or green skin and sweet to tart crisp whitish flesh
```
```python
examples = wn.synset('dog.n.01').examples()
print(examples)
```

Output:

```text
['the dog barked all night']
```
```python
sets = wn.synsets('beau', pos=wn.NOUN)
print(sets)
```

Output:

```text
[Synset('boyfriend.n.01'), Synset('dandy.n.01')]
```
```python
# Direct hypernyms of a synset
dog = wn.synset('dog.n.01')
hypernym_sets = dog.hypernyms()
print(hypernym_sets)
# The most general (root) hypernym synset(s)
root_hypernym = dog.root_hypernyms()
print(root_hypernym)
```

Output:

```text
[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
[Synset('entity.n.01')]
```
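To see what `root_hypernyms()` does conceptually, here is a minimal sketch that keeps following hypernym links until it reaches synsets with no hypernym. The `HYPERNYMS` table and the `root_hypernyms` function are hypothetical: the table is a hand-made, heavily abbreviated fragment (the real chain from carnivore.n.01 up to entity.n.01 has many more levels), and NLTK's own implementation may work differently.

```python
from typing import Dict, List

# Hypothetical, abbreviated fragment: synset name -> its direct hypernyms
HYPERNYMS: Dict[str, List[str]] = {
    "dog.n.01": ["canine.n.02", "domestic_animal.n.01"],
    "canine.n.02": ["carnivore.n.01"],
    "domestic_animal.n.01": ["animal.n.01"],
    "carnivore.n.01": ["animal.n.01"],
    "animal.n.01": ["entity.n.01"],
    "entity.n.01": [],
}


def root_hypernyms(synset: str) -> List[str]:
    """Follow hypernym links upward and return the synsets that have no hypernym."""
    roots = set()
    seen = {synset}
    stack = [synset]
    while stack:
        node = stack.pop()
        parents = HYPERNYMS.get(node, [])
        if not parents:
            roots.add(node)  # nothing above this node, so it is a root
        for parent in parents:
            if parent not in seen:  # a synset can be reached along several paths
                seen.add(parent)
                stack.append(parent)
    return sorted(roots)


print(HYPERNYMS["dog.n.01"])       # direct hypernyms, analogous to dog.hypernyms()
print(root_hypernyms("dog.n.01"))  # ['entity.n.01']
```

The `seen` set matters because the hierarchy is a graph, not a tree: dog.n.01 reaches animal.n.01 through two different parents.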
```python
dog = wn.synset('dog.n.01')
hyponym_sets = dog.hyponyms()
print(hyponym_sets)
```

Output:

```text
[Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'), Synset('dalmatian.n.02'), Synset('great_pyrenees.n.01'), Synset('griffon.n.02'), Synset('hunting_dog.n.01'), Synset('lapdog.n.01'), Synset('leonberg.n.01'), Synset('mexican_hairless.n.01'), Synset('newfoundland.n.01'), Synset('pooch.n.01'), Synset('poodle.n.01'), Synset('pug.n.01'), Synset('puppy.n.01'), Synset('spitz.n.01'), Synset('toy_dog.n.01'), Synset('working_dog.n.01')]
```
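Hyponymy is the inverse of hypernymy: if a synset lists dog.n.01 as a hypernym, then dog.n.01 has that synset among its hyponyms. A minimal sketch of that relationship, deriving the hyponym direction by inverting a child-to-parents table; `HYPERNYMS` and `build_hyponym_index` are hypothetical, the table is a shortened, hand-made fragment using names from the output above, and this is not how NLTK stores the relation.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical fragment: child synset -> its direct hypernyms
HYPERNYMS: Dict[str, List[str]] = {
    "corgi.n.01": ["dog.n.01"],
    "poodle.n.01": ["dog.n.01"],
    "puppy.n.01": ["dog.n.01"],
    "dog.n.01": ["canine.n.02", "domestic_animal.n.01"],
}


def build_hyponym_index(hypernyms: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert child -> parents links into parent -> children links."""
    hyponyms: Dict[str, List[str]] = defaultdict(list)
    for child, parents in hypernyms.items():
        for parent in parents:
            hyponyms[parent].append(child)
    # Sort each list so the result does not depend on dictionary order
    return {parent: sorted(children) for parent, children in hyponyms.items()}


index = build_hyponym_index(HYPERNYMS)
print(index["dog.n.01"])     # ['corgi.n.01', 'poodle.n.01', 'puppy.n.01']
print(index["canine.n.02"])  # ['dog.n.01']
```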
```python
dog = wn.synset('dog.n.01')
member_holonyms = dog.member_holonyms()
print(member_holonyms)
```

Output:

```text
[Synset('canis.n.01'), Synset('pack.n.06')]
```
```python
synset = wn.lemma('dog.n.01.dog').synset()
print(synset)
```

Output:

```text
Synset('dog.n.01')
```
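A lemma name such as 'dog.n.01.dog' is simply a synset name followed by a dot and the lemma's own name, which is why `.synset()` can recover dog.n.01 from it. A minimal sketch of that split using a regular expression; `split_lemma_key` and `LEMMA_KEY` are hypothetical helpers, not NLTK APIs, and the pattern assumes the part-of-speech letter is one of n, v, a, s, r.

```python
import re
from typing import Tuple

# "word.pos.nn" synset name, then ".", then the lemma name
LEMMA_KEY = re.compile(r"^(.+\.[nvasr]\.\d+)\.(.+)$")


def split_lemma_key(key: str) -> Tuple[str, str]:
    """Return (synset name, lemma name) for a key like 'dog.n.01.domestic_dog'."""
    match = LEMMA_KEY.match(key)
    if match is None:
        raise ValueError(f"not a lemma key: {key!r}")
    return match.group(1), match.group(2)


print(split_lemma_key("dog.n.01.dog"))           # ('dog.n.01', 'dog')
print(split_lemma_key("dog.n.01.domestic_dog"))  # ('dog.n.01', 'domestic_dog')
```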
```python
lemmas = wn.synset('dog.n.01').lemmas()
print(lemmas)
print(type(lemmas[1]))
print(lemmas[1])
```

Output:

```text
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
<class 'nltk.corpus.reader.wordnet.Lemma'>
Lemma('dog.n.01.domestic_dog')
```
```python
lemma_names = wn.synset('dog.n.01').lemma_names()
print(lemma_names)

# Combined example: build the same list from the Lemma objects
synset_words = [lemma.name() for lemma in wn.synset('dog.n.01').lemmas()]
print(synset_words)
```

Output:

```text
['dog', 'domestic_dog', 'Canis_familiaris']
['dog', 'domestic_dog', 'Canis_familiaris']
```
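WordNet writes multiword lemmas with underscores (domestic_dog), so the names above usually need a small clean-up before being shown to a reader. A minimal sketch; `to_phrase` is a hypothetical helper and the input list is copied from the output above.

```python
from typing import List

# Lemma names copied from the output above
lemma_names: List[str] = ["dog", "domestic_dog", "Canis_familiaris"]


def to_phrase(lemma_name: str) -> str:
    """Turn a lemma name such as 'domestic_dog' into the phrase 'domestic dog'."""
    return lemma_name.replace("_", " ")


phrases = [to_phrase(name) for name in lemma_names]
print(phrases)  # ['dog', 'domestic dog', 'Canis familiaris']
```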
```python
word = wn.lemma('dog.n.01.dog').name()
print(word)
```

Output:

```text
dog
```
```python
# Walking entails stepping (lifting one's feet)
entailment_sets = wn.synset('walk.v.01').entailments()
print(entailment_sets)
```

Output:

```text
[Synset('step.v.01')]
```
Antonymy is defined between lemmas rather than synsets, so antonyms can only be retrieved through lemmas.
```python
good = wn.synset('good.a.01')
antonym = good.lemmas()[0].antonyms()
print(antonym)
print(antonym[0].name())
```

Output:

```text
[Lemma('bad.a.01.bad')]
bad
```
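Because antonym links hang off lemmas, asking for the antonyms of a whole synset means visiting each of its lemmas. A minimal sketch of that data layout; the tables `SYNSET_LEMMAS` and `LEMMA_ANTONYMS` and the function `synset_antonyms` are hypothetical and hand-made, not NLTK's internal representation.

```python
from typing import Dict, List

# Hypothetical data: which lemmas each synset has
SYNSET_LEMMAS: Dict[str, List[str]] = {
    "good.a.01": ["good"],
    "bad.a.01": ["bad"],
}
# Hypothetical data: antonym links keyed by lemma ("synset.lemma"), not by synset
LEMMA_ANTONYMS: Dict[str, List[str]] = {
    "good.a.01.good": ["bad.a.01.bad"],
    "bad.a.01.bad": ["good.a.01.good"],
}


def synset_antonyms(synset: str) -> List[str]:
    """Collect antonym lemma keys for a synset by visiting each of its lemmas."""
    result: List[str] = []
    for lemma in SYNSET_LEMMAS.get(synset, []):
        result.extend(LEMMA_ANTONYMS.get(f"{synset}.{lemma}", []))
    return result


antonyms = synset_antonyms("good.a.01")
print(antonyms)                       # ['bad.a.01.bad']
print(antonyms[0].rsplit(".", 1)[1])  # bad
```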
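`synset1.path_similarity(synset2)` scores how similar two synsets are on a scale from 0 to 1, based on the shortest path that connects them in the hypernym (is-a) hierarchy. If no path can be found, current NLTK versions return `None` (older documentation mentions -1). Comparing a synset with itself returns 1.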
"""名词和动词语义相似度计算"""
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
similarity = dog.path_similarity(cat)
print(similarity)
0.2
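The 0.2 above can be reproduced by hand. NLTK defines the score as 1 / (d + 1), where d is the length of the shortest path between the two synsets through a common hypernym. The sketch below is an illustrative reimplementation, not NLTK's code: the `HYPERNYMS` fragment and all function names are hypothetical, the levels above carnivore.n.01 are collapsed, and refinements such as NLTK's `simulate_root` option are ignored. dog.n.01 and cat.n.01 meet at carnivore.n.01 after 2 + 2 = 4 links, so the score is 1 / 5 = 0.2.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical, hand-made fragment: synset -> direct hypernyms
HYPERNYMS: Dict[str, List[str]] = {
    "dog.n.01": ["canine.n.02", "domestic_animal.n.01"],
    "cat.n.01": ["feline.n.01"],
    "canine.n.02": ["carnivore.n.01"],
    "feline.n.01": ["carnivore.n.01"],
    "domestic_animal.n.01": ["animal.n.01"],
    "carnivore.n.01": ["animal.n.01"],
    "animal.n.01": [],
    "walk.v.01": [],  # not connected to the nouns in this toy table
}


def hypernym_distances(synset: str) -> Dict[str, int]:
    """Breadth-first search upward: fewest hypernym links from synset to each ancestor."""
    dist = {synset: 0}
    queue = deque([synset])
    while queue:
        node = queue.popleft()
        for parent in HYPERNYMS.get(node, []):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def shortest_path_distance(a: str, b: str) -> Optional[int]:
    """Length of the shortest path from a to b that passes through a common hypernym."""
    da, db = hypernym_distances(a), hypernym_distances(b)
    common = da.keys() & db.keys()
    if not common:
        return None  # no shared ancestor, so no connecting path
    return min(da[s] + db[s] for s in common)


def path_similarity(a: str, b: str) -> Optional[float]:
    d = shortest_path_distance(a, b)
    return None if d is None else 1 / (d + 1)


print(shortest_path_distance("dog.n.01", "cat.n.01"))  # 4 (via carnivore.n.01)
print(path_similarity("dog.n.01", "cat.n.01"))         # 0.2
print(path_similarity("dog.n.01", "dog.n.01"))         # 1.0
print(path_similarity("dog.n.01", "walk.v.01"))        # None
```

The +1 in the denominator keeps the score finite when d = 0, which is why a synset compared with itself scores exactly 1.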
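Note that nouns and verbs are organized into complete hierarchical (is-a) taxonomies, whereas adjectives and adverbs are not, so path-based measures such as `path_similarity` are not meaningful for them.
For adjectives, the most useful relation is similar to, exposed as `similar_tos()`.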
"""形容词和副词语义相似度计算"""
synsets = wn.synsets('glorious')
print(synsets)
words = synsets[0].lemma_names()
print(words)
similarity_sets = synsets[0].similar_tos()
print(similarity_sets)
[Synset('glorious.a.01'), Synset('brilliant.s.03'), Synset('glorious.s.03')]
[u'glorious']
[Synset('bright.s.06'), Synset('celebrated.s.02'), Synset('divine.s.06'), Synset('empyreal.s.02'), Synset('illustrious.s.02'), Synset('incandescent.s.02'), Synset('lustrous.s.02')]
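WordNet groups adjectives into clusters: a head adjective (part of speech 'a', such as glorious.a.01) is linked by similar-to to satellite adjectives (part of speech 's', such as the .s. synsets above). A minimal sketch that finds the head of a satellite by searching the similar-to lists; `SIMILAR_TOS` is copied from the output above and `head_of` is a hypothetical helper, not an NLTK API.

```python
from typing import Dict, List, Optional

# Copied from the output above: satellites returned by glorious.a.01.similar_tos()
SIMILAR_TOS: Dict[str, List[str]] = {
    "glorious.a.01": [
        "bright.s.06", "celebrated.s.02", "divine.s.06", "empyreal.s.02",
        "illustrious.s.02", "incandescent.s.02", "lustrous.s.02",
    ],
}


def head_of(satellite: str) -> Optional[str]:
    """Return the head adjective whose similar-to list contains `satellite`, if any."""
    for head, satellites in SIMILAR_TOS.items():
        if satellite in satellites:
            return head
    return None


print(head_of("lustrous.s.02"))  # glorious.a.01
print(head_of("glorious.s.03"))  # None (not listed in this small table)
```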
Part-of-speech constants (used as in `wn.NOUN`):

```python
ADJ, ADJ_SAT, ADV, NOUN, VERB = 'a', 's', 'r', 'n', 'v'
```
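These letters are the middle field of every synset name, so they can be used to sort synsets by part of speech. A minimal sketch that groups the synset names returned by `wn.synsets('published')` earlier in this section; `POS_LABELS` and `by_pos` are hypothetical names, and the constants are redefined locally so the snippet runs without NLTK.

```python
from collections import defaultdict
from typing import Dict, List

# The WordNet part-of-speech constants, redefined locally
ADJ, ADJ_SAT, ADV, NOUN, VERB = "a", "s", "r", "n", "v"
POS_LABELS = {NOUN: "NOUN", VERB: "VERB", ADJ: "ADJ", ADJ_SAT: "ADJ_SAT", ADV: "ADV"}

# Synset names from the wn.synsets('published') output
names = ["print.v.01", "publish.v.02", "publish.v.03", "published.a.01", "promulgated.s.01"]

by_pos: Dict[str, List[str]] = defaultdict(list)
for name in names:
    pos = name.rsplit(".", 2)[1]  # middle field of "word.pos.nn"
    by_pos[POS_LABELS[pos]].append(name)

print(dict(by_pos))
# {'VERB': ['print.v.01', 'publish.v.02', 'publish.v.03'], 'ADJ': ['published.a.01'], 'ADJ_SAT': ['promulgated.s.01']}
```

Note that 'published.a.01' and 'promulgated.s.01' land in different groups: a satellite adjective carries 's', not 'a'.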