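A model trained with gensim's Doc2Vec class (gensim 3.x) contains the following objects:
(1) wv (Word2VecKeyedVectors):
Stores the mapping between words and their vectors. It is used for vector lookups and for distance and similarity computations.
Methods: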
① closer_than(entity1, entity2)
Get all entities that are closer to entity1 than entity2 is to entity1.
② cosine_similarities(vector_1, vectors_all)
Compute cosine similarities between one vector and a set of other vectors.
③ distance(w1, w2)
Compute cosine distance between two words.
④ distances(word_or_vector, other_words=())
Compute cosine distances from given word or vector to all words in other_words.
If other_words is empty, return the distances between word_or_vector and all words in the vocabulary.
⑤ get_vector(word)
Get the entity’s representations in vector space, as a 1D numpy array.
⑥ most_similar_cosmul(positive=None, negative=None, topn=10)
Find the top-N most similar words, using the multiplicative combination objective.
⑦ most_similar_to_given(entity1, entities_list)
Get the entity from entities_list most similar to entity1.
⑧ n_similarity(ws1, ws2)
Compute cosine similarity between two sets of words.
⑨ relative_cosine_similarity(wa, wb, topn=10)
Compute the relative cosine similarity between two words, given the top-n words most similar to wa.
⑩ save(path) / load(path)
Save KeyedVectors to disk / load KeyedVectors from disk.
⑪ wmdistance(document1, document2)
Compute the Word Mover’s Distance (WMD) between two documents.
⑫ word_vec(word, use_norm=False)
Get a word's representation in vector space, as a 1D numpy array. With use_norm=True, the L2-normalized vector is returned.
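Most of the wv methods listed above reduce to cosine similarity between vectors. The numpy sketch below reimplements a few of them (cosine_similarities, distance, distances, closer_than, most_similar_to_given, and the use_norm normalization) over a hypothetical toy vocabulary `vocab` with made-up vectors. It only illustrates the underlying math and is not gensim's own code.

```python
import numpy as np

# Hypothetical toy vocabulary (word -> vector); a trained model learns these
vocab = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.85, 0.82, 0.2]),
    "apple": np.array([0.1, 0.2, 0.95]),
    "pear":  np.array([0.15, 0.1, 0.9]),
}

def unit(v):
    """L2-normalize a vector (corresponds to word_vec(word, use_norm=True))."""
    return v / np.linalg.norm(v)

def cosine_similarities(vector_1, vectors_all):
    """Cosine similarity between one vector and each row of a 2D array."""
    norm = np.linalg.norm(vector_1)
    all_norms = np.linalg.norm(vectors_all, axis=1)
    return vectors_all.dot(vector_1) / (norm * all_norms)

def distance(w1, w2):
    """Cosine distance between two words: 1 - cosine similarity."""
    return 1.0 - float(unit(vocab[w1]).dot(unit(vocab[w2])))

def distances(word_or_vector, other_words=()):
    """Distances to other_words, or to the whole vocabulary if it is empty."""
    v = vocab[word_or_vector] if isinstance(word_or_vector, str) else word_or_vector
    words = list(other_words) or list(vocab)
    return 1.0 - cosine_similarities(v, np.array([vocab[w] for w in words]))

def closer_than(entity1, entity2):
    """All words that are closer to entity1 than entity2 is."""
    threshold = distance(entity1, entity2)
    return [w for w in vocab if w != entity1 and distance(entity1, w) < threshold]

def most_similar_to_given(entity1, entities_list):
    """The entry of entities_list with the smallest cosine distance to entity1."""
    return min(entities_list, key=lambda w: distance(entity1, w))

print(closer_than("king", "apple"))                              # ['queen']
print(most_similar_to_given("apple", ["king", "queen", "pear"]))  # pear
```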
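most_similar_cosmul ranks candidates with the multiplicative 3CosMul objective (Levy and Goldberg, 2014): each cosine similarity is shifted from [-1, 1] to [0, 1] via (1 + cos) / 2, the shifted similarities to the positive words are multiplied together and divided by the product for the negative words plus a small epsilon (1e-6 here). The sketch below assumes a hypothetical five-word vocabulary with hand-made vectors and is an illustration, not gensim's implementation.

```python
import numpy as np

# Hypothetical toy vocabulary; dimensions loosely mean (royalty, maleness, fruitiness)
words = ["man", "woman", "king", "queen", "apple"]
vectors = np.array([
    [0.1, 1.0, 0.0],
    [0.1, -1.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, -1.0, 0.0],
    [0.0, 0.0, 1.0],
])
# Normalize every row once so that a dot product equals cosine similarity
vectors_norm = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
index = {w: i for i, w in enumerate(words)}

def most_similar_cosmul(positive=(), negative=(), topn=10):
    """Rank words by prod(pos sims) / (prod(neg sims) + eps), sims shifted to [0, 1]."""
    numerator = np.ones(len(words))
    for w in positive:
        numerator *= (1 + vectors_norm @ vectors_norm[index[w]]) / 2
    denominator = np.ones(len(words))
    for w in negative:
        denominator *= (1 + vectors_norm @ vectors_norm[index[w]]) / 2
    scores = numerator / (denominator + 1e-6)  # epsilon avoids division by zero
    exclude = set(positive) | set(negative)    # never return the query words
    ranked = [(words[i], float(scores[i])) for i in np.argsort(-scores)
              if words[i] not in exclude]
    return ranked[:topn]

# "king - man + woman": the top hit is 'queen'
print(most_similar_cosmul(positive=["king", "woman"], negative=["man"], topn=1))
```

Because every shifted similarity is non-negative, the product never flips sign, which is why the shift to [0, 1] is applied before multiplying.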
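n_similarity compares two word sets through the cosine similarity of their mean vectors, and relative_cosine_similarity (Leeuwenberg et al., 2016) divides cos(wa, wb) by the sum of the cosine similarities between wa and its top-n nearest neighbours. A minimal numpy sketch, assuming a hypothetical toy vocabulary `vocab` and a helper `cos`; gensim's own implementation may differ in details.

```python
import numpy as np

# Hypothetical toy vocabulary (word -> vector)
vocab = {
    "cat":   np.array([0.9, 0.1, 0.0]),
    "dog":   np.array([0.8, 0.3, 0.1]),
    "puppy": np.array([0.7, 0.4, 0.1]),
    "car":   np.array([0.0, 0.2, 0.9]),
    "truck": np.array([0.1, 0.1, 0.95]),
}

def cos(a, b):
    """Cosine similarity of two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def n_similarity(ws1, ws2):
    """Cosine similarity between the mean vectors of two word sets."""
    m1 = np.mean([vocab[w] for w in ws1], axis=0)
    m2 = np.mean([vocab[w] for w in ws2], axis=0)
    return cos(m1, m2)

def relative_cosine_similarity(wa, wb, topn=10):
    """cos(wa, wb) divided by the summed cosine of wa's top-n neighbours."""
    neighbour_sims = sorted(
        (cos(vocab[wa], vocab[w]) for w in vocab if w != wa), reverse=True
    )[:topn]
    return cos(vocab[wa], vocab[wb]) / sum(neighbour_sims)

print(n_similarity(["cat", "dog"], ["puppy"]))
print(relative_cosine_similarity("dog", "puppy", topn=2))
```

Normalizing by the neighbourhood makes the score comparable across words whose nearest neighbours are all very close (dense regions) or all far away (sparse regions).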
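(2) docvecs (Doc2VecKeyedVectors):
This object holds the paragraph (document) vectors. The key difference from word2vec is that, in addition to word vectors, Doc2Vec learns an embedding for each paragraph that captures the paragraph as a whole.
Its methods are essentially the same as those of wv.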
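(3) vocabulary (Doc2VecVocab):
This object represents the model's vocabulary (dictionary). Besides keeping track of all unique words, it provides extra functionality such as sorting words by frequency and discarding very rare words.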
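The pruning and sorting described above can be sketched in plain Python. In gensim the rare-word cutoff is controlled by the min_count parameter of Doc2Vec (default 5); the helper `build_vocab` and the tiny corpus below are hypothetical and only show the idea, not how Doc2VecVocab is implemented internally.

```python
from collections import Counter

def build_vocab(documents, min_count=2):
    """Count words, drop those seen fewer than min_count times, sort by frequency."""
    counts = Counter(word for doc in documents for word in doc)
    kept = {w: c for w, c in counts.items() if c >= min_count}
    # Descending frequency; ties broken alphabetically so the order is deterministic
    return sorted(kept.items(), key=lambda item: (-item[1], item[0]))

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
print(build_vocab(docs))  # [('the', 3), ('cat', 2), ('sat', 2)]
```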
Methods of the Doc2Vec model itself:
① most_similar(**kwargs) Deprecated, use self.wv.most_similar() instead.
② most_similar_cosmul(**kwargs) Deprecated, use self.wv.most_similar_cosmul() instead.
③ n_similarity(**kwargs) Deprecated, use self.wv.n_similarity() instead.