W2V

What is Word2Vec

In machine learning models such as neural networks, we cannot process string data directly, so we need to convert it into purely numerical form. During this conversion, we want the data to retain as much of the original information as possible.

Word2Vec, like one-hot encoding, is a method for converting text data into vectors, and it is used extensively in natural language processing (NLP). One-hot encoding first collects all the words in the text and assigns each one an index; a V-dimensional vector (where V is the vocabulary size) is then created for each word. Each dimension of the vector corresponds to one word, so the dimension at the word's own index is set to 1 and all other dimensions are 0. Although this method preserves the identity of each word, the dimensionality becomes very high for large vocabularies, and the representation cannot reflect the relationship between two words. For example, cat and kitten are clearly more similar than cat and coral, but one-hot vectors cannot express this. Word2Vec, by learning from the text, uses word vectors to represent the semantic information of words: it maps words into an "embedding space" in which semantically similar words lie close together. This both reduces the dimensionality and captures the relationships between words.

An embedding is exactly this kind of mapping: it maps the space in which the original words live to a new, lower-dimensional space.
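As a quick illustration of the limitation mentioned above, here is a minimal sketch (the three-word vocabulary is a made-up example): with one-hot vectors, every pair of distinct words is equally far apart, so "cat" is no closer to "kitten" than to "coral".

```python
import numpy as np

vocab = ["cat", "kitten", "coral"]          # toy vocabulary, V = 3
one_hot = np.eye(len(vocab))                # each word gets a V-dimensional one-hot vector

# The distance between every pair of distinct one-hot vectors is identical,
# so the encoding carries no information about which words are semantically close.
print(np.linalg.norm(one_hot[0] - one_hot[1]))   # cat vs kitten -> 1.414...
print(np.linalg.norm(one_hot[0] - one_hot[2]))   # cat vs coral  -> 1.414...
```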

 

How do we use W2V

The Word2Vec model comes in two main variants, Skip-Gram and CBOW. Intuitively, Skip-Gram takes an input word and predicts its context, while CBOW takes the context and predicts the input word. For this data set we want to predict which other products a consumer will need based on the products they have already selected, so this experiment uses the Skip-Gram model and improves on it.
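As a point of reference, a plain Skip-Gram model (before the order-level modification described below) can be trained with the gensim library roughly as follows; the orders shown are hypothetical placeholders for the real data set.

```python
from gensim.models import Word2Vec

# Each "sentence" is one order: the list of products a consumer bought together.
# These orders are made-up placeholders for the real data set.
orders = [
    ["milk", "bread", "butter"],
    ["diapers", "beer", "milk"],
    ["bread", "butter", "jam"],
]

# sg=1 selects the Skip-Gram architecture; window controls how many neighbouring
# products are treated as context for each centre product.
model = Word2Vec(sentences=orders, vector_size=100, window=5, min_count=1, sg=1)

vector = model.wv["milk"]   # the learned 100-dimensional word vector for "milk"
```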

 

The Word2Vec pipeline really has two parts: the first is building and training the model, and the second is extracting the embedded word vectors from it. The whole modeling process is similar in spirit to an auto-encoder: we construct a neural network and fit it on the training data, but once it is trained we do not use the trained network itself on new tasks. What we actually need are the parameters the model learns from the training data, such as the weight matrix of the hidden layer; as we will see later, these weights are exactly the word vectors we are trying to learn in Word2Vec.
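The following minimal PyTorch sketch shows where those hidden-layer weights live and how they are read out as word vectors after training. The vocabulary size, embedding dimension, and class name are illustrative assumptions, and the training loop is omitted.

```python
import torch.nn as nn

V, D = 5000, 100  # hypothetical vocabulary size and embedding dimension

class SkipGram(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        # Hidden-layer weights: one D-dimensional row per word.
        self.in_embed = nn.Embedding(vocab_size, embed_dim)
        # Output layer used only during training to score context words.
        self.out = nn.Linear(embed_dim, vocab_size, bias=False)

    def forward(self, center_ids):
        h = self.in_embed(center_ids)   # (batch, D) hidden activations
        return self.out(h)              # (batch, V) scores over context words

model = SkipGram(V, D)
loss_fn = nn.CrossEntropyLoss()         # soft-max plus negative log-likelihood
# ... training loop over (center word, context word) pairs goes here ...

# After training, the learned parameters themselves are what we keep:
word_vectors = model.in_embed.weight.data   # (V, D) matrix of word vectors
```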

 

In the experiment, for the product-name attribute we follow the bagged-prod2vec method of Grbovic et al. [1]. This means the model is trained at the level of the order (receipt), not at the level of the individual product. The word-vector representation of each item is obtained by maximizing a modified objective function:

The probability P(e_{m+j} | p_{mk}) of observing the products of a neighboring receipt e_{m+j} = (p_{m+j,1}, ..., p_{m+j,T_m}), given the k-th product of the m-th receipt, reduces to a product of probabilities

P(e_{m+j} | p_{mk}) = P(p_{m+j,1} | p_{mk}) × ... × P(p_{m+j,T_m} | p_{mk}),

each defined using the soft-max function (Eq. 3.2 in [1]).
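Written out in full, the order-level objective being maximized looks roughly as follows; this is a paraphrase following the notation of [1], not a verbatim copy of the paper's equation.

```latex
\mathcal{L} \;=\; \sum_{s \in S} \; \sum_{e_m \in s} \; \sum_{p_{mk} \in e_m}
\sum_{\substack{-n \le j \le n \\ j \ne 0}} \log P\!\left(e_{m+j} \mid p_{mk}\right)
```

Here S is the set of order sequences, e_m is the m-th receipt in a sequence, p_mk is its k-th product, and n is the context window measured in neighboring receipts.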

 

By maximizing this modified objective function we obtain the embedding layer, i.e. the word vector of each product. Finally, product prediction is completed by finding the word vectors that lie closest to the target product's vector.
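Mechanically, this last step is a nearest-neighbour search in the embedding space. A minimal sketch is below; the vocabulary and the embedding matrix are random placeholders standing in for the trained vectors.

```python
import numpy as np

# Hypothetical trained embedding matrix: one row per product (random here for illustration).
rng = np.random.default_rng(0)
vocab = ["milk", "bread", "butter", "diapers", "beer"]
word_vectors = rng.normal(size=(len(vocab), 100))

def most_similar(query, k=3):
    """Return the k products whose vectors are closest to the query product (cosine similarity)."""
    q = word_vectors[vocab.index(query)]
    # Cosine similarity between the query vector and every product vector.
    sims = word_vectors @ q / (np.linalg.norm(word_vectors, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)
    return [(vocab[i], float(sims[i])) for i in order if vocab[i] != query][:k]

print(most_similar("milk"))
```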

 

Why choose W2V

Because W2V not only converts words into word-vector form, but also, through training, captures the relationships between words, it lets us find the words closest to a given input word, which is exactly what the prediction task requires.

 

Reference:

1. Grbovic, M., Radosavljevic, V., Djuric, N., Bhamidipati, N., Savla, J., Bhagwan, V., & Sharp, D. (2015, August). E-commerce in your inbox: Product recommendations at scale. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1809-1818). ACM.

 

 
