author: Kuncheng Xie
note: Due to the author's limited ability and time, the article may contain errors and omissions.
Network embedding maps vertices into a low-dimensional vector space that is dense and continuous, making it well suited for modelling the relationships between vertices.
Things in nature are often discrete. How do we represent things such as words? One naïve way is to index them with numbers, as $1, 2, 3, \dots$. But such a representation does not work well in neural networks. A better representation is one-hot encoding, a vector with a single 1, such as representing the word 'apple' with the vector $[0, 0, \cdots, 0, 1, 0, \cdots, 0]$.
However, this kind of representation has two big problems:

1. The vectors are as long as the vocabulary, so they are extremely high-dimensional and sparse.
2. Any two different word vectors are orthogonal, so the representation carries no information about similarity between words.
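A tiny sketch makes both problems concrete (the vocabulary and words below are made up for illustration): the one-hot vectors grow with the vocabulary, and every pair of distinct words is orthogonal, so 'car' looks no more similar to 'truck' than to 'apple'.

```python
import numpy as np

vocab = ["apple", "banana", "car", "truck"]   # toy vocabulary
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """One-hot vector: as long as the vocabulary, with a single 1."""
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

# Problem 1: the dimension equals the vocabulary size (often 10^5-10^6 in practice).
print(one_hot("apple").shape)                 # (4,)

# Problem 2: different words are always orthogonal, so no similarity is encoded.
print(one_hot("car") @ one_hot("truck"))      # 0.0
print(one_hot("car") @ one_hot("apple"))      # 0.0
```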
Word embedding finally gets rid of these problems. It relies on the distributional hypothesis[1], i.e. the assumption that words appearing in similar contexts (surrounding words) have similar meanings. It uses unsupervised learning to exploit large corpora. The learned representation vectors can then be used in downstream tasks such as sentence sentiment analysis, reading comprehension, and so on.
Similar problems arise when representing vertices in networks, such as users in social networks. Network embedding follows the ideas of word embedding and turns network information into dense, low-dimensional, real-valued vectors. The learned vertex representations can then be fed into existing machine learning algorithms[2] for tasks such as vertex classification, community discovery, recommendation, etc.
There are many kinds of algorithms for learning network representations, and I will introduce some representative modern models that I know.
DeepWalk[3] was the first to introduce the word embedding technique, i.e. neural-network-based learning, into network embedding. Through truncated random walks, it samples many vertex sequences and feeds them to a neural model such as Skip-Gram[4]. It outperforms conventional methods on some tasks by a large margin, demonstrating the value of exploiting the relations between vertices in a network and encoding them into dense, low-dimensional vectors.
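A minimal sketch of the walk-sampling step, assuming an adjacency-list representation of the graph (the toy graph, the parameter values, and the gensim call in the comments are illustrative, not DeepWalk's exact implementation):

```python
import random

def deepwalk_walks(adj, num_walks=10, walk_length=40, seed=0):
    """Sample truncated random walks; adj maps a vertex to its neighbor list."""
    rng = random.Random(seed)
    walks = []
    nodes = list(adj)
    for _ in range(num_walks):
        rng.shuffle(nodes)                     # one pass over all vertices per round
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append([str(v) for v in walk])  # Skip-Gram treats vertices as "words"
    return walks

# toy graph as an adjacency list
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = deepwalk_walks(adj)

# The walks can then be fed to a Skip-Gram model, e.g. gensim's Word2Vec
# (parameter names follow gensim 4.x and may differ in other versions):
# from gensim.models import Word2Vec
# model = Word2Vec(walks, vector_size=64, window=5, sg=1, min_count=0)
# vec = model.wv["2"]   # embedding of vertex 2
```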
Afterwards, various network representation learning algorithms mushroomed, each focusing on different aspects of networks.
LINE[5] does not use neural networks to learn the embeddings; instead, it minimizes the distance, usually measured by KL-divergence, between an empirical distribution and a carefully designed model distribution. Besides, it makes use of both first-order and second-order proximities to learn a good embedding. It scales to large networks and outperforms DeepWalk in some tasks.
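To make this concrete, in the notation of the LINE paper[5], the first-order part defines, for an undirected edge $(v_i, v_j)$ with weight $w_{ij}$ and vertex embeddings $\vec{u}_i$,

$$p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{\top}\vec{u}_j)}, \qquad \hat{p}_1(v_i, v_j) = \frac{w_{ij}}{\sum_{(m,n)\in E} w_{mn}},$$

and minimizing the KL-divergence between $\hat{p}_1$ and $p_1$ (dropping constants) yields the objective

$$O_1 = -\sum_{(i,j)\in E} w_{ij} \log p_1(v_i, v_j).$$

The second-order objective is built analogously from a conditional distribution $p_2(v_j \mid v_i)$ over each vertex's neighbors, using separate "context" embeddings.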
Soon afterwards, node2vec[6] was proposed with a more flexible notion of a node's network neighborhood to learn richer representations. The algorithm designs a biased random walk procedure whose parameters interpolate between BFS-like and DFS-like exploration, as in the sketch below.
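A simplified sketch of the biased transitions (the alias-sampling trick and the full pipeline of the original implementation are omitted; the toy graph and parameter values are made up). Here $p$ is the return parameter and $q$ the in-out parameter: a low $p$ encourages backtracking toward the previous node, while $q > 1$ keeps the walk local (BFS-like) and $q < 1$ pushes it outward (DFS-like).

```python
import random

def node2vec_walk(adj, start, walk_length, p=1.0, q=1.0, rng=random):
    """One biased walk; adj maps a vertex to a set of neighbors."""
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = list(adj[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))   # first step is unbiased
            continue
        prev = walk[-2]
        # unnormalized transition weights: 1/p to return to prev,
        # 1 to a common neighbor of prev and cur, 1/q to move further away
        weights = []
        for x in neighbors:
            if x == prev:
                weights.append(1.0 / p)
            elif x in adj[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# toy usage: small p and large q bias the walk toward staying local
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(node2vec_walk(adj, start=0, walk_length=10, p=0.25, q=4.0))
```

The sampled walks are then fed to Skip-Gram just as in DeepWalk.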
Information networks, where each vertex also carries rich external information such as text and labels, have drawn attention recently. TADW[7] uses matrix factorization to incorporate text information. CANE[8] focuses on the different aspects a vertex presents when interacting with its different neighbors, and uses a mutual attention mechanism to obtain context-aware text embeddings. It combines the structure embedding and the text embedding to form an embedding for each edge in the network. Its objective function and basic ideas are adopted by Zhang et al. (2018)[9], who use a subtler structure-learning strategy, a diffusion process, to capture more representative information.
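A minimal NumPy sketch of the mutual attention step, loosely following the formulation in the CANE paper[8]: the text feature matrices P and Q are assumed to come from some text encoder (e.g. a CNN over each vertex's text), A is a trainable attention matrix, and all shapes and values below are made up for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mutual_attention(P, Q, A):
    """P: (d, m) text features of vertex u, Q: (d, n) text features of vertex v,
    A: (d, d) attention parameter. Returns context-aware text embeddings."""
    F = np.tanh(P.T @ A @ Q)          # (m, n) correlation between word positions
    a_p = softmax(F.mean(axis=1))     # importance of u's words with respect to v
    a_q = softmax(F.mean(axis=0))     # importance of v's words with respect to u
    u_text = P @ a_p                  # (d,) text embedding of u, aware of v
    v_text = Q @ a_q                  # (d,) text embedding of v, aware of u
    return u_text, v_text

# toy shapes: embedding dimension 8, texts of length 5 and 7
rng = np.random.default_rng(0)
P, Q, A = rng.normal(size=(8, 5)), rng.normal(size=(8, 7)), rng.normal(size=(8, 8))
u_text, v_text = mutual_attention(P, Q, A)
print(u_text.shape, v_text.shape)     # (8,) (8,)
```

Because the pooling is done against the other vertex's text, the same vertex gets a different text embedding for each neighbor, which is exactly the "context-aware" property described above.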
Network embedding has progressed a lot since DeepWalk. However, as pointed out by Tu et al.[2], there are still aspects that need to be improved.
[1] Almeida F, Xexeo G. Word embeddings: A survey. arXiv preprint arXiv:1901.09069, 2019.
[2] Tu C, Yang C, Liu Z, et al. Network representation learning: an overview. Scientia Sinica Informationis, 2017.
[3] Perozzi B, Al-Rfou R, Skiena S. DeepWalk: Online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014: 701-710.
[4] Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality[C]//Advances in Neural Information Processing Systems. 2013: 3111-3119.
[5] Tang J, Qu M, Wang M, et al. LINE: Large-scale information network embedding[C]//Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2015: 1067-1077.
[6] Grover A, Leskovec J. node2vec: Scalable feature learning for networks[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016: 855-864.
[7] Yang C, Liu Z, Zhao D, et al. Network representation learning with rich text information[C]//Twenty-Fourth International Joint Conference on Artificial Intelligence. 2015.
[8] Tu C, Liu H, Liu Z, et al. CANE: Context-aware network embedding for relation modeling[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017: 1722-1731.
[9] Zhang X, Li Y, et al. Diffusion maps for textual network embedding[C]//Advances in Neural Information Processing Systems 31 (NeurIPS 2018). Montréal, Canada, 2018.