Lecture 2 – Word Vectors and Word Senses
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 1](http://img.e-com-net.com/image/info8/f6c3c4a2d1aa4bdfa0c67be78dc6db41.jpg)
1. Review: Main idea of word2vec
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 2](http://img.e-com-net.com/image/info8/a53f0a14cf7f43eaaaa2a4f0d2d4dbca.jpg)
Word2vec parameters and computations
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 3](http://img.e-com-net.com/image/info8/1a0cc4ec1eb149b7a3b008df82e36485.jpg)
Word2vec maximizes objective function by putting similar words nearby in space
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 4](http://img.e-com-net.com/image/info8/ca91fcad6ccd40d6889bb0fdcf4d954f.jpg)
2. Optimization: Gradient Descent
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 5](http://img.e-com-net.com/image/info8/56d8236875d6403d8e4900ca0c9ce3ef.jpg)
Gradient Descent
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 6](http://img.e-com-net.com/image/info8/da901f07045746d4b60382332d6f9834.jpg)
Stochastic Gradient Descent
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 7](http://img.e-com-net.com/image/info8/c7d1c368916b4423b498e73eb048a5e8.jpg)
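The contrast on these slides can be sketched on a toy least-squares problem: full-batch gradient descent uses the gradient over all examples per step, while SGD samples a single example and steps on its noisy gradient. The data, learning rate, and step count below are illustrative, not the lecture's code.

```python
import numpy as np

# Toy objective: J(theta) = (1/2N) * sum_i (x_i . theta - y_i)^2,
# so the gradient on one example i is (x_i . theta - y_i) * x_i.
rng = np.random.default_rng(0)
N, d = 200, 3
X = rng.normal(size=(N, d))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta                          # noiseless targets, for illustration

theta = np.zeros(d)
alpha = 0.05                                # learning rate (step size)
for step in range(2000):
    i = rng.integers(N)                     # SGD: sample one training example
    grad_i = (X[i] @ theta - y[i]) * X[i]   # gradient on that example alone
    theta = theta - alpha * grad_i          # stochastic update
```

Because each update touches one example rather than the whole corpus, this is the update scheme that makes training word2vec on billions of tokens feasible.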
Stochastic gradients with word vectors!
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 8](http://img.e-com-net.com/image/info8/88e12ec7c3f34b3688e295b2c16964ec.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 9](http://img.e-com-net.com/image/info8/94e1f579c381456b9a239238f143e54b.jpg)
1b. Word2vec: More details
So far, we have looked at two main classes of methods for finding word embeddings. The first class is count-based and relies on matrix factorization (e.g. LSA, HAL). These methods effectively leverage global statistical information, but they primarily capture word similarity and do poorly on tasks such as word analogy, indicating a sub-optimal vector-space structure. The second class is shallow and window-based (e.g. the skip-gram and CBOW models), learning word embeddings by making predictions in local context windows. These models can capture complex linguistic patterns beyond word similarity, but they fail to make use of global co-occurrence statistics.
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 10](http://img.e-com-net.com/image/info8/d92167d5c9ab4ee0ad35e534514217ed.jpg)
The skip-gram model with negative sampling (HW2)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 11](http://img.e-com-net.com/image/info8/96f5751ca06d467695b38a7dba8d222f.jpg)
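The negative-sampling loss for one (center, outside) pair pushes the true outside word's score up and K sampled negatives' scores down. A minimal sketch of that loss, with a hypothetical tiny vocabulary, random vectors, and hand-picked negative indices (not the HW2 reference solution):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Loss for one (center, outside) pair with K sampled negatives:
#   J = -log sigma(u_o . v_c) - sum_k log sigma(-u_k . v_c)
rng = np.random.default_rng(0)
V, d = 10, 4                              # vocab size, embedding dimension
U = rng.normal(scale=0.1, size=(V, d))    # "outside" vectors u_w
W = rng.normal(scale=0.1, size=(V, d))    # "center" vectors v_w

center, outside = 2, 5
negatives = [1, 7, 9]   # in practice drawn from the unigram^(3/4) distribution

v_c = W[center]
loss = -np.log(sigmoid(U[outside] @ v_c))         # true pair: want score high
for k in negatives:
    loss += -np.log(sigmoid(-U[k] @ v_c))         # negatives: want score low
```

Sampling K negatives replaces the expensive softmax over the whole vocabulary with K + 1 binary classifications per training pair.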
In comparison, GloVe is a weighted least-squares model trained on global word-word co-occurrence counts, and thus makes efficient use of corpus statistics. The model produces a word-vector space with meaningful sub-structure: it shows state-of-the-art performance on the word-analogy task and outperforms other contemporary methods on several word-similarity tasks.
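The weighted least-squares objective can be written down in a few lines. This sketches the loss only (no training loop); the variable names and the random toy inputs are illustrative.

```python
import numpy as np

# GloVe objective over nonzero co-occurrence counts X_ij:
#   J = sum_ij f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2
# with weighting f(x) = min((x / x_max)^alpha, 1).
def glove_weight(x, x_max=100.0, alpha=0.75):
    return np.minimum((x / x_max) ** alpha, 1.0)

def glove_loss(X, W, W_tilde, b, b_tilde):
    total = 0.0
    for i, j in zip(*np.nonzero(X)):      # zero counts contribute nothing
        diff = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(X[i, j])
        total += glove_weight(X[i, j]) * diff ** 2
    return total

rng = np.random.default_rng(0)
V, d = 5, 3
X = rng.integers(0, 20, size=(V, V)).astype(float)   # toy count matrix
W, W_tilde = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_tilde = np.zeros(V), np.zeros(V)
J = glove_loss(X, W, W_tilde, b, b_tilde)
```

The weighting function caps the influence of very frequent pairs while down-weighting rare ones, which is how GloVe gets the "efficient use of statistics" the notes describe.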
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 12](http://img.e-com-net.com/image/info8/55ca13d1f4fb4cb789f056b75e343437.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 13](http://img.e-com-net.com/image/info8/4a3617c1ff60435b905f42096c690fcd.jpg)
3. But why not capture co-occurrence counts directly?
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 14](http://img.e-com-net.com/image/info8/99e3cd56dd254e46b9cfc6f03be962c0.jpg)
Example: Window based co-occurrence matrix
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 15](http://img.e-com-net.com/image/info8/dfa6463b6850469b9626ea7fe1eff1b0.jpg)
Window based co-occurrence matrix
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 16](http://img.e-com-net.com/image/info8/e774d734076548cebcdf5a1d311cb820.jpg)
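The slide's toy corpus ("I like deep learning. I like NLP. I enjoy flying.") with a symmetric window of size 1 can be reproduced in a few lines; a minimal sketch:

```python
import numpy as np

# Build a symmetric window-based co-occurrence matrix X:
# X[i, j] counts how often word j appears within `window` positions of word i.
corpus = [
    "I like deep learning".split(),
    "I like NLP".split(),
    "I enjoy flying".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

window = 1
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for pos, w in enumerate(sent):
        for off in range(1, window + 1):
            if pos - off >= 0:                      # neighbor to the left
                X[idx[w], idx[sent[pos - off]]] += 1
            if pos + off < len(sent):               # neighbor to the right
                X[idx[w], idx[sent[pos + off]]] += 1
```

Counting both directions makes X symmetric, matching the matrix shown on the slide (e.g. "I" and "like" co-occur twice).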
Problems with simple co-occurrence vectors
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 17](http://img.e-com-net.com/image/info8/70f46cf0357a42d89a70e4418967f503.jpg)
Solution: Low dimensional vectors
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 18](http://img.e-com-net.com/image/info8/58994ab5ae7f4d569579bf96538b3461.jpg)
Method 1: Dimensionality Reduction on X (HW1)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 19](http://img.e-com-net.com/image/info8/031068c0f92d43cc90f02a1b6e02b4ee.jpg)
Simple SVD word vectors in Python
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 20](http://img.e-com-net.com/image/info8/358986a10bcb4fd0889e6603c41b44cd.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 21](http://img.e-com-net.com/image/info8/72a881d86f42455a9380d8e08522477f.jpg)
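The idea on these slides — factor X with SVD and keep the top singular directions as low-dimensional word vectors — can be sketched with NumPy (the slide's own version also plots the result with matplotlib; the 3×3 matrix here is a stand-in for a real |V|×|V| count matrix):

```python
import numpy as np

# Factor X = U S V^T and keep the top-k singular directions:
# row i of U_k * S_k is the k-dimensional vector for word i.
def svd_word_vectors(X, k=2):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

X = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])   # tiny symmetric co-occurrence matrix
vecs = svd_word_vectors(X, k=2)
```

With k equal to the full rank, the embedding exactly preserves the dot-product structure of X; truncating to small k gives the dense, low-dimensional vectors the slide advocates.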
Hacks to X (several used in Rohde et al. 2005)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 22](http://img.e-com-net.com/image/info8/e9a4f34c6fd942ba814d0b442151d63e.jpg)
Interesting syntactic patterns emerge in the vectors
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 23](http://img.e-com-net.com/image/info8/858f04bca51b461bb03126a4025349bb.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 24](http://img.e-com-net.com/image/info8/374467e95645469ba5f48006a182c31e.jpg)
Count based vs. direct prediction
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 25](http://img.e-com-net.com/image/info8/b8e85f9e69d247e78783f2a90343f29e.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 26](http://img.e-com-net.com/image/info8/ab2ccf19cd2a489fb2d151d9060f869e.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 27](http://img.e-com-net.com/image/info8/ff60a242c5aa4257b066e48c5eec50f4.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 28](http://img.e-com-net.com/image/info8/e5f8c6583d65470f86d08a2aac60ce8c.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 29](http://img.e-com-net.com/image/info8/71c5e51845bf43a482cd4ff0c373f9bc.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 30](http://img.e-com-net.com/image/info8/520ac41e23cd4f2ca7dea9a624e3f13f.jpg)
How to evaluate word vectors?
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 31](http://img.e-com-net.com/image/info8/74a3738d440247bea4273869b569e14e.jpg)
Intrinsic word vector evaluation
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 32](http://img.e-com-net.com/image/info8/198cb74ff2e149609ec313b5665b8c35.jpg)
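The workhorse intrinsic evaluation is the analogy task "a : b :: c : ?": pick the word whose vector maximizes cosine similarity to x_b − x_a + x_c, excluding a, b, and c. The hand-built vectors below are contrived so the classic example works; real evaluations run this over trained vectors on the Mikolov analogy dataset.

```python
import numpy as np

# Tiny illustrative vocabulary (vectors chosen by hand, not trained).
vecs = {
    "man":    np.array([1.0, 0.0, 0.2]),
    "woman":  np.array([1.0, 1.0, 0.2]),
    "king":   np.array([0.2, 0.0, 1.0]),
    "queen":  np.array([0.2, 1.0, 1.0]),
    "banana": np.array([5.0, 0.1, 0.1]),
}

def analogy(a, b, c, vecs):
    """Answer 'a : b :: c : ?' by cosine similarity to x_b - x_a + x_c."""
    target = vecs[b] - vecs[a] + vecs[c]
    target = target / np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for w, v in vecs.items():
        if w in (a, b, c):               # standard trick: exclude the inputs
            continue
        sim = (v @ target) / np.linalg.norm(v)
        if sim > best_sim:
            best, best_sim = w, sim
    return best
```

Excluding the three input words matters in practice: x_b itself is often the nearest vector to the target, so without the exclusion the evaluation would be trivially wrong.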
GloVe Visualizations
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 33](http://img.e-com-net.com/image/info8/461ecd603dbf48418b2b1febddf9a78d.jpg)
GloVe Visualizations: Company - CEO
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 34](http://img.e-com-net.com/image/info8/a9f5c51705dc47128827e8b5e2543934.jpg)
GloVe Visualizations: Superlatives
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 35](http://img.e-com-net.com/image/info8/13685714cd134f57b904abc5847cb8d7.jpg)
Details of intrinsic word vector evaluation
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 36](http://img.e-com-net.com/image/info8/dda6abcd9a6848a7b02e1c843ac9ec51.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 37](http://img.e-com-net.com/image/info8/a1f03231f5cb4a40b19c8be3a86b3524.jpg)
Analogy evaluation and hyperparameters
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 38](http://img.e-com-net.com/image/info8/b0b50e76ba8b483eb93a97b34fa3d77a.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 39](http://img.e-com-net.com/image/info8/e0c15756567247e29d23d84e894b5555.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 40](http://img.e-com-net.com/image/info8/a3a80ecf79f44ec9be0d8d69f64617bc.jpg)
Analogy evaluation and hyperparameters (continued)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 41](http://img.e-com-net.com/image/info8/5c5eff4672534fa1a49de014600561d1.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 42](http://img.e-com-net.com/image/info8/d765777c2aa7436baec398f6f991db78.jpg)
Another intrinsic word vector evaluation
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 43](http://img.e-com-net.com/image/info8/27132c76409543dba105a9e95d23cadd.jpg)
Closest words to “Sweden” (cosine similarity)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 44](http://img.e-com-net.com/image/info8/28bb9b7ed6524278af249bad699f0c3a.jpg)
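The slide's nearest-neighbor query — rank all words by cosine similarity to the query vector — is a one-liner over the vocabulary. The four vectors below are made up for illustration; with trained GloVe vectors the top neighbors of "Sweden" are other Nordic countries, as the slide shows.

```python
import numpy as np

# Illustrative hand-made vectors (not trained embeddings).
vecs = {
    "sweden":  np.array([0.90, 0.80, 0.10]),
    "norway":  np.array([0.85, 0.82, 0.15]),
    "denmark": np.array([0.80, 0.75, 0.20]),
    "banana":  np.array([0.10, 0.00, 0.90]),
}

def nearest(word, vecs, n=2):
    """Return the n words with highest cosine similarity to `word`."""
    q = vecs[word] / np.linalg.norm(vecs[word])
    sims = {w: (v @ q) / np.linalg.norm(v)
            for w, v in vecs.items() if w != word}
    return sorted(sims, key=sims.get, reverse=True)[:n]
```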
Correlation evaluation
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 45](http://img.e-com-net.com/image/info8/a383bda0aaa741f998a51a2727600a93.jpg)
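The correlation evaluation compares model similarity scores with human judgments (e.g. on WordSim-353) via Spearman rank correlation: rank both lists, then take the Pearson correlation of the ranks. The direct implementation below assumes no ties; real implementations (e.g. scipy.stats.spearmanr) assign midranks to tied values. The score lists are hypothetical.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation, assuming no tied values."""
    def ranks(a):
        order = np.argsort(a)
        r = np.empty(len(a))
        r[order] = np.arange(len(a))    # rank of each element in sorted order
        return r
    rx, ry = ranks(np.asarray(x)), ranks(np.asarray(y))
    return float(np.corrcoef(rx, ry)[0, 1])

# Hypothetical scores: model cosine similarities vs. mean human ratings
model = [0.82, 0.31, 0.65, 0.10]
human = [9.1, 4.0, 7.5, 1.2]
rho = spearman(model, human)
```

Rank correlation is used rather than Pearson on the raw scores because only the relative ordering of word pairs is meaningful: cosine similarities and 0-10 human ratings live on different, nonlinearly related scales.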
Word senses and word sense ambiguity
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 46](http://img.e-com-net.com/image/info8/83a2345b8ba243afa44cb6a3856c5d83.jpg)
pike
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 47](http://img.e-com-net.com/image/info8/04777b259c6e413480b17bf300de722b.jpg)
Improving Word Representations Via Global Context And Multiple Word Prototypes (Huang et al. 2012)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 48](http://img.e-com-net.com/image/info8/37b9d5a5fc474eba9cab9d22a8959124.jpg)
Linear Algebraic Structure of Word Senses, with Applications to Polysemy
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 49](http://img.e-com-net.com/image/info8/67bdd529637d42f3bba8fc0085574b5b.jpg)
Extrinsic word vector evaluation
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 50](http://img.e-com-net.com/image/info8/e3dce5a561444c42b0d259638e62b241.jpg)
Course plan: coming weeks
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 51](http://img.e-com-net.com/image/info8/03d8d62d0eae405c85ff0a39e0154ffb.jpg)
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 52](http://img.e-com-net.com/image/info8/92b0f25f056742cebe1dce01d81c7c71.jpg)
Office Hours / Help sessions
![[cs224n] Lecture 2 – Word Vectors and Word Senses – Image 53](http://img.e-com-net.com/image/info8/149e26eef6f847d0b15eb9e75161390a.jpg)