Deep Learning Course 5, Week 2 Quiz Notes: Natural Language Processing & Word Embeddings

Natural Language Processing & Word Embeddings

  1. True/False: Suppose you learn a word embedding for a vocabulary of 20000 words. Then the embedding vectors could be 1000-dimensional, so as to capture the full range of variation and meaning in those words.
  • False
  • True

Explanation: The dimension of word vectors is usually much smaller than the size of the vocabulary. The most common sizes range between 50 and 1000.
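
As a rough illustration (the 300-dimensional size below is a hypothetical choice, not taken from the quiz), the embedding matrix has one vector per vocabulary word, but each vector has far fewer dimensions than there are words:

```python
import numpy as np

vocab_size = 20000     # words in the vocabulary
embedding_dim = 300    # hypothetical size; typical values lie between 50 and 1000

# Randomly initialized embedding matrix: one 300-dimensional vector per word.
E = np.random.randn(vocab_size, embedding_dim) * 0.01
print(E.shape)  # (20000, 300) -- far fewer dimensions than vocabulary entries
```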

  2. True/False: t-SNE is a linear transformation that allows us to solve analogies on word vectors.
  • False
  • True

Explanation: t-SNE is a non-linear dimensionality reduction technique, not a linear transformation, so it cannot be used to solve analogies this way.
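
A minimal sketch of how t-SNE is typically used with word vectors, assuming scikit-learn is available and using random vectors as stand-ins for real embeddings: it produces a 2-D map for visualization only, not a transformation that preserves the linear structure needed for analogies.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for real word vectors (e.g. 1000 words, 300 dimensions each).
word_vectors = np.random.randn(1000, 300)

# Non-linear reduction to 2-D, useful for plotting word clusters.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
points_2d = tsne.fit_transform(word_vectors)
print(points_2d.shape)  # (1000, 2)
```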

  3. Suppose you download a pre-trained word embedding which has been trained on a huge corpus of text. You then use this word embedding to train an RNN for a language task of recognizing if someone is happy from a short snippet of text, using a small training set.
    Then even if the word “ecstatic” does not appear in your small training set, your RNN might reasonably be expected to recognize “I’m ecstatic” as deserving a label $y = 1$.
  • False
  • True

Explanation: Word vectors give your model a strong ability to generalize. The vector for “ecstatic” carries a positive/happy connotation, which will probably lead the model to classify the sentence as a 1 (this is the improved generalization that pre-trained embeddings provide).
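
A small illustration of the idea, using a hypothetical embedding table (random vectors stand in for real pre-trained ones, which would put “ecstatic” close to “happy”):

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Placeholder embeddings; in practice these come from a large pre-trained table
# such as GloVe, where "ecstatic" and "happy" end up with similar vectors.
embeddings = {w: np.random.randn(50) for w in ["happy", "ecstatic", "sad"]}

# High similarity between "ecstatic" and "happy" is what lets a classifier
# trained only on sentences containing "happy" generalize to "I'm ecstatic".
print(cosine_similarity(embeddings["ecstatic"], embeddings["happy"]))
```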

  4. Which of these equations do you think should hold for a good word embedding? (Check all that apply; a small analogy-solving sketch follows the options.)
  • $e_{man} - e_{uncle} \approx e_{woman} - e_{aunt}$
  • $e_{man} - e_{woman} \approx e_{uncle} - e_{aunt}$
  • $e_{man} - e_{woman} \approx e_{aunt} - e_{uncle}$
  • $e_{man} - e_{aunt} \approx e_{woman} - e_{uncle}$
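
A sketch of how such analogies are usually checked in code, assuming a hypothetical in-memory embedding dictionary (random vectors here only to keep the snippet self-contained): to test $e_{man} - e_{woman} \approx e_{uncle} - e_{aunt}$, search for the word whose vector is closest to $e_{uncle} - e_{man} + e_{woman}$.

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def solve_analogy(a, b, c, embeddings):
    """Find word d with e_a - e_b ≈ e_c - e_d, i.e. e_d ≈ e_c - e_a + e_b."""
    target = embeddings[c] - embeddings[a] + embeddings[b]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine_similarity(embeddings[w], target))

# Placeholder table; with real pre-trained embeddings this should return "aunt".
embeddings = {w: np.random.randn(50) for w in ["man", "woman", "uncle", "aunt", "king"]}
print(solve_analogy("man", "woman", "uncle", embeddings))
```
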
  5. Let $A$ be an embedding matrix, and let $o_{4567}$ be a one-hot vector corresponding to word 4567. Then to get the embedding of word 4567, why don’t we call $A * o_{4567}$ in Python?
  • It is computationally wasteful.
  • The correct formula is $A^T * o_{4567}$
  • None of the answers are correct: calling the Python snippet as described above is fine.
  • This doesn’t handle unknown words (<UNK>).

Explanation: Multiplying the embedding matrix by a one-hot vector is extremely inefficient, because almost every multiplication is with a zero. In practice the embedding is obtained by simply looking up the column of $A$ that corresponds to the word.
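
A small sketch of the difference, assuming the embedding matrix is laid out as (embedding dimension × vocabulary size) with hypothetical sizes; the same idea applies to the transposed layout.

```python
import numpy as np

embedding_dim, vocab_size = 300, 10000   # hypothetical sizes
A = np.random.randn(embedding_dim, vocab_size)

o_4567 = np.zeros(vocab_size)            # one-hot vector for word 4567
o_4567[4567] = 1.0

# Wasteful: 300 x 10000 multiplications, almost all of them with a zero.
e_slow = A @ o_4567

# Efficient: just read out the 4567-th column of the embedding matrix.
e_fast = A[:, 4567]

assert np.allclose(e_slow, e_fast)
```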

  6. True/False: When learning word embeddings, words are automatically generated along with the surrounding words.
  • True
  • False

Explanation: We pick a given word and try to predict its surrounding words, or vice versa; the words are not generated automatically.

  7. In the word2vec algorithm, you estimate $P(t \mid c)$, where $t$ is the target word and $c$ is a context word. How are $t$ and $c$ chosen from the training set? Pick the best answer. (A small sampling sketch follows the options.)
  • $c$ is the sequence of all the words in the sentence before $t$
  • $c$ and $t$ are chosen to be nearby words.
  • $c$ is a sequence of several words immediately before $t$
  • $c$ is the one word that comes immediately before $t$
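
A minimal sketch of the nearby-word sampling used by the skip-gram flavor of word2vec; the sentence and window size below are made up for illustration.

```python
import random

def context_target_pairs(tokens, window=4):
    """Pair each context word with targets sampled from within ±window positions."""
    pairs = []
    for i, context in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((context, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

sentence = "the quick brown fox jumps over the lazy dog".split()
print(random.choice(context_target_pairs(sentence, window=2)))  # e.g. ('fox', 'jumps')
```
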
  8. Suppose you have a 10000-word vocabulary, and are learning 500-dimensional word embeddings. The word2vec model uses the following softmax function:
    $$P(t \mid c) = \frac{e^{\theta_t^T e_c}}{\sum_{t'=1}^{10000} e^{\theta_{t'}^T e_c}}$$
    Which of these statements are correct? Check all that apply. (A NumPy sketch of this softmax follows the options.)
  • $\theta_t$ and $e_c$ are both trained with an optimization algorithm such as Adam or gradient descent.
  • $\theta_t$ and $e_c$ are both 500-dimensional vectors.
  • After training, we should expect $\theta_t$ to be very close to $e_c$ when $t$ and $c$ are the same word.
  • $\theta_t$ and $e_c$ are both 10000-dimensional vectors.
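
A sketch of that softmax with NumPy, using the dimensions from the question ($\theta_t$ and $e_c$ are both 500-dimensional); the variable names below are mine.

```python
import numpy as np

vocab_size, embedding_dim = 10000, 500

# Both parameter sets are learned, e.g. with gradient descent or Adam:
Theta = np.random.randn(vocab_size, embedding_dim) * 0.01  # one theta_t per target word
E = np.random.randn(vocab_size, embedding_dim) * 0.01      # one e_c per context word

c = 1234                       # index of the context word
logits = Theta @ E[c]          # theta_t^T e_c for every candidate target word t
p = np.exp(logits - logits.max())
p /= p.sum()                   # softmax: P(t | c) over the 10000-word vocabulary
print(p.shape, p.sum())        # (10000,) 1.0
```
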
  9. Suppose you have a 10000-word vocabulary, and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective:
    $$\min \sum_{i=1}^{10000} \sum_{j=1}^{10000} f(X_{ij})\left(\theta_i^T e_j + b_i + b_j' - \log X_{ij}\right)^2$$
    Which of these statements are correct? Check all that apply. (A sketch of one term of this objective follows the options.)
  • $\theta_i$ and $e_j$ should be initialized to 0 at the beginning of training.
  • $\theta_i$ and $e_j$ should be initialized randomly at the beginning of training.
  • $X_{ij}$ is the number of times word $j$ appears in the context of word $i$.
  • Theoretically, the weighting function $f(\cdot)$ must satisfy $f(0) = 0$.
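
A sketch of one term of that objective, using the weighting function from the GloVe paper (the cutoff $x_{max} = 100$ and exponent $0.75$ are the paper's defaults; the variable names are mine).

```python
import numpy as np

vocab_size, embedding_dim = 10000, 500

def f(x, x_max=100.0, alpha=0.75):
    """GloVe weighting; f(0) = 0, so pairs that never co-occur drop out of the sum."""
    return min((x / x_max) ** alpha, 1.0)

# Random initialization: if theta and e were all zeros, their gradients would
# also be zero (each depends on the other), so they would never move.
theta = np.random.randn(vocab_size, embedding_dim) * 0.01
e = np.random.randn(vocab_size, embedding_dim) * 0.01
b, b_prime = np.zeros(vocab_size), np.zeros(vocab_size)

# One (i, j) term of the objective, where X_ij is a co-occurrence count.
i, j, X_ij = 10, 20, 5.0
term = f(X_ij) * (theta[i] @ e[j] + b[i] + b_prime[j] - np.log(X_ij)) ** 2
print(term)
```
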
  10. You have trained word embeddings using a text dataset of $t_1$ words. You are considering using these word embeddings for a language task, for which you have a separate labeled dataset of $t_2$ words. Keeping in mind that using word embeddings is a form of transfer learning, under which of these circumstances would you expect the word embeddings to be helpful?
  • When $t_1$ is equal to $t_2$
  • When $t_1$ is smaller than $t_2$
  • When $t_1$ is larger than $t_2$

Explanation: Word embeddings transfer best from a large unlabeled corpus to a task with a smaller labeled training set, i.e. when $t_1$ is larger than $t_2$.
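
A minimal transfer-learning sketch with Keras, following the usual pre-trained-embedding pattern; the matrix below is a random placeholder for vectors that would have been trained on the large $t_1$-word corpus.

```python
import numpy as np
from tensorflow import keras

vocab_size, embedding_dim = 10000, 50

# Placeholder for an embedding matrix pre-trained on the large t_1-word corpus.
pretrained = np.random.randn(vocab_size, embedding_dim)

# Reuse the vectors for the smaller labeled t_2-word task; trainable=False freezes
# them (set it to True to fine-tune when the labeled set is reasonably large).
embedding_layer = keras.layers.Embedding(
    input_dim=vocab_size,
    output_dim=embedding_dim,
    embeddings_initializer=keras.initializers.Constant(pretrained),
    trainable=False,
)
```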
