[coursera/SequenceModels/week3] Sequence Models & Attention Mechanism (summary & questions)

3.1 Various sequence to sequence architectures

3.1.1 Basic Models

(Lecture slide: a basic sequence-to-sequence model - an encoder network followed by a decoder network.)

(Lecture slide: another basic model example.)


3.1.2  Picking the most likely sentence

conditional probability

(Lecture slide: machine translation as a conditional language model, modeling P(y|x).)

pick most likely sentence

(Lecture slide: picking the most likely sentence - finding the y that maximizes P(y|x).)

Greedy search (not useful): picking the single most likely next word at each step does not, in general, maximize the joint probability P(y<1>, ..., y<Ty> | x) of the whole output sentence; see the sketch below.
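To make that concrete, here is a minimal sketch (not from the lecture) of greedy decoding. The interface next_word_probs(x, prefix), returning P(y<t> | x, y<1>, ..., y<t-1>) over the whole vocabulary, is a hypothetical stand-in for the decoder RNN.

```python
import numpy as np

def greedy_decode(next_word_probs, x, vocab, eos="<EOS>", max_len=30):
    """Greedy decoding: at each step keep only the single most likely word.

    next_word_probs(x, prefix) is a hypothetical model interface that returns
    a probability distribution (1-D array over vocab) for the next word given
    the input x and the partial output prefix.
    """
    prefix = []
    for _ in range(max_len):
        probs = next_word_probs(x, prefix)      # P(y_t | x, y_1..y_{t-1})
        word = vocab[int(np.argmax(probs))]     # locally best word only
        prefix.append(word)
        if word == eos:
            break
    return prefix
```

Because each step commits irrevocably to one word, a locally attractive first word can lock the decoder out of the globally most likely sentence; beam search in the next subsection keeps B partial hypotheses alive instead.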


3.1.3 Beam Search

(Lecture slide: the beam search algorithm.)

example

(Lecture slide: a worked beam search example.)
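A minimal sketch of the beam search idea, reusing the same hypothetical next_word_probs(x, prefix) interface as above: at every step, each surviving hypothesis is extended by every vocabulary word, and only the B extensions with the highest summed log-probability are kept.

```python
import numpy as np

def beam_search(next_word_probs, x, vocab, B=3, eos="<EOS>", max_len=30):
    """Keep the B most probable partial translations at each step."""
    beams = [([], 0.0)]                  # (prefix, sum of log-probabilities)
    completed = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            probs = next_word_probs(x, prefix)
            for i, p in enumerate(probs):
                candidates.append((prefix + [vocab[i]], score + np.log(p + 1e-12)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:B]:   # keep only the B best
            if prefix[-1] == eos:
                completed.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    completed.extend(beams)                     # include unfinished hypotheses
    return max(completed, key=lambda c: c[1])[0]
```

With B = 1 this reduces to greedy search; a larger B considers more candidate sentences at the cost of more computation and memory.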


3.1.4  Refinements to Beam Search

(Lecture slide: refinements to beam search - length normalization and choosing the beam width B.)
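The main refinement in this video is length normalization: rather than ranking finished hypotheses by the raw sum of log-probabilities (which penalizes long outputs), beam search ranks them by (1 / Ty^alpha) * sum over t of log P(y<t> | x, y<1>, ..., y<t-1>), with alpha around 0.7 as a common heuristic. A small sketch of that score:

```python
import numpy as np

def normalized_score(log_probs, alpha=0.7):
    """Length-normalized log-likelihood used to rank finished beam hypotheses.

    log_probs: list of log P(y_t | x, y_1..y_{t-1}) for one candidate sentence.
    alpha:     0 = no normalization, 1 = plain average; ~0.7 is a common heuristic.
    """
    Ty = len(log_probs)
    return sum(log_probs) / (Ty ** alpha)

# Example: a longer sentence is no longer penalized just for being longer.
short = [np.log(0.4), np.log(0.5)]
long_ = [np.log(0.6), np.log(0.6), np.log(0.6), np.log(0.6)]
print(normalized_score(short), normalized_score(long_))   # long_ now scores higher
```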

3.1.5 Error analysis in beam search

(Lecture slide: error analysis on beam search vs. the RNN model.)


(Lecture slide: error analysis example.)
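The error analysis procedure from this video compares, for each dev-set mistake, the model's probability of the human reference y* with that of the beam-search output ŷ: if P(y*|x) > P(ŷ|x), beam search missed a sequence the model itself prefers (blame the search); otherwise the RNN scores the wrong sentence higher (blame the model). A minimal sketch, assuming a hypothetical sequence_log_prob(model, x, y) helper that returns log P(y|x):

```python
def attribute_errors(examples, sequence_log_prob, model):
    """Count how many dev-set mistakes are due to beam search vs. the RNN.

    examples: list of (x, y_star, y_hat), where y_star is the human reference
              and y_hat is the (incorrect) beam-search output.
    sequence_log_prob(model, x, y): hypothetical helper returning log P(y | x).
    """
    counts = {"beam_search": 0, "rnn": 0}
    for x, y_star, y_hat in examples:
        if sequence_log_prob(model, x, y_star) > sequence_log_prob(model, x, y_hat):
            counts["beam_search"] += 1   # model prefers y*, search failed to find it
        else:
            counts["rnn"] += 1           # model itself prefers the wrong sentence
    return counts
```

If most mistakes land in the beam_search bucket, increasing the beam width B is worthwhile; otherwise, work on the RNN itself (more data, architecture changes, regularization).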

3.1.6 Attention model

(Lecture slide: the attention model - attention weights α<t,t'> over the encoder activations.)
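A minimal NumPy sketch of the attention computation: a small scoring network compares the previous decoder state s<t-1> with each encoder activation a<t'> to produce e<t,t'>, the scores are softmax-normalized into weights α<t,t'>, and the context for decoder step t is the α-weighted sum of the a<t'>. The one-hidden-layer scorer and the parameter shapes here are illustrative assumptions standing in for the lecture's "small neural network".

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_context(a, s_prev, W1, W2, v):
    """Compute the context vector for one decoder time step t.

    a:      (Tx, na) encoder activations a<t'> (e.g. from a bidirectional RNN)
    s_prev: (ns,)    previous decoder state s<t-1>
    W1, W2, v:       parameters of the small scoring network (illustrative shapes:
                     W1 is (h, ns), W2 is (h, na), v is (h,))
    """
    Tx = a.shape[0]
    e = np.empty(Tx)
    for t_prime in range(Tx):
        hidden = np.tanh(W1 @ s_prev + W2 @ a[t_prime])   # small "attention" MLP
        e[t_prime] = v @ hidden                            # scalar score e<t,t'>
    alpha = softmax(e)                                     # weights sum to 1 over t'
    context = alpha @ a                                    # weighted sum of the a<t'>
    return context, alpha
```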

3.2 Speech recognition - Audio data

(Lecture slide: speech recognition and the CTC cost model.)
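The CTC (connectionist temporal classification) output rule used in this video, and tested in Q9 below, is: first collapse runs of identical characters, then remove the blank symbol. A minimal sketch:

```python
from itertools import groupby

def ctc_collapse(chars, blank="_"):
    """Collapse a frame-level CTC output string into the final transcript:
    merge runs of identical characters, then drop the blank symbol."""
    merged = (ch for ch, _ in groupby(chars))   # collapse repeated characters
    return "".join(ch for ch in merged if ch != blank)

print(ctc_collapse("__c_oo_o_kk___b_ooooo__oo__kkk"))   # -> "cookbook"
```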


Q&A

Answer key: Q9 - "cookbook"; Q10 - features of the audio (such as spectrogram features) at time t.

Question 1

Consider using this encoder-decoder model for machine translation.

This model is a “conditional language model” in the sense that the encoder portion (shown in green) is modeling the probability of the input sentence x.

False. It is a conditional language model because the decoder (not the encoder) models P(y|x), the probability of the output sentence given the input sentence; the encoder does not model the probability of x.

Question 2

In beam search, if you increase the beam width B, which of the following would you expect to be true? Check all that apply.

Question 3

In machine translation, if we carry out beam search without using sentence normalization, the algorithm will tend to output overly short translations.

True. Without length normalization, beam search multiplies together many probabilities that are each less than 1, so longer candidates receive unfairly low scores and the search prefers overly short outputs.

Question 4

Suppose you are building a speech recognition system, which uses an RNN model to map from audio clip x to a text transcript y. Your algorithm uses beam search to try to find the value of y that maximizes P(y|x).

On a dev set example, given an input audio clip, your algorithm outputs the transcript ŷ = “I’m building an A Eye system in Silly con Valley.”, whereas a human gives a much superior transcript y* = “I’m building an AI system in Silicon Valley.”

According to your model,

P(ŷ|x) = 1.09 × 10^-7

P(y*|x) = 7.21 × 10^-8

Would you expect increasing the beam width B to help correct this example?

No, because P(y*|x) ≤ P(ŷ|x) indicates the error should be attributed to the RNN rather than to the search algorithm.

No, because P(y*|x) ≤ P(ŷ|x) indicates the error should be attributed to the search algorithm rather than to the RNN.

Yes, because P(y*|x) ≤ P(ŷ|x) indicates the error should be attributed to the RNN rather than to the search algorithm.

Yes, because P(y*|x) ≤ P(ŷ|x) indicates the error should be attributed to the search algorithm rather than to the RNN.

Question 5

Continuing the example from Q4, suppose you work on your algorithm for a few more weeks, and now find that for the vast majority of examples on which your algorithm makes a mistake, P(y*|x) > P(ŷ|x). This suggests you should focus your attention on improving the search algorithm.

True. If the RNN assigns the human transcript y* a higher probability than the output ŷ that beam search actually returned, then the search failed to find the higher-scoring sequence, so the search algorithm (for example, a larger beam width B) is the thing to improve.

Question 6

Consider the attention model for machine translation.

Further, here is the formula for α<t,t'> (the softmax over the scores e<t,t'>):

α<t,t'> = exp(e<t,t'>) / Σ_{t'=1..Tx} exp(e<t,t'>)

Which of the following statements about α<t,t'> are true? Check all that apply.

Question 7

The network learns where to “pay attention” by learning the values e<t,t'>, which are computed using a small neural network.

We can’t replace s<t-1> with s<t> as an input to this neural network. This is because s<t> depends on α<t,t'>, which in turn depends on e<t,t'>; so at the time we need to evaluate this network, we haven’t computed s<t> yet.

True. The decoder state s<t> is produced only after the attention weights for step t have been used, so it cannot be an input to the network that computes those weights.

Question 8

Compared to the encoder-decoder model shown in Question 1 of this quiz (which does not use an attention mechanism), we expect the attention model to have the greatest advantage when:

The input sequence length Tx is large. (Without attention, the encoder must compress the entire input sentence into a single fixed-length vector, which hurts most on long inputs.)

Question 9

Under the CTC model, identical repeated characters not separated by the “blank” character (_) are collapsed. Under the CTC model, what does the following string collapse to?

__c_oo_o_kk___b_ooooo__oo__kkk

cookbook

cook book

coookkboooooookkk

Question 10

In trigger word detection, x<t> is:

Features of the audio (such as spectrogram features) at time t.

The t-th input word, represented as either a one-hot vector or a word embedding.

Whether someone has just finished saying the trigger word at time t.
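For context on Q10: in trigger-word detection, x<t> is a column of spectrogram features and y<t> is a 0/1 label. A common labeling scheme (used in this course's programming assignment) sets y<t> = 1 for a short window of output steps right after the trigger word ends; the window of 50 steps and the output length of 1375 below are illustrative values, not requirements.

```python
import numpy as np

def insert_ones(y, end_step, width=50):
    """Set y<t> = 1 for `width` output steps right after the trigger word ends.

    y:        (Ty,) array of 0/1 labels for one training example
    end_step: output time step at which the trigger word finishes
    width:    number of steps labeled 1 (50 is an illustrative choice)
    """
    Ty = y.shape[0]
    y[end_step + 1 : min(end_step + 1 + width, Ty)] = 1
    return y

y = insert_ones(np.zeros(1375), end_step=700)
print(y.sum())   # 50 steps labeled 1
```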



