Neural Networks for Machine Learning: Lecture 13 Quiz


Warning: The hard deadline has passed. You can attempt it, but you will not get credit for it. You are welcome to try it as a learning exercise.

Announcement, added on Thursday, November 15 2012, 18:28 UTC. You may find this explanation of SBNs helpful.

This quiz is going to take you through the details of Sigmoid Belief Networks (SBNs). The most relevant videos are the second video ("Belief Nets", especially from 11:44) and third video ("Learning sigmoid belief nets") of lecture 13.

We'll be working with this network:
[Figure 1: the network used in this quiz — hidden units h1 and h2, each connected to the single visible unit v by weights w1 and w2 respectively]

The network has no biases (or equivalently, the biases are always zero), so it has only two parameters:  w1 (the weight on the connection from  h1 to  v) and  w2 (the weight on the connection from  h2 to  v).

Remember, the units in an SBN are all binary, and the logistic function (also known as the  sigmoid function) figures prominently in the definition of SBNs. These binary units, with their logistic/sigmoid probability function, are in a sense the  stochastic equivalent of the  deterministic logistic hidden units that we've seen often in earlier lectures.
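To make that concrete, here is a minimal sketch (my own illustration, not code from the course) of the usual SBN rule: a unit turns on with probability given by the logistic of its total input from its parents, whereas a deterministic logistic unit would simply output that probability as its real-valued activation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_sbn_unit(parent_states, weights, bias=0.0):
    """Stochastic binary unit: on with probability sigma(bias + sum of parent_state * weight)."""
    total_input = bias + sum(s * w for s, w in zip(parent_states, weights))
    p_on = sigmoid(total_input)
    state = 1 if random.random() < p_on else 0
    return state, p_on

# For the visible unit v in this quiz (no bias), the parents are h1 and h2, so
# sample_sbn_unit([h1, h2], [w1, w2]) returns a binary sample of v and P(v=1 | h1, h2).
```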

Let's start with  w1=6.90675478 and  w2=0.40546511. These numbers were chosen to ensure that the answer to many questions is a very simple answer, which might make it easier to understand more of what's going on. Let's also pick a complete configuration to focus on:  h1=0,h2=1,v=1 (we'll call that configuration  C011).
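As an aside (my own observation, not something stated in the quiz), these particular weights are ln(999) and ln(1.5), and since sigma(ln a) = a/(1+a), the sigmoids that show up below all come out to tidy values:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1, w2 = 6.90675478, 0.40546511     # approximately ln(999) and ln(1.5)

print(round(sigmoid(w2), 4))        # sigma(ln 1.5) = 1.5 / 2.5  = 0.6
print(round(sigmoid(w1), 4))        # sigma(ln 999) = 999 / 1000 = 0.999
print(round(sigmoid(0.0), 4))       # 0.5 — the probability that a unit with no parents and no bias turns on
```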

Question 1

What is  P(v=1|h1=0,h2=1)? Write your answer with four digits after the decimal point. Hint: the last three of those four digits are zeros. (If you're lost on this question, then I strongly recommend that you do whatever you need to do to figure it out, before proceeding with the rest of this quiz.)
Answer for Question 1
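A short sketch of one way to compute this, assuming the weights given above: since h1=0, only w2 contributes to the total input of v, and v has no bias.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1, w2 = 6.90675478, 0.40546511
h1, h2 = 0, 1

# v's total input is just h1*w1 + h2*w2 (no bias)
p_v1 = sigmoid(h1 * w1 + h2 * w2)
print(round(p_v1, 4))    # sigma(0.40546511) — compare with the "three trailing zeros" hint
```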

Question 2

What is the probability of that full configuration, i.e.  P(h1=0,h2=1,v=1), which we called  P(C011)? Write your answer with four digits after the decimal point. Hint: it's less than a half, and the last two of those four digits are zeros.
Answer for Question 2
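One way to work through this, assuming (since the network has no biases) that each hidden unit is on with prior probability sigma(0) = 0.5 and that the joint factorizes as P(h1) P(h2) P(v | h1, h2):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1, w2 = 6.90675478, 0.40546511

p_h1_is_0 = 1.0 - sigmoid(0.0)              # h1 has no parents and no bias
p_h2_is_1 = sigmoid(0.0)
p_v1_given_h = sigmoid(0 * w1 + 1 * w2)     # the quantity from Question 1

p_C011 = p_h1_is_0 * p_h2_is_1 * p_v1_given_h
print(round(p_C011, 4))                     # 0.5 * 0.5 * (Question 1's answer)
```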

Question 3

Now let's talk about the gradient that we need for learning, i.e. ∂logP(C011)/∂wi. There are two ways you can try to answer these questions, and I recommend that you do both and verify that the answer comes out the same way. The first way is to take the derivative yourself. The second one is to use the learning rule that was mentioned in the lecture.
What is ∂logP(C011)/∂w1? Write your answer with at least three digits after the decimal point, and don't be too surprised if it's a very simple answer.
Answer for Question 3
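A sketch of the second approach, using the SBN learning rule from the lecture in its delta-rule form: for a weight wi from parent hi into v, ∂logP(C011)/∂wi = hi(v − p), where p = P(v=1 | h1, h2). (You can check it against the direct derivative of log sigma(h1*w1 + h2*w2).)

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1, w2 = 6.90675478, 0.40546511
h1, h2, v = 0, 1, 1                  # the configuration C011

p = sigmoid(h1 * w1 + h2 * w2)       # P(v=1 | h1, h2), from Question 1

# Learning rule: d log P(C) / d w_i = h_i * (v - p)
grad_w1 = h1 * (v - p)
print(round(grad_w1, 3))             # h1 is off, so w1 receives no gradient at all
```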

Question 4

What is ∂logP(C011)/∂w2? Write your answer with at least three digits after the decimal point, and don't be too surprised if it's a very simple answer.
Answer for Question 4
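The same rule applies for w2; a short self-contained sketch, under the same assumptions as for Question 3:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1, w2 = 6.90675478, 0.40546511
h1, h2, v = 0, 1, 1

p = sigmoid(h1 * w1 + h2 * w2)       # P(v=1 | h1, h2), as before
grad_w2 = h2 * (v - p)               # h2 = 1, so the gradient is just (v - p)
print(round(grad_w2, 3))
```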

Question 5

As was explained in the lectures, the log likelihood gradient for a full configuration is just one part of the learning. The more difficult part is to get a handle on the posterior probability distribution over full configurations, given the state of the visible units. Explaining away is an important issue there. Let's explore it with new weights: for the remainder of this quiz, w1=10 and w2=-4.
What is  P(h2=1|v=1,h1=0)? Give your answer with at least four digits after the decimal point. Hint: it's a fairly small number (and not a round number like for the earlier questions); try to intuitively understand why it's small. Second hint: you might find Bayes' rule useful, but even with that rule, this still requires some thought.
Answer for Question 5
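One way to attack this with Bayes' rule, a sketch assuming the weights just given (note the negative w2) and the fact that, before v is observed, h2 is on with probability 0.5 regardless of h1 (the two root units are marginally independent):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1, w2 = 10.0, -4.0
h1 = 0

prior_h2 = {1: 0.5, 0: 0.5}                            # h2 has no parents and no bias
lik = {k: sigmoid(h1 * w1 + k * w2) for k in (0, 1)}   # P(v=1 | h1, h2=k)

posterior_h2_1 = (prior_h2[1] * lik[1]) / (prior_h2[1] * lik[1] + prior_h2[0] * lik[0])
print(round(posterior_h2_1, 4))   # small: turning h2 on would make v=1 unlikely, so given v=1 it is improbable
```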

Question 6

What is P(h2=1|v=1,h1=1)? Give your answer with at least four digits after the decimal point. Hint: it's quite different from the answer to the previous question; try to understand why. The fact that those two are different shows that, conditional on the state of the visible units, the hidden units have a strong effect on each other, i.e. they're not independent. That is what we call explaining away, and the earthquake vs. truck network is another example of that.
Answer for Question 6
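The same Bayes' rule computation, a sketch under the same assumptions as for Question 5 but with h1 = 1: since h1 on its own makes v=1 nearly certain, observing v=1 now carries almost no evidence about h2, so its posterior stays close to the 0.5 prior.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1, w2 = 10.0, -4.0
h1 = 1                                                 # the only change from Question 5

prior_h2 = {1: 0.5, 0: 0.5}
lik = {k: sigmoid(h1 * w1 + k * w2) for k in (0, 1)}   # sigma(6) vs sigma(10): both close to 1

posterior_h2_1 = (prior_h2[1] * lik[1]) / (prior_h2[1] * lik[1] + prior_h2[0] * lik[0])
print(round(posterior_h2_1, 4))   # near 0.5 — h1 has already "explained" v=1, so h2 is barely constrained
```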
