I split the answers to Assignment 1 into four parts, covering questions 1, 2, 3, and 4. This part contains the answer to question 2.
(a). (3 points) Derive the gradient of the sigmoid function and show that it can be rewritten as a function of the function value (i.e., in some expression where only $\sigma(x)$, but not $x$, is present). Assume that the input $x$ is a scalar for this question. Recall that the sigmoid function is
$$\sigma(x) = \frac{1}{1 + e^{-x}}.$$
Solution:
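A sketch of the derivation, starting from the definition above:
$$
\sigma'(x) = \frac{d}{dx}\left(\frac{1}{1+e^{-x}}\right)
= \frac{e^{-x}}{(1+e^{-x})^2}
= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}}
= \sigma(x)\bigl(1-\sigma(x)\bigr).
$$
Since $\frac{e^{-x}}{1+e^{-x}} = 1 - \sigma(x)$, the gradient can be written purely in terms of the function value: $\sigma'(x) = \sigma(x)\bigl(1-\sigma(x)\bigr)$.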
(b). (3 points) Derive the gradient with regard to the inputs of a softmax function when cross-entropy loss is used for evaluation, i.e., find the gradients with respect to the softmax input vector $\theta$, when the prediction is made by $\hat{y} = \mathrm{softmax}(\theta)$. Remember the cross-entropy function is
$$CE(y, \hat{y}) = -\sum_i y_i \log(\hat{y}_i).$$
Solution: Following the hint, assume the $k$-th entry of $y$ is 1 and all the other entries are 0, i.e., $y_k = 1$. Then:
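One way to carry out the computation: with $\hat{y} = \mathrm{softmax}(\theta)$ and $y_k = 1$, the loss reduces to
$$
CE(y,\hat{y}) = -\log \hat{y}_k = -\theta_k + \log\sum_j e^{\theta_j},
$$
and differentiating with respect to $\theta_i$ gives
$$
\frac{\partial CE}{\partial \theta_i} = \frac{e^{\theta_i}}{\sum_j e^{\theta_j}} - \mathbb{1}[i=k] = \hat{y}_i - y_i,
\qquad\text{i.e.}\qquad
\frac{\partial CE}{\partial \theta} = \hat{y} - y.
$$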
(c). (6 points) Derive the gradients with respect to the inputs $x$ to a one-hidden-layer neural network (that is, find $\frac{\partial J}{\partial x}$ where $J$ is the cost function for the neural network). The neural network employs a sigmoid activation function for the hidden layer, and softmax for the output layer. Assume the one-hot label vector is $y$, and cross-entropy cost is used. (Feel free to use $\sigma'(x)$ as shorthand for the sigmoid gradient, and feel free to define any variables whenever you see fit.)
Recall that the forward propagation is as follows:
$$h = \sigma(xW_1 + b_1), \qquad \hat{y} = \mathrm{softmax}(hW_2 + b_2).$$
Solution: Let the $k$-th entry of $y$ be 1 and all the other entries be 0, i.e., $y_k = 1$. Write $\theta_1 = xW_1 + b_1$, so that $h = \sigma(\theta_1)$; denote the $i$-th element of $\theta_1$ by $\theta^{(1)}_i$ and the element in row $i$, column $j$ of $W_1$ by $W^{(1)}_{ij}$. Then:
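Introducing the extra shorthand $\theta_2 = hW_2 + b_2$ (so $\hat{y} = \mathrm{softmax}(\theta_2)$), a sketch of the chain rule, reusing the result from (b) and treating $x$ and $h$ as row vectors:
$$
\delta_2 := \frac{\partial J}{\partial \theta_2} = \hat{y} - y,
\qquad
\delta_1 := \frac{\partial J}{\partial \theta_1} = \bigl(\delta_2 W_2^{\top}\bigr) \circ \sigma'(\theta_1),
\qquad
\frac{\partial J}{\partial x} = \delta_1 W_1^{\top},
$$
where $\circ$ denotes element-wise multiplication and, by part (a), $\sigma'(\theta_1) = \sigma(\theta_1)\circ\bigl(1-\sigma(\theta_1)\bigr) = h \circ (1-h)$.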
(d). (2 points) How many parameters are there in this neural network, assuming the input is $D_x$-dimensional, the output is $D_y$-dimensional, and there are $H$ hidden units?
Solution: $W_1$ has shape $D_x \times H$, $b_1$ has shape $1 \times H$, $W_2$ has shape $H \times D_y$, and $b_2$ has shape $1 \times D_y$, so there are $D_x H + H + H D_y + D_y = (D_x + 1)H + (H + 1)D_y$ parameters in total.
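As a quick sanity check with hypothetical sizes $D_x = 10$, $H = 5$, $D_y = 10$, the count would be $10 \cdot 5 + 5 + 5 \cdot 10 + 10 = 115$ parameters.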
(e)(f)(g). See the code; omitted here.
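Since the actual code for (e)(f)(g) is not reproduced here, below is a minimal numpy sketch of the pieces those parts cover: a sigmoid and its gradient expressed through the function value as in (a), and a forward/backward pass matching the derivations in (b) and (c). The function names, signatures, and shapes are my own illustration, not the assignment's starter-code interface.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(s):
    # Gradient written in terms of the function value s = sigma(x),
    # as derived in (a): sigma'(x) = s * (1 - s)
    return s * (1.0 - s)

def softmax(x):
    # Row-wise softmax, shifted by the row max for numerical stability
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def forward_backward(x, y, W1, b1, W2, b2):
    """One-hidden-layer net from (c): sigmoid hidden layer, softmax output,
    cross-entropy cost. Returns the cost and the gradient w.r.t. the input.
    x: (N, Dx), y: (N, Dy) one-hot labels."""
    # Forward pass
    theta1 = x.dot(W1) + b1          # (N, H)
    h = sigmoid(theta1)              # (N, H)
    theta2 = h.dot(W2) + b2          # (N, Dy)
    y_hat = softmax(theta2)          # (N, Dy)
    cost = -np.sum(y * np.log(y_hat))

    # Backward pass, following (b) and (c)
    delta2 = y_hat - y                           # dJ/dtheta2
    delta1 = delta2.dot(W2.T) * sigmoid_grad(h)  # dJ/dtheta1
    grad_x = delta1.dot(W1.T)                    # dJ/dx
    return cost, grad_x
```

A centered-difference gradient check, as part (f) asks for, would then perturb each entry of $x$ by $\pm\epsilon$ and compare $\bigl(J(x+\epsilon) - J(x-\epsilon)\bigr)/(2\epsilon)$ against the corresponding entry of `grad_x`.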