BAYESIAN LEARNING FOR NEURAL NETWORKS by Radford M. Neal

Reading notes on Radford M. Neal's PhD thesis. (His writing style is like a physicist's.)

Reading progress: 46/195


Main Contribution:


1ST PART

In Section 2.1, Neal argues that under the following conditions:

1. A Bayesian setting, with a prior over the network parameters and a posterior given data.

2. A two-layer network (one hidden layer).

3. Gaussian priors on the weights and biases (this can be generalized; see below).

4. The standard deviation of the hidden-to-output weights is scaled as the inverse square root of the number of hidden units H (equivalently, their prior variance scales as 1/H).
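As a sketch of the setup (the notation is my paraphrase of Neal's Chapter 2, so treat the exact symbols as assumptions):

```latex
% One output f_k of a one-hidden-layer network with H hidden units:
f_k(x) = b_k + \sum_{j=1}^{H} v_{jk}\, h_j(x),
\qquad
h_j(x) = \tanh\!\Big(a_j + \sum_i u_{ij}\, x_i\Big)

% Gaussian priors, with the hidden-to-output standard deviation scaled by H^{-1/2}:
b_k \sim \mathcal{N}(0, \sigma_b^2), \quad
a_j \sim \mathcal{N}(0, \sigma_a^2), \quad
u_{ij} \sim \mathcal{N}(0, \sigma_u^2), \quad
v_{jk} \sim \mathcal{N}(0, \sigma_v^2), \ \ \sigma_v = \omega_v H^{-1/2}
```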

Then, in the limit of infinitely many hidden units:

1. The output dimensions are independent under the prior, so it suffices to consider a single output. For that output:

2. For any single input, the prior distribution of the output converges to a Gaussian with zero mean and a variance that depends on the input.

3. For any finite set of inputs, the joint prior distribution of the outputs converges to a multivariate Gaussian with zero mean and a covariance determined by the activation function and the prior variances; in other words, the prior over the functions computed by the network converges to a Gaussian process.
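The limiting moments take the form below (my transcription of the argument in Section 2.1, using the notation sketched above; h(x) denotes the value of a single generic hidden unit):

```latex
% Limiting prior moments for one output f_k:
\mathbb{E}\big[f_k(x)\big] = 0,
\qquad
\operatorname{Var}\big[f_k(x)\big]
  = \sigma_b^2 + \omega_v^2\, \mathbb{E}\big[h(x)^2\big]

% and, for any two inputs x^{(p)}, x^{(q)}:
\operatorname{Cov}\big[f_k(x^{(p)}),\, f_k(x^{(q)})\big]
  = \sigma_b^2 + \omega_v^2\, \mathbb{E}\big[h(x^{(p)})\, h(x^{(q)})\big]
```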

Generalization: the Gaussian form of the prior on the hidden-to-output weights is not essential; by the Central Limit Theorem, any i.i.d. prior with zero mean and finite variance gives the same Gaussian-process limit.
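A minimal numerical sketch of this convergence (my own illustration, not code from the thesis; the prior scales such as omega_v and sigma_b are assumed values): sample many networks from the scaled prior, evaluate each at two fixed inputs, and watch the empirical mean stay near zero while the empirical covariance stabilizes as H grows.

```python
import numpy as np

def sample_prior_outputs(x, H, n_samples=5000, omega_v=1.0, sigma_b=0.5,
                         sigma_a=1.0, sigma_u=1.0, seed=0):
    """Draw f(x) at fixed inputs x for many networks sampled from the scaled prior.

    x: array of shape (n_inputs, d); returns an array of shape (n_samples, n_inputs).
    """
    rng = np.random.default_rng(seed)
    n_inputs, d = x.shape
    sigma_v = omega_v / np.sqrt(H)                         # key scaling: sd ~ H^{-1/2}
    u = rng.normal(0.0, sigma_u, size=(n_samples, d, H))   # input-to-hidden weights
    a = rng.normal(0.0, sigma_a, size=(n_samples, 1, H))   # hidden biases
    v = rng.normal(0.0, sigma_v, size=(n_samples, H))      # hidden-to-output weights
    b = rng.normal(0.0, sigma_b, size=(n_samples, 1))      # output bias
    h = np.tanh(np.einsum('nd,sdh->snh', x, u) + a)        # hidden values, (S, n_inputs, H)
    return np.einsum('snh,sh->sn', h, v) + b               # outputs, (S, n_inputs)

if __name__ == "__main__":
    x = np.array([[-1.0], [1.0]])                          # two scalar inputs
    for H in (1, 10, 100, 1000):
        f = sample_prior_outputs(x, H)
        print(f"H={H:4d}  mean={f.mean(axis=0).round(3)}  cov=\n{np.cov(f.T).round(3)}")
```

As H increases, the mean stays near zero and the 2x2 covariance settles toward the Gaussian-process covariance in point 3; the marginal distributions also become Gaussian even if the Gaussian prior on v is replaced by any zero-mean, finite-variance distribution.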


2ND PART

1. A tanh activation gives a prior over smooth functions.

2. A step-function activation (values in {-1, +1}) gives a locally Brownian prior: sample functions are rough, like Brownian motion paths (continuous but not smooth in the limit).
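A quick way to see the contrast (again my own sketch, reusing the assumed prior scaling from above, with hypothetical scale values): draw one wide network per activation on a 1-D grid and compare how rough the resulting functions are.

```python
import numpy as np

def sample_function(xs, H=10000, activation=np.tanh, omega_v=1.0,
                    sigma_b=0.5, sigma_a=5.0, sigma_u=5.0, seed=0):
    """Sample one function on the grid xs from the scaled prior (1-D input)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_u, size=H)                 # input-to-hidden weights
    a = rng.normal(0.0, sigma_a, size=H)                 # hidden biases
    v = rng.normal(0.0, omega_v / np.sqrt(H), size=H)    # scaled hidden-to-output weights
    b = rng.normal(0.0, sigma_b)                         # output bias
    h = activation(np.outer(xs, u) + a)                  # hidden values, (len(xs), H)
    return h @ v + b

step = lambda z: np.where(z >= 0.0, 1.0, -1.0)           # {-1, +1} step activation

xs = np.linspace(-1.0, 1.0, 400)
f_tanh = sample_function(xs, activation=np.tanh)         # smooth sample path
f_step = sample_function(xs, activation=step)            # rough, Brownian-like sample path

# Roughness shows up in the increments between neighbouring grid points:
# for the step-function network the mean squared increment shrinks only
# linearly with the grid spacing (Brownian behaviour), while for tanh it
# shrinks quadratically (smooth behaviour).
for name, f in [("tanh", f_tanh), ("step", f_step)]:
    d = np.diff(f)
    print(f"{name}: mean squared increment = {np.mean(d**2):.2e}")
```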
