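1. difference between hidden variables and hyperparameters: hidden variables (here z, theta, pi) are unobserved random variables that inference must recover; hyperparameters (alpha, beta) are fixed inputs to the priors and are conditioned on, not sampled.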
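To make item 1 concrete, here is a minimal Python sketch of a generative process. The notes do not name the model, so this assumes, hypothetically, an LDA-style model: theta[d] is document d's topic proportions drawn from a symmetric Dirichlet(alpha), pi[k] is topic k's word distribution drawn from a symmetric Dirichlet(beta), z holds each token's topic and w the observed words. The roles given to theta and pi, and all function names, are assumptions. The sketch only shows which quantities are fixed inputs and which are random draws.

```python
import random

# ASSUMPTION: LDA-style roles (theta = per-document topic proportions,
# pi = per-topic word distributions); all names here are hypothetical.

def sample_dirichlet(concentration, rng):
    """Draw from Dirichlet(concentration) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [x / total for x in draws]

def generate_corpus(doc_lengths, K, V, alpha, beta, seed=0):
    """Hyperparameters (alpha, beta) are fixed inputs; the hidden variables
    (theta, pi, z) are random draws; only w would be observed."""
    rng = random.Random(seed)
    pi = [sample_dirichlet([beta] * V, rng) for _ in range(K)]   # one word distribution per topic
    theta, z, w = [], [], []
    for n_d in doc_lengths:
        theta_d = sample_dirichlet([alpha] * K, rng)              # topic proportions of this document
        z_d = rng.choices(range(K), weights=theta_d, k=n_d)       # hidden topic of each token
        w_d = [rng.choices(range(V), weights=pi[k])[0] for k in z_d]  # observed word of each token
        theta.append(theta_d)
        z.append(z_d)
        w.append(w_d)
    return theta, pi, z, w

theta, pi, z, w = generate_corpus([5, 3], K=2, V=4, alpha=0.5, beta=0.5)
```

Inference would be given only `w` (plus alpha and beta) and asked to recover z, theta and pi.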
2. procedure
step 1: write the complete-data likelihood, i.e. the joint distribution of the observed and hidden variables given the hyperparameters:
p(w, z, theta, pi | alpha, beta)
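Under the same assumed LDA-style model (an assumption; the notes do not state the factorization), the complete-data likelihood factorizes as p(pi | beta) p(theta | alpha) p(z | theta) p(w | z, pi). A minimal sketch of its logarithm with symmetric priors; the function names are hypothetical:

```python
import math

# ASSUMPTION: LDA-style factorization with symmetric Dirichlet priors;
# function names are hypothetical.

def log_dirichlet_pdf(x, a):
    """log Dir(x | a) for a probability vector x and concentration vector a."""
    return (math.lgamma(sum(a)) - sum(math.lgamma(ai) for ai in a)
            + sum((ai - 1.0) * math.log(xi) for ai, xi in zip(a, x)))

def log_complete_likelihood(w, z, theta, pi, alpha, beta):
    """log p(w, z, theta, pi | alpha, beta)
    = log p(pi | beta) + log p(theta | alpha) + log p(z | theta) + log p(w | z, pi)."""
    K, V = len(pi), len(pi[0])
    total = sum(log_dirichlet_pdf(pi_k, [beta] * V) for pi_k in pi)   # priors on topics
    for d, (w_d, z_d) in enumerate(zip(w, z)):
        total += log_dirichlet_pdf(theta[d], [alpha] * K)             # prior on document d
        for word, topic in zip(w_d, z_d):
            # p(z_dn | theta_d) * p(w_dn | z_dn, pi)
            total += math.log(theta[d][topic]) + math.log(pi[topic][word])
    return total

value = log_complete_likelihood(w=[[0]], z=[[0]], theta=[[0.5, 0.5]],
                                pi=[[0.5, 0.5], [0.5, 0.5]], alpha=1.0, beta=1.0)
# about -1.386, i.e. log(0.25)
```

With alpha = beta = 1 every Dirichlet density on this 2-dimensional simplex equals 1, so only the two token terms log(0.5) contribute.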
step 2: write the observed-data likelihood given the hidden variables theta and pi, with z summed out:
p(w | theta, pi)
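A sketch of step 2 under the same assumed LDA-style model: summing z out token by token gives p(w_dn | theta_d, pi) = sum_k theta_d[k] pi[k][w_dn], and the log-likelihood is the sum of the logs over tokens. The function name is hypothetical.

```python
import math

# ASSUMPTION: LDA-style roles for theta and pi; the function name is hypothetical.

def log_observed_likelihood(w, theta, pi):
    """log p(w | theta, pi): each token's topic z_dn is summed out, giving
    p(w_dn | theta_d, pi) = sum_k theta_d[k] * pi[k][w_dn]."""
    total = 0.0
    for theta_d, w_d in zip(theta, w):
        for word in w_d:
            total += math.log(sum(t_k * pi_k[word] for t_k, pi_k in zip(theta_d, pi)))
    return total

value = log_observed_likelihood(w=[[0, 1]], theta=[[0.25, 0.75]],
                                pi=[[1.0, 0.0], [0.5, 0.5]])
# token 0: 0.25*1.0 + 0.75*0.5 = 0.625; token 1: 0.25*0.0 + 0.75*0.5 = 0.375
```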
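step 3: determine which hidden variables can be integrated out, i.e. "collapsed".
theta and pi can be integrated out analytically (this requires their priors to be conjugate to the likelihood, e.g. Dirichlet priors on categorical distributions), so the collapsed Gibbs sampler only needs to sample z from p(z | w).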
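Under the assumed LDA-style factorization (not stated in the notes), collapsing theta and pi splits into two independent integrals, each a Dirichlet-multinomial with a closed form. Here n_{dk} counts tokens in document d assigned to topic k, N_d = sum_k n_{dk}, n_{kv} counts occurrences of word v assigned to topic k, n_k = sum_v n_{kv}, K is the number of topics, V the vocabulary size, and both priors are assumed symmetric:

```latex
\begin{aligned}
p(z, w \mid \alpha, \beta)
  &= \iint p(w, z, \theta, \pi \mid \alpha, \beta)\, d\theta\, d\pi
   = \underbrace{\int p(z \mid \theta)\, p(\theta \mid \alpha)\, d\theta}_{p(z \mid \alpha)}
     \;\underbrace{\int p(w \mid z, \pi)\, p(\pi \mid \beta)\, d\pi}_{p(w \mid z, \beta)} \\
p(z \mid \alpha)
  &= \prod_{d} \frac{\Gamma(K\alpha)}{\Gamma(K\alpha + N_d)}
     \prod_{k} \frac{\Gamma(n_{dk} + \alpha)}{\Gamma(\alpha)} \\
p(w \mid z, \beta)
  &= \prod_{k} \frac{\Gamma(V\beta)}{\Gamma(V\beta + n_k)}
     \prod_{v} \frac{\Gamma(n_{kv} + \beta)}{\Gamma(\beta)}
\end{aligned}
```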
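step 4: use Bayes' rule to derive the full conditional distribution p(z_i | z_-i, w), where z_-i denotes all assignments except z_i:
p(z_i | z_-i, w) = p(z, w) / p(z_-i, w) = p(z, w) / sum_{z_i} p(z, w)
Since z_i is discrete, the denominator is a sum over its values rather than an integral; it is the same for every value of z_i, so p(z_i | z_-i, w) is proportional to p(z, w).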
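The equation in step 4 can be evaluated directly for any model whose joint p(z, w) is available: set z_i to each of its possible values, keep z_-i fixed, and normalize. A minimal Python sketch; `full_conditional` and `toy_log_joint` are hypothetical names, and the toy joint is an arbitrary stand-in (with w fixed and therefore omitted), not a real model:

```python
import math

def full_conditional(log_joint, z, i, num_values):
    """p(z_i | z_-i, w) by enumeration: evaluate log p(z, w) for every value of z_i
    with the other assignments held fixed, then normalize. The normalizer is the
    sum over z_i in step 4, shared by all values of z_i."""
    logs = []
    for k in range(num_values):
        z_k = list(z)
        z_k[i] = k
        logs.append(log_joint(z_k))
    m = max(logs)                                   # log-sum-exp trick for stability
    weights = [math.exp(l - m) for l in logs]
    total = sum(weights)
    return [x / total for x in weights]

# Hypothetical toy log p(z, w) over three binary assignments.
def toy_log_joint(z):
    return 0.5 * sum(z) - 1.0 * z[0] * z[1]

probs = full_conditional(toy_log_joint, z=[1, 0, 1], i=1, num_values=2)
```

Enumeration costs one joint evaluation per value of z_i, so in practice the ratio is simplified analytically; the brute-force version is mainly useful for checking such a simplification.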
step 5: based on the equation above, we need to calculate the joint distribution p(z, w), with theta and pi integrated out.
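For step 5, a minimal Python sketch under the assumed LDA-style model with symmetric Dirichlet priors: `log_collapsed_joint` computes log p(z, w | alpha, beta) in closed form, and `gibbs_sweep` resamples every z_i with the standard collapsed Gibbs update obtained by taking the ratio of that joint for different values of z_i (all Gamma functions cancel except one factor each). The function names and the data layout (w[d][n] word ids, z[d][n] topic ids) are assumptions.

```python
import math
import random

# ASSUMPTION: LDA-style model, symmetric priors; names and layout are hypothetical.

def log_collapsed_joint(w, z, K, V, alpha, beta):
    """log p(z, w | alpha, beta) with theta and pi integrated out."""
    n_dk = [[0] * K for _ in w]
    n_kv = [[0] * V for _ in range(K)]
    for d, (w_d, z_d) in enumerate(zip(w, z)):
        for word, topic in zip(w_d, z_d):
            n_dk[d][topic] += 1
            n_kv[topic][word] += 1
    total = 0.0
    for counts in n_dk:   # p(z | alpha): one Dirichlet-multinomial per document
        total += math.lgamma(K * alpha) - math.lgamma(K * alpha + sum(counts))
        total += sum(math.lgamma(c + alpha) - math.lgamma(alpha) for c in counts)
    for counts in n_kv:   # p(w | z, beta): one Dirichlet-multinomial per topic
        total += math.lgamma(V * beta) - math.lgamma(V * beta + sum(counts))
        total += sum(math.lgamma(c + beta) - math.lgamma(beta) for c in counts)
    return total

def gibbs_sweep(w, z, K, V, alpha, beta, rng):
    """Resample every z_dn in place from
    p(z_dn = k | z_-dn, w) ∝ (n_dk + alpha) * (n_kv + beta) / (n_k + V * beta),
    where every count excludes the current token."""
    n_dk = [[0] * K for _ in w]
    n_kv = [[0] * V for _ in range(K)]
    n_k = [0] * K
    for d, (w_d, z_d) in enumerate(zip(w, z)):
        for word, topic in zip(w_d, z_d):
            n_dk[d][topic] += 1
            n_kv[topic][word] += 1
            n_k[topic] += 1
    for d, w_d in enumerate(w):
        for n, word in enumerate(w_d):
            old = z[d][n]
            n_dk[d][old] -= 1          # remove the current token from the counts
            n_kv[old][word] -= 1
            n_k[old] -= 1
            weights = [(n_dk[d][k] + alpha) * (n_kv[k][word] + beta) / (n_k[k] + V * beta)
                       for k in range(K)]
            new = rng.choices(range(K), weights=weights)[0]
            z[d][n] = new
            n_dk[d][new] += 1          # add it back under its new topic
            n_kv[new][word] += 1
            n_k[new] += 1
    return z

rng = random.Random(0)
w_demo = [[0, 1, 1], [2, 0]]
z_demo = [[rng.randrange(2) for _ in doc] for doc in w_demo]   # random initialization
for _ in range(50):
    gibbs_sweep(w_demo, z_demo, K=2, V=3, alpha=0.5, beta=0.3, rng=rng)
```

The sweep keeps running counts and only removes and re-adds the current token, so it never calls `log_collapsed_joint`; the closed-form joint is still handy for checking the update and for monitoring convergence.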