Using Interval Estimation to Set the Prior Value in Online Learning

A common way to compute ctr (click-through rate) is:

                             ctr = (baseClickNumber + clickNumber)/(baseShowNumber + showNumber)

If you know the historical ctr (from yesterday or last month), how should you set baseShowNumber so that the computed ctr is both stable and time-sensitive?

According to the Central Limit Theorem, the ctr observed over n impressions is approximately normally distributed with mean p and variance p(1-p)/n, where p is the true ctr. So we can use interval estimation to decide how large baseShowNumber should be.

Let ctr be the observed click-through rate over n impressions and p the true ctr. Then, approximately:

                               (ctr - p)/sqrt(p(1-p)/n) ~ N(0, 1)
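For readers who want the closed form, here is a sketch of the algebra. The symbol z is an assumption of this sketch: it stands for the standard normal quantile at the chosen confidence level, so that the statistic above lies in [-z, z] with that probability.

```latex
% Start from the two-sided bound on the standardized statistic.
\left| \frac{\mathrm{ctr} - p}{\sqrt{p(1-p)/n}} \right| \le z
\;\Longleftrightarrow\;
(\mathrm{ctr} - p)^2 \le \frac{z^2}{n}\, p(1-p)

% Collect terms: a quadratic inequality in p.
\left(1 + \frac{z^2}{n}\right) p^2
  - \left(2\,\mathrm{ctr} + \frac{z^2}{n}\right) p
  + \mathrm{ctr}^2 \le 0

% Its two roots are the interval boundaries.
p_{\text{low, high}} = \frac{n}{n + z^2}
  \left( \mathrm{ctr} + \frac{z^2}{2n}
  \mp z \sqrt{\frac{\mathrm{ctr}(1-\mathrm{ctr})}{n} + \frac{z^2}{4n^2}} \right)
```

The three pieces of the last line are n/(n + z^2), z^2/(2n), and the z-scaled square root.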

Solving this inequality for p gives the interval estimate, known as the Wilson score interval. It is worth working through the algebra yourself; for simplicity, an R implementation follows.

# args n: sample size (number of impressions), p: observed ctr,
#      prob: required confidence level (e.g. 0.9)
# return: data frame with the low and high interval boundaries
ctr_interval_estimate <- function(n, p, prob) {
   z <- qnorm(1 - (1 - prob)/2)   # two-sided standard normal quantile
   a1 <- n/(n + z*z)
   a2 <- z*z/(2*n)
   a3 <- z * sqrt(p * (1-p)/n + z * z/(4 * n * n))
   high <- a1 * (p + a2 + a3)
   low <- a1 * (p + a2 - a3)
   return(data.frame(low = low, high = high))
}
 
For example, with an observed ctr of 1/1000 and 90% confidence, for n from 10,000 to 100,000:

 > ctr_interval_estimate((1:10) * 10000,  1/1000, 0.9)  
            low        high
1  0.0005979155 0.001672025
2  0.0006937575 0.001441231
3  0.0007414997 0.001348497
4  0.0007716322 0.001295867
5  0.0007929425 0.001261057
6  0.0008090721 0.001235928
7  0.0008218462 0.001216726
8  0.0008322971 0.001201453
9  0.0008410588 0.001188942
10 0.0008485460 0.001178455 
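
One possible way to turn the table above into a choice of baseShowNumber is sketched below. This is only an assumption about how the interval might be used, not a rule stated here: pick the smallest n whose interval width is at most some fraction of the historical ctr, then set baseClickNumber to ctr times that n. The helper choose_base_show_number and its parameters rel_width and candidates are hypothetical names introduced for illustration. ctr_interval_estimate is repeated so the snippet runs on its own.

```r
# Interval function, copied from above so this snippet is self-contained.
ctr_interval_estimate <- function(n, p, prob) {
   z <- qnorm(1 - (1 - prob)/2)
   a1 <- n/(n + z*z)
   a2 <- z*z/(2*n)
   a3 <- z * sqrt(p * (1-p)/n + z * z/(4 * n * n))
   data.frame(low = a1 * (p + a2 - a3), high = a1 * (p + a2 + a3))
}

# Hypothetical helper (an assumption, not part of the original method):
# return the smallest candidate n whose interval width is at most
# rel_width * p, or NA if no candidate is tight enough.
choose_base_show_number <- function(p, prob, rel_width, candidates) {
   ci <- ctr_interval_estimate(candidates, p, prob)
   ok <- (ci$high - ci$low) <= rel_width * p
   if (!any(ok)) return(NA_real_)
   candidates[which(ok)[1]]
}

hist_ctr <- 1/1000
# Require the 90% interval to be no wider than half of the historical ctr.
base_show <- choose_base_show_number(hist_ctr, 0.9, 0.5, (1:10) * 10000)
# Prior clicks consistent with the historical ctr.
base_click <- base_show * hist_ctr
print(c(baseShowNumber = base_show, baseClickNumber = base_click))
```

With the values in the table above, this picks baseShowNumber = 50000, the first row whose interval width drops below 0.0005. A looser rel_width gives a smaller prior, which reacts faster to new data. A tighter one gives a larger, more stable prior.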

