library(ggplot2)
set.seed(2)
Let’s build a simple MCMC sampler using the Metropolis algorithm. First, we’ll set up our model. We’ll suppose the data (\(z\)) comes from a Poisson distribution, with an unknown rate \(\lambda\): \[ z \sim \mathcal{P}(\lambda) \] Then our the likelihood for a single data point \(z_i\) is given by: \[ P(z_i \,|\, \lambda) = f(z_i \,|\, \lambda) = \frac{\lambda^{z_i} e^{-\lambda}}{z_i!}\] and our overall likelihood for a full data set \(z = \{z_1,\dots, z_n\}\) is given by: \[ P(z \,|\, \lambda) = \prod_{i=1}^n f(z_i \,|\, \lambda) \]
likelihood = function(data,lambda){prod(dpois(data,lambda))}
From the above model, let’s simulate a little data set that we can use for parameter estimation:
data = rpois(20,2)
Now, we need a prior for our unknown parameter \(\mu\). We’ll choose a normal distribution for the prior of \(\mu\). But you can also try other priors (e.g. uniform) too! For our prior \(P(\mu)\), we’ll choose \(\mu \sim \mathcal{N}(1,1.5)\).
# Set up prior
prior = function(lambda){dnorm(lambda,1,1.5)}
# Plot the prior
plot(seq(0,5,0.1),prior(seq(0,5,0.1)),type='l',xlab="λ", ylab="density")
As we saw in class, the Metropolis algorithm for MCMC is:
We’ll also make a variable, samples
, to hold our MCMC samples as we go.
lambda.curr = 5 #start value for lambda
proposaldist = function(paramval){rnorm(1,paramval,0.5)}
numsteps = 5000
samples = numeric(numsteps)
Use the proposal distribution to draw a new proposed sample, \(\lambda_{next}\)
Calculate the product of the likelihood and the prior, \(P(z \,|\, \lambda) \cdot P(\lambda)\), for both the current and newly proposed values of \(\lambda\). Note that \(P(z \,|\, \lambda) \cdot P(\lambda)\) is proportional to the posterior (it is just missing the denominator from Bayes’ theorem), so our sampler will recover the posterior distribution.
Calculate the acceptance ratio, \(a = \frac{P(z \,|\, \lambda_{next}) \cdot P(\lambda_{next})}{P(z \,|\, \lambda_{curr}) \cdot P(\lambda_{curr})}\).
If \(a\geq 1\), the new point is as good or better than the last, and we automatically accept.
If \(a<1\), then the new point \(\lambda_{next}\) is worse. In this case, we accept the new point with probability \(a\). If it is accepted, we set lambda.curr = lambda.next
. If it is not accepted, we leave lambda.curr
as is and re-draw lambda.next
in the next iteration.
Also, don’t forget to record your lambda.curr
samples at each iteration!
Code the algorithm above and generate 5000 samples for \(\lambda\). Here’s more explicit pseudocode for 2) to help you set things up:
samples[i] = lambda.curr
(i.e. record the current value of \(\lambda\))lambda.next
from proposaldist
, using lambda.curr
lambda.curr
, e.g probcurr = likelihood(data,lambda.curr)*prior(lambda.curr)
lambda.next
(let’s call this probnext
)acceptratio
equal to probnext/probcurr
acceptratio >= 1
, accept lambda.next
—in other words, set lambda.curr = lambda.next
acceptratio < 1
, then accept lambda.next
with probability acceptratio
(i.e. if runif(1)<acceptratio
)Next, plot the chain—do you see burn-in? Roughly when does it converge to the stationary distribution?
plot(1:numsteps,samples, xlab='iteration',ylab='λ')
Lastly, if we plot the prior, likelihood, and posterior, we can see how the likelihood alters the prior to generate the posterior. We can also compare the sampled posterior matches the analytical posterior.
# take just the later part of the chain, after burn-in
posteriorsample = samples[500:numsteps]
# set up the prior & likelihood for line-plotting
# (also re-scale them so they're on similar y-axis scales)
paramvalslist = seq(0,4,0.1) # parameter values list
#corresponding prior values
priorlist = prior(paramvalslist)
priorlist = priorlist/max(priorlist)
#likelihood values
likelihoodlist = numeric(length(paramvalslist))
for(i in 1:length(paramvalslist)){likelihoodlist[i] = likelihood(data,paramvalslist[i])}
likelihoodlist = likelihoodlist/max(likelihoodlist)
# This is the dataframe we'll use for ggplot
plotdata = data.frame(paramvalslist,priorlist,likelihoodlist)
ggplot(plotdata) +
geom_density(data=as.data.frame(posteriorsample),aes(posteriorsample, color='Sampled Posterior'),size=1) +
geom_line(aes(paramvalslist,likelihoodlist, color='Likelihood'),size=1) +
geom_line(aes(paramvalslist,priorlist, color='Prior'),size=1) +
labs(x="µ", y="scaled density") + guides(color = guide_legend('')) +
scale_colour_brewer(palette = "Set1")
Try changing the standard deviation in the proposal distribution—how does it change the behavior of the chain? Try this for different numbers of iterations too—sometimes it’s easier to see the differences with fewer iterations (e.g. 500 or 1000).
Try changing your prior to something else, e.g. uniform on \((0,5)\). How does that change things? Try other priors too!
Try changing your data set to add more or fewer data points—how does that change the convergence of your chain?
Feeling ambitious and computational? Try coding up a 2-D sampler, e.g. using a normal distribution with a mean \(\mu\) and standard deviation \(\sigma\). If you do, try ploting your chain as it moves around in your 2D parameter space, and you can even show the posterior as a contour plot and see how it’s moving up and down the distribution!
Feeling ambitious and mathy? The gamma distribution is the conjugate prior for a Poisson likelihood—that means that for our model and prior, we should be able to derive a closed form normal posterior for \(\lambda\). Derive it, and plot it with your sampled posterior. How well do they match? How does this change as you change the number of iterations you run?