Model, Data, & Likelihood

Let’s build a simple MCMC sampler using the Metropolis algorithm. First, we’ll set up our model. We’ll suppose the data (\(z\)) comes from a normal distribution, with a known standard deviation \(\sigma = 1\): \[ z \sim \mathcal{N}(\mu, 1) \] Then our the likelihood for a single data point \(z_i\) is given by: \[ P(z_i \,|\, \mu, 1) = f(z_i \,|\, \mu, 1) = \frac{1}{\sqrt{2\pi}}e^{-\frac{(z - \mu)^2}{2}}\] and our overall likelihood for a full data set \(z = \{z_1,\dots, z_n\}\) is given by: \[ P(z \,|\, \mu) = \prod_{i=1}^n f(z_i \,|\, \mu, 1) \]

likelihood = function(data,mu){prod(dnorm(data,mu,1))}

From the above model, let’s simulate a little data set that we can use for parameter estimation:

data = rnorm(20,2,1)

Prior Distribution

Now, we need a prior for our unknown parameter \(\mu\). We’ll choose a normal distribution for the prior of \(\mu\). But you can also try other priors (e.g. uniform) too! For our prior \(P(\mu)\), we’ll choose \(\mu \sim \mathcal{N}(1,1.5)\).

# Set up prior
prior = function(mu){dnorm(mu,1,1.5)}

# Plot the prior
plot(seq(0,5,0.1),prior(seq(0,5,0.1)),type='l',xlab="µ", ylab="density")

Setting up the Metropolis algorithm for MCMC

As we saw in class, the Metropolis algorithm for MCMC is:

  1. Initialization
    • Choose a starting point in parameter space, say \(\mu = 5\).
    • Decide on a proposal distribution to choose the next point to sample—this distribution is often somewhat arbitrary, but different choices will make it easier or harder to converge to the stationary distribution. In our case we’ll choose a normal distribution with mean given by the current sampled parameter value (call this \(\mu_{curr}\)), and standard deviation of 0.5, \(\mathcal{N}(\mu_{curr}, 0.5)\).
    • Decide how many samples you want to take—in our case, let’s take 5000.

We’ll also make a variable, samples, to hold our MCMC samples as we go.

mu.curr = 5 #start value for mu
proposaldist = function(paramval){rnorm(1,paramval,0.5)}
numsteps = 5000
samples = numeric(numsteps)
  1. For each iteration,
    • Use the proposal distribution to draw a new proposed sample, \(\mu_{next}\)

    • Calculate the product of the likelihood and the prior, \(P(z \,|\, \mu) \cdot P(\mu) \), for both the current and newly proposed values of \(\mu\). Note that \(P(z \,|\, \mu) \cdot P(\mu)\) is proportional to the posterior (it is just missing the denominator from Bayes’ theorem), so our sampler will recover the posterior distribution.

    • Calculate the acceptance ratio, \(a = \frac{P(z \,|\, \mu_{next}) \cdot P(\mu_{next})}{P(z \,|\, \mu_{curr}) \cdot P(\mu_{curr})}\).

    • If \(a\geq 1\), the new point is as good or better than the last, and we automatically accept.

    • If \(a<1\), then the new point \(\mu_{next}\) is worse. In this case, we accept the new point with probability \(a\). If it is accepted, we set mu.curr = If it is not accepted, we leave mu.curr as is and re-draw in the next iteration.

    • Also, don’t forget to record your mu.curr samples at each iteration!

Code the algorithm above and generate 5000 samples for \(\mu\). Here’s more explicit pseudocode for 2) to help you set things up:

Plotting the Results

Next, plot the chain—do you see burn-in? Roughly when does it converge to the stationary distribution?

plot(1:numsteps,samples, xlab='iteration',ylab='mu')

Lastly, if we plot the prior, likelihood, and posterior, we can see how the likelihood alters the prior to generate the posterior. We can also compare the sampled posterior matches the analytical posterior.

# take just the later part of the chain, after burn-in
posteriorsample = samples[500:numsteps]

# set up the prior & likelihood for line-plotting 
# (also re-scale them so they're on similar y-axis scales)

paramvalslist = seq(0,4,0.1) # parameter values list

#corresponding prior values
priorlist = prior(paramvalslist)
priorlist = priorlist/max(priorlist)

#likelihood values
likelihoodlist = numeric(length(paramvalslist))
for(i in 1:length(paramvalslist)){likelihoodlist[i] = likelihood(data,paramvalslist[i])}
likelihoodlist = likelihoodlist/max(likelihoodlist)

# This is the dataframe we'll use for ggplot
plotdata = data.frame(paramvalslist,priorlist,likelihoodlist)

ggplot(plotdata) + 
  geom_density(,aes(posteriorsample, color='Sampled Posterior'),size=1) +
  geom_line(aes(paramvalslist,likelihoodlist, color='Likelihood'),size=1) + 
  geom_line(aes(paramvalslist,priorlist, color='Prior'),size=1) + 
  labs(x="µ", y="scaled density") + guides(color = guide_legend('')) + 
  scale_colour_brewer(palette = "Set1")

Extra problems