The Reversible jump Metropolis-Hastings algorithm

Motivation for RJMCMC

Green (1995) showd that the detailed balance equation can also be written in general state space. This allows one to propose a models of different dimensionality and therby incorporate model selection into the probabilistic simulation itelf. To do this, one defines a transition probablity to and from each subspace.

The state space $S$ for the reversible jump Metropolis-Hastings algorithm may be written as

\[ S = \left\{(n, \theta^{(n)}), k\in{\cal S}, \theta^{(n)}\in \Theta^n\right\} \]

where $\theta^{(n)}\in\Theta^n$ is the parameter space of model $n$ and ${\cal S}$ is enumerable model space. Clearly, in this notation, the dimension of $\theta^{(n)}$ varies with $n$.

In most cases, posterior distribution may be naturally written as the product of a conditional probability for subspace $(n)$ and the occupation fraction in that subspace:

\[ \pi(\theta^{(n)} , k) = \pi(\theta^{(n)}|n)\pi(n) . \]

Presumably, we are interested in the posterior probabilities of different models $\pi(n)$ of draw some conditional or marginal conclusions about different models in terms of $\pi(\theta^{(n)}|n)$ or $\pi(\theta^{(n)})$. Such estimates follow directly from the simulated posterior. For example, an exstimate of $\pi(n)$ comes directly occupation frequency in each subspace.

Implementing RJMCMC

Following Green (1995), we must define reversible transitions between models in different subspaces, say $i$ and $j$. This is accomplished by proposing a bijective function $g_{ij}$ that transforms the parameters

\[ g_{ij}(\theta^{(i)}, \nu^{(i)} ) = (\theta^{( j )} , \nu^{( j )}), \]

and retains the dimensions

\[ d(\theta^{(i)}) + d(\nu^{(i)}) = d(\theta^{(j)}) + d(\nu^{(j)}). \]

The parameter vector $\nu$ is a random quantity used in proposing change in the components and as extra components when going to a higher dimension. The rank of $\nu$ may be zero. For example, if $d(\theta^{(i)})=2$ and $d(\theta^{(j)})=1$ then we may define $d(\nu^{(i)})=0$ and $d(\nu^{(j)})=1$. The inverse transform $g_{ij}^{-1} = g_{jk}$ defines the transition in the opposite direction.

If $q_{ij}(\theta^{(i)}, \nu^{(i)})$ is the probability density for the proposed transition and $p(i, j)$ is the probability for the move $i\rightarrow j$, the accepting probability can be written as

\[ \alpha_{ij}(\theta^{(i)}, \theta^{(j)}) = \min\left\{ 1, {\pi_j(\theta^{(j)})p(j,i)q_{ji}(\theta^{(j)},\nu^{(j)}) \over \pi_i(\theta^{(i)})p(i,j)q_{ij}(\theta^{(i)},\nu^{(i)})} \left|{\partial(\theta^{(j)},\nu^{(j)}) \over \partial(\theta^{(i)},\nu^{(i)})}\right| \right\}. \]

As usual, the probabilities densities $p(i, j)$ and $q_{ij}$ are selected based on prior knowledge of the problem and to optimize the overall rate of convergence.

The RJMCMC Algorithm

Assume that at the ith iteration the current state is in the subspace $n_i$ with parameter vector $\theta_i^{(n_i)}$. We propose a new state as follows:
  1. Choose a new model $j$ by drawing it from distribution $p(n_i,\cdot)$. Propose a value for the parameter $\theta^{(j)}$ by sampling $^{(n_i)}$ from distribution $q_{n_ij}(\theta_i^{(n_i)} , u^{(n_i)})$.
  2. Accept the move with probability $\alpha_{n_ij}(\theta^{(n_i)}, \theta^{(j)})$ with $n_{i+1} = j$ and $\theta_{i+1}^{(n_{i+1})} = \theta^{(j)}$.
  3. If the move is not accepted, stay in the current subsapce : $n_{i+1} = n_i$ and $\theta_{i+1}^{(n_{i+1})} = \theta_i^{(n_i)}$.
In practice, one usually chooses to interleave RJMCMC steps with some number of standard Metropolis-Hastings step to improve the mixing in each subspace.

An example

Consider two models, with with two real parameters:

\[ Model 1: (\theta_1, \theta_2) \in{\cal R}^2, \]

and one with one real parameter:

\[ Model 2: \theta\in{\cal R}. \]

Let us assume that the prior probability of transition between the models is $p(1,2)=p(2,1)=1/2$. Further more let us adopt a transformation between the subspaces of the following form:

\[ g_{12}(\theta_1, \theta_2) = \left({\theta_1+\theta_2\over 2}, {\theta_1=\theta_2\over 2}\right) = (\theta, \nu) \]

where $\nu$ is distributed as $q(\nu)$. Therefore, given $(\theta_1, \theta_2)$, we draw $\nu$ from $q(\nu)$ and solve to get $\theta$. The inverse transformation is:

\[ g_{21}(\theta, u) = (\theta + \nu, \theta - \nu). \]

Therefore, given $\theta$, we draw $\nu$ from $q(\nu)$ and immediately obtain $(\theta_1, \theta_2)$.

The acceptance probability for $(\theta_1, \theta_2)\rightarrow\theta$ is then:

\[ \alpha_{12} = {\pi_2(\theta){1\over 2}q(\nu)\over \pi_1(\theta_1,\theta_2)}{1\over2} \left|{\partial(\theta, \nu)\over\partial(\theta_1,\theta_2)}\right| = {\pi_2\left({\theta_1+\theta_2\over2}\right) q({\theta_1-\theta_2\over2}) \over 2\pi_1(\theta_1, \theta_2)}. \]

and for $\theta\rightarrow(\theta_1, \theta_2)$:

\[ \alpha_{21} = {2\pi_1(\theta + \nu, \theta - \nu)\over \pi_2(\theta)q(\nu)}. \]

In practice, therefore, the ratio $\pi_1/\pi_2$ must be relatively normalized.

References

Green, P. J., 1995 Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination. Biometrika 82, 711–713.

Send suggestions, questions, and feedback to WEINBERG at ASTRO dot UMASS dot EDU.
Documentation generated at Fri Mar 26 00:35:11 2010 by doxygen