Multiresolution

Bayes' theorem gives the rule for updating our belief in a hypothesis $H$ (usually referred to as the probability of $H$) given new data $D$ and prior information $I$:

\[ P(H|D,I) = {P(H|I)P(D|H,I)\over P(D|I)} \]

The left-hand term, $P(H|D,I)$, is the posterior probability: the probability of the hypothesis $H$ after considering the effect of the data $D$. The $P(H|I)$ term is the prior probability of $H$ given $I$ alone; that is, the belief in $H$ before the data $D$ are introduced. The term $P(D|H,I)$ is called the likelihood; it gives the probability of the data assuming that the hypothesis $H$ and the background information $I$ are true. The last term, $1/P(D|I)$, is independent of $H$ and can be regarded as a normalizing or scaling constant.

Bayes' theorem is a simple consequence of the product rule of probabilities. The product rule gives the probability of the logical conjunction of two statements $A$ and $B$, written as $A,B$:

\[ P(A,B|I) = P(A|B,I)P(B|I) = P(B|A,I)P(A|I). \]

Bayes' theorem follows by rearranging the terms in the above equality. Note that all of these probabilities are conditional. We require that the conditioning propositions include, at least implicitly, all of the information used to determine the probability of the conditioned propositions. In particular, $P(D|H,I)$ must include the error model for the data acquisition process (and this will be different for each process, as noted in the section above).
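
As a concrete illustration of the update rule, the following minimal sketch evaluates Bayes' theorem on a small discrete grid of hypotheses. Everything specific in it is an assumption made for illustration and is not taken from the package: the hypothesis is "the expected count is `lam`", the error model is Poisson, and the names `rates`, `prior`, and `observed` are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing k counts when lam are expected."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical hypothesis grid: each H states "the expected count is lam".
rates = [1.0, 2.0, 4.0, 8.0]
prior = [0.25, 0.25, 0.25, 0.25]   # P(H|I), flat for illustration

observed = 5                        # the data D (made-up value)

# P(D|H,I): the assumed Poisson error model enters here.
likelihood = [poisson_pmf(observed, lam) for lam in rates]

# P(D|I): independent of H, so it only normalizes the result.
evidence = sum(p * l for p, l in zip(prior, likelihood))

# P(H|D,I) = P(H|I) P(D|H,I) / P(D|I)
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]

best = rates[posterior.index(max(posterior))]
```

A different acquisition process would change only `poisson_pmf`; the prior and the normalization are untouched.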

In our case, we have multiple layers of data. Denote the top (coarsest) layer as $D_0$ and the subsequent layers, which have higher resolution (and therefore more information), as $D_j$, $j \in \{1, \ldots, N\}$. Our aggregation step discussed above tells us that

\[ D_{j-1} = \sum D_j, \]

where $\sum$ denotes the aggregation operation.
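
The layered structure can be sketched as follows. The actual aggregation operation is the one defined in the earlier section; here we assume, purely for illustration, one-dimensional binned counts in which each coarser bin is the sum of two adjacent finer bins. The function names `aggregate` and `build_levels` are hypothetical.

```python
def aggregate(fine, factor=2):
    """Sum each run of `factor` adjacent bins to form the next coarser level."""
    if len(fine) % factor != 0:
        raise ValueError("bin count must be divisible by the aggregation factor")
    return [sum(fine[i:i + factor]) for i in range(0, len(fine), factor)]

def build_levels(finest, n_coarser, factor=2):
    """Return [D_0, ..., D_N], coarsest first, starting from the finest data D_N."""
    levels = [list(finest)]
    for _ in range(n_coarser):
        # Each pass produces D_{j-1} from D_j and places it in front.
        levels.insert(0, aggregate(levels[0], factor))
    return levels

levels = build_levels([3, 1, 0, 2, 4, 4, 1, 5], n_coarser=2)
# levels[0] == [6, 14]                    (D_0, coarsest)
# levels[1] == [4, 2, 8, 6]               (D_1)
# levels[2] == [3, 1, 0, 2, 4, 4, 1, 5]   (D_2, finest)
```

Because every coarse level is a deterministic function of the level below it, knowing $D_j$ fixes $D_{j-1}$ exactly.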

Using the product rule directly, we can extend Bayes' theorem to multiple sequential updates. Let's assume that we have evaluated the posterior at the top level:

\[ P(H|D_0,I) = {P(H|I)P(D_0|H,I)\over P(D_0|I)} \]

We now want to go down one level. Bayes' theorem says

\[ P(H|D_0,D_1,I) = {P(H|I)P(D_0,D_1|H,I)\over P(D_0,D_1|I)}. \]

How do we get the new likelihood function in terms of known distributions? Using the product rule, we may write

\[ P(D_1,D_0|H,I) = P(D_1|D_0,H,I)P(D_0|H,I) \]

which, on substituting into the numerator of Bayes' theorem above and using the analogous expression for the denominator, gives

\[ P(H|D_1,D_0,I) = {P(H|I)P(D_1|D_0,H,I)P(D_0|H,I)\over P(D_1|D_0,I)P(D_0|I)}. \]

The factors $P(H|I)P(D_0|H,I)/P(D_0|I)$ are exactly the right-hand side of the top-level posterior, $P(H|D_0,I)$. Using this to simplify, one finds:

\[ P(H|D_1,D_0,I) = {P(H|D_0,I)P(D_1|D_0,H,I)\over P(D_1|D_0,I)}. \]

We may simplify the likelihood function even further. Using the product rule once again, the second term in the numerator becomes

\[ P(D_1|D_0,H,I) = {P(D_0,D_1|H,I)\over P(D_0|H,I)} = {P(D_1|H,I)\over P(D_0|H,I)}, \]

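where, in the second equality, we have used the fact that $D_0$ presents no new information given $D_1$: since $D_0$ is obtained by aggregating $D_1$, $P(D_0|D_1,H,I)=1$ and therefore $P(D_0,D_1|H,I)=P(D_1|H,I)$. So finally we have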

\[ P(H|D_1,D_0,I) = {P(H|D_0,I)\left[P(D_1|H,I)/P(D_0|H,I)\right] \over P(D_1|D_0,I)}. \]
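
The two-level result can be checked numerically. The sketch below is a toy problem under stated assumptions, not the package's implementation: the hypothesis $H$ is the fine bin that contains a source on a flat background, the error model is Poisson with independent bins, and the coarse level is formed by summing adjacent pairs of bins. All names and numbers are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def likelihood(counts, expected):
    """P(D|H,I) for independent Poisson bins."""
    return math.prod(poisson_pmf(k, m) for k, m in zip(counts, expected))

def pair_sum(values):
    """Assumed aggregation: sum adjacent pairs of bins."""
    return [values[i] + values[i + 1] for i in range(0, len(values), 2)]

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def expected_fine(h, n_bins=4, background=1.0, source=4.0):
    """Expected fine-level counts if the source sits in bin h."""
    return [background + (source if i == h else 0.0) for i in range(n_bins)]

hypotheses = [0, 1, 2, 3]
prior = [0.25] * 4                 # P(H|I)

d1 = [1, 6, 0, 2]                  # fine data D_1 (made-up counts)
d0 = pair_sum(d1)                  # coarse data D_0 = [7, 2]

like0 = [likelihood(d0, pair_sum(expected_fine(h))) for h in hypotheses]
like1 = [likelihood(d1, expected_fine(h)) for h in hypotheses]

# Level 0: ordinary Bayes update with the coarse data.
post0 = normalize([p * l for p, l in zip(prior, like0)])

# Level 1: the prior is post0 and the likelihood is the ratio like1/like0.
sequential = normalize([q * l1 / l0 for q, l1, l0 in zip(post0, like1, like0)])

# Reference: a single update using the fine data alone.
direct = normalize([p * l for p, l in zip(prior, like1)])
```

In this toy problem the coarse data cannot tell bin 0 from bin 1, so `post0` assigns them equal probability; the likelihood ratio at Level 1 is what separates them, and `sequential` agrees with `direct` to rounding error.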

Note the similarities with our original statement of Bayes' theorem. The prior at Level 1 is now the posterior from Level 0, and the likelihood ratio of Level 1 to Level 0 (in square brackets) now takes the place of the likelihood function. This makes good sense. For example, in the trivial case $D_0=D_1$, the likelihood ratio is 1 and no new inference takes place.

Note: The prior will have to be sampled or characterized in a suitable manner to be practical in this computation. This characterization may mean an MCMC evaluation of the left-hand side (the posterior from the previous level) and a subsequent fit of a multidimensional Gaussian.

This chain continues to all levels. Using the same arguments, it is straightforward to show by induction that:

\[ P(H|D_0,D_1,\ldots,D_n,I) = {P(H|D_0,\ldots,D_{n-1},I)\left[P(D_n|H,I)/P(D_{n-1}|H,I)\right] \over P(D_n|D_0,\ldots, D_{n-1},I)}. \]

As in the initial statement of Bayes' theorem, the denominator on the right-hand side is best interpreted as a normalization.
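
The chain over all levels can be sketched as a loop in which each pass multiplies the running posterior by the likelihood ratio of the current level to the previous one and then renormalizes. As before, this is an illustrative toy under assumptions of our own (a discrete hypothesis for the source position, Poisson bins, pairwise aggregation, hypothetical names and counts), not the package's implementation; in particular, a discrete grid sidesteps the need to sample or fit the prior at each level.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def likelihood(counts, expected):
    """P(D|H,I) for independent Poisson bins."""
    return math.prod(poisson_pmf(k, m) for k, m in zip(counts, expected))

def pair_sum(values):
    """Assumed aggregation: sum adjacent pairs of bins."""
    return [values[i] + values[i + 1] for i in range(0, len(values), 2)]

def pyramid(finest, n_coarser):
    """Return [level 0, ..., level N], coarsest first, by repeated pair sums."""
    levels = [list(finest)]
    for _ in range(n_coarser):
        levels.insert(0, pair_sum(levels[0]))
    return levels

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def expected_fine(h, n_bins=8, background=1.0, source=4.0):
    """Expected finest-level counts if the source sits in bin h."""
    return [background + (source if i == h else 0.0) for i in range(n_bins)]

hypotheses = list(range(8))
prior = [1.0 / 8] * 8

data = pyramid([0, 2, 1, 1, 2, 7, 1, 0], n_coarser=2)       # D_0, D_1, D_2
models = [pyramid(expected_fine(h), n_coarser=2) for h in hypotheses]

# Level 0: ordinary Bayes update.
like_prev = [likelihood(data[0], m[0]) for m in models]
posterior = normalize([p * l for p, l in zip(prior, like_prev)])
history = [posterior]

# Levels 1..N: prior = previous posterior, likelihood = ratio of likelihoods.
for n in range(1, len(data)):
    like_n = [likelihood(data[n], m[n]) for m in models]
    posterior = normalize(
        [q * ln / lp for q, ln, lp in zip(posterior, like_n, like_prev)]
    )
    history.append(posterior)
    like_prev = like_n

# Reference: a single update using only the finest data D_N.
direct = normalize([p * l for p, l in zip(prior, like_prev)])
```

Each entry of `history` is the posterior after one more level; the last entry matches `direct` to rounding error, and the `normalize` call plays the role of the denominator.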

In short, for multiple spatial levels, the prior is the posterior at the previous (coarser) level, and the likelihood becomes the ratio of the likelihood at the current level to that at the previous level. Non-trivial application requires an independent estimate of the posterior simulation at each level.


Send suggestions, questions, and feedback to WEINBERG at ASTRO dot UMASS dot EDU.
Documentation generated at Fri Mar 26 00:35:11 2010 by doxygen