In statistics and in statistical physics, the Metropolis–Hastings algorithm is a Markov Chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution for which direct sampling is difficult. This sequence can be used to approximate the distribution (i.e., to generate a histogram), or to compute an integral (such as an expected value). Metropolis–Hastings and other MCMC algorithms are generally used for sampling from multi-dimensional distributions, especially when the number of dimensions is high. For single-dimensional distributions, other methods are usually available (e.g. adaptive rejection sampling) that can directly return independent samples from the distribution, and are free from the problem of auto-correlated samples that is inherent in MCMC methods.


Detailed Balance

(http://sens.tistory.com/415)



The Metropolis-Hastings algorithm generates a p(x,y) with this property.


Methodology

Let q(x,y) denote the candidate-generating density, where


.

(특정 상태 x에서 다른 상태 y로 상태전이(xy)할 확률들을 다 더하면 1이다. 당연한 소리!)


This is essentially the conditional density of y given x ― Pr(y|x). This density could potentially be used as the p(x,y) term in the transition kernel, but it may not satisfy detailed balance.

If, for example, we have


,

(아직 stationary distribution에 도달하지 못함)


then we can adjust q by using a probability of move α(x,y). Transitions will be made using


.

(Metropolis-Hastings 알고리즘에서 재정의된 상태전이확률: α값으로 보정된 상태전이확률)


Proposal distribution: q(x,y)

Acceptance probability: α(x,y)


At each state x, sample y from q(x,y).

Accept proposal with probability α(x,y)

  • If proposal accepted, move to the state y
  • Otherwise stay at the state x


The choice of α follows the following logic. If the above inequality holds, then moves from x to y are happening too often under q. We should thus choose α(y,x) = 1. But then, in order to satisfy detailed balance, we must have



This implies that



Conversely, we can consider the case when the inequality(above) were reversed to derive α(y,x). Thus, to summarize, in order to satisfy reversibility, we set


.



Lecture



References