In mathematics, rejection sampling is a basic pseudo-random number sampling technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or accept-reject algorithm.


It is a classical sampling method that allows one to sample from a distribution that is difficult or impossible to simulate by the inverse transform method. Instead, draws are taken from an instrumental (proposal) density and accepted with a carefully chosen probability; each accepted draw is a draw from the target density.


Rejection sampling is based on the observation that to sample a random variable one can sample uniformly from the region under the graph of its density function.


Rejection sampling works as follows:

  1. Sample a point (an x-position) from the proposal distribution.
  2. Draw a vertical line at this x-position, up to the value of the (suitably scaled) proposal density at that point, i.e. the envelope k·h(x) introduced below.
  3. Sample uniformly along this line (i.e. uniformly between 0 and the envelope value), and reject the point if it lies above the desired density at this x-position (i.e. if the sampled height is greater than the value of the desired density there); otherwise accept the x-position. A minimal sketch of these three steps follows this list.
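
As a concrete illustration of these three steps, here is a minimal sketch (the target f(x) = x·(1−x)³ on [0, 1], the uniform proposal, and the constant envelope are my own illustrative choices, not from the text):

```python
import random

# Toy illustration of the three steps above: target proportional to
# f(x) = x*(1-x)**3 on [0, 1], proposal = Uniform(0, 1), scaled by a
# constant envelope that lies above f everywhere.

def f(x):
    return x * (1 - x) ** 3          # unnormalized target density

envelope = 0.106                      # any constant >= max of f (max ≈ 0.1055 at x = 1/4)

def sample_one():
    while True:
        x = random.uniform(0.0, 1.0)            # step 1: x-position from the proposal
        y = random.uniform(0.0, envelope)       # steps 2-3: uniform height under the envelope
        if y <= f(x):                           # keep only points under the target curve
            return x

samples = [sample_one() for _ in range(10_000)]
```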


Algorithm


The objective is to sample from a target density π(x) = f(x) / C.

- x ∈ ℝ^d

- f(x): the unnormalized target density, f(x) ∝ π(x)

- C: the potentially unknown normalizing constant


Suppose that we can sample from another density h(x) (denoted Q(x) in the figure above) and that there exists a constant k such that f(x) ≤ k·h(x) for all x. To obtain a draw from π:

  1. Draw a candidate z from h(x) and u from U[0,1], the Uniform distribution on the interval [0,1].
  2. If u ≤ f(z) / (k·h(z)), return z.
  3. Otherwise, return to step 1.

Each candidate is accepted with probability C/k, so the number of iterations required to accept a draw is geometric with mean k/C (simply k when f is already normalized); the sketch below checks this empirically.
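
A minimal sketch of the algorithm, with illustrative choices not taken from the text: the target is proportional to f(x) = exp(−x²/2) (so C = √(2π)), the proposal h is a standard Cauchy density sampled by inverse transform, and k = 2π·e^(−1/2) ≈ 3.81 satisfies f ≤ k·h everywhere:

```python
import math
import random

def f(x):
    return math.exp(-x * x / 2.0)                # unnormalized standard normal, C = sqrt(2*pi)

def h(x):
    return 1.0 / (math.pi * (1.0 + x * x))       # standard Cauchy proposal density

k = 2.0 * math.pi * math.exp(-0.5)               # satisfies f(x) <= k*h(x) for all x

def draw():
    trials = 0
    while True:
        trials += 1
        z = math.tan(math.pi * (random.random() - 0.5))   # step 1: candidate from h (inverse CDF of Cauchy)
        u = random.random()                                # step 1: u ~ U[0, 1]
        if u <= f(z) / (k * h(z)):                         # step 2: accept
            return z, trials                               # step 3 is the loop back on rejection

samples, trials = zip(*(draw() for _ in range(20_000)))
# The mean of `trials` should be close to k / C = k / sqrt(2*pi) ≈ 1.52.
print(sum(trials) / len(trials))
```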

To ensure efficiency, the optimal choice of k is the smallest constant that still satisfies the bound, namely

k = sup_x f(x) / h(x)

※ sup denotes the supremum (least upper bound).
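
When the supremum is hard to obtain in closed form, it can be approximated numerically. The snippet below is an illustrative sketch (the grid, safety factor, and the particular f and h are my own choices): it evaluates f/h on a grid and inflates the maximum slightly, since a finite grid can miss the true supremum.

```python
import math

def f(x):
    return math.exp(-x * x / 2.0)              # placeholder unnormalized target

def h(x):
    return 1.0 / (math.pi * (1.0 + x * x))     # placeholder proposal density

grid = [-10.0 + 20.0 * i / 100_000 for i in range(100_001)]
k = 1.01 * max(f(x) / h(x) for x in grid)      # small safety factor on the grid maximum
print(k)                                       # ≈ 2*pi*exp(-1/2) ≈ 3.81 for this pair
```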


Example

Suppose it is desired to generate a random point within the unit circle. Generate a candidate point (x, y) where x and y are independent and uniformly distributed between −1 and 1. If x² + y² ≤ 1, the point lies within the unit circle and is accepted; otherwise it is rejected and another candidate is generated. Since the circle has area π and the enclosing square has area 4, each candidate is accepted with probability π/4 ≈ 0.785.
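
A direct sketch of this example (function and variable names are my own):

```python
import random

def random_point_in_unit_circle():
    # Keep proposing points uniform in the square [-1, 1] x [-1, 1] until one
    # lands inside the unit circle; each candidate is accepted with probability pi/4.
    while True:
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return x, y

points = [random_point_in_unit_circle() for _ in range(1_000)]
```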



Drawbacks

Rejection sampling can lead to a lot of unwanted samples being taken if the function being sampled is highly concentrated in a certain region, for example a function that has a spike at some location. For many distributions, this problem can be solved using an adaptive extension (see adaptive rejection sampling). In addition, as the dimension of the problem grows, the ratio of the embedded volume to the "corners" of the embedding volume tends towards zero, so many rejections can take place before a useful sample is generated, making the algorithm inefficient and impractical; see the curse of dimensionality. In high dimensions it is necessary to use a different approach, typically a Markov chain Monte Carlo (MCMC) method such as Metropolis sampling or Gibbs sampling. (However, Gibbs sampling, which breaks down a multi-dimensional sampling problem into a series of low-dimensional samples, may use rejection sampling as one of its steps.)
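
To make this concrete, the sketch below (illustrative, not from the original text) generalizes the unit-circle example to d dimensions: it estimates the fraction of uniform draws from the cube [−1, 1]^d that land inside the unit ball and compares it with the exact volume ratio π^(d/2) / (Γ(d/2 + 1)·2^d), which collapses toward zero as d grows.

```python
import math
import random

def acceptance_rate(d, n_trials=100_000):
    """Fraction of uniform points in the cube [-1, 1]^d that land in the unit ball."""
    hits = 0
    for _ in range(n_trials):
        if sum(random.uniform(-1.0, 1.0) ** 2 for _ in range(d)) <= 1.0:
            hits += 1
    return hits / n_trials

for d in (2, 5, 10, 20):
    exact = math.pi ** (d / 2) / (math.gamma(d / 2 + 1) * 2 ** d)   # ball/cube volume ratio
    print(d, acceptance_rate(d), exact)   # e.g. d=2: ~0.785, d=10: ~0.0025, d=20: ~2.5e-8
```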


Adaptive rejection sampling

For many distributions, finding a proposal distribution that envelops the given distribution without a lot of wasted space is difficult. An extension of rejection sampling that overcomes this difficulty and efficiently samples from a large class of distributions (provided that they are log-concave, which is in fact the case for most of the common distributions) is known as adaptive rejection sampling. The basic idea is to model the proposal distribution using a set of piecewise exponential distributions (i.e. segments of one or more exponential distributions, attached end to end). This is more easily visualized in log space (i.e. by looking at the logarithm of the density). The logarithm of an exponential distribution is a straight line, and hence this method essentially involves enclosing the logarithm of the density in a series of line segments. This is the source of the log-concave restriction: if a distribution is log-concave, then its logarithm is concave (shaped like an upside-down U), meaning that a line tangent to the curve always passes over the curve.

The method successively determines an envelope of straight-line segments that approximates the logarithm better and better while still remaining above the curve, starting with a fixed number of segments (possibly just a single tangent line). Whenever a candidate point is rejected, the envelope is tightened with another line segment that is tangent to the curve at the x-coordinate of the rejected point.
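
The sketch below is a simplified, illustrative take on this construction (not the original authors' code): it assumes a differentiable, concave log-density on a bounded interval, builds the upper envelope from tangent lines, samples from the induced piecewise-exponential proposal by inverting its CDF, and adds a new tangent whenever a candidate is rejected. The truncated standard normal at the end is just a usage example.

```python
import math
import random

def ars_sample(log_f, dlog_f, a, b, n_samples, rng=random):
    """Simplified adaptive-rejection-sampling sketch for a log-concave density
    proportional to exp(log_f(x)) on the bounded interval [a, b].
    log_f must be concave; dlog_f is its derivative."""
    xs = [a + (b - a) * t for t in (0.2, 0.5, 0.8)]   # initial tangent abscissae
    samples = []
    while len(samples) < n_samples:
        hs = [log_f(x) for x in xs]     # heights of log f at the abscissae
        ms = [dlog_f(x) for x in xs]    # tangent slopes (non-increasing by concavity)

        # Breakpoints where consecutive tangents intersect; between breakpoints
        # the envelope is a single tangent line lying above log f.
        zs = [a]
        for i in range(len(xs) - 1):
            zs.append((hs[i + 1] - hs[i] + ms[i] * xs[i] - ms[i + 1] * xs[i + 1])
                      / (ms[i] - ms[i + 1]))
        zs.append(b)

        def env(i, x):                   # tangent i evaluated at x
            return hs[i] + ms[i] * (x - xs[i])

        def mass(i):                     # integral of exp(tangent i) over its segment
            lo, hi = zs[i], zs[i + 1]
            if abs(ms[i]) < 1e-12:
                return math.exp(hs[i]) * (hi - lo)
            return (math.exp(env(i, hi)) - math.exp(env(i, lo))) / ms[i]

        masses = [mass(i) for i in range(len(xs))]

        # Choose a segment proportional to its mass, then draw from the
        # truncated exponential on that segment by inverse CDF.
        u = rng.random() * sum(masses)
        i = 0
        while i < len(masses) - 1 and u > masses[i]:
            u -= masses[i]
            i += 1
        lo, hi = zs[i], zs[i + 1]
        if abs(ms[i]) < 1e-12:
            x = lo + rng.random() * (hi - lo)
        else:
            w = rng.random()
            x = lo + math.log(w * math.exp(ms[i] * (hi - lo)) + 1.0 - w) / ms[i]

        # Accept with probability exp(log_f(x) - envelope(x)); on rejection,
        # tighten the envelope by adding a tangent at the rejected point.
        if rng.random() <= math.exp(log_f(x) - env(i, x)):
            samples.append(x)
        else:
            xs = sorted(xs + [x])
    return samples

# Usage example: a truncated standard normal on [-3, 3] (its log-density is concave).
draws = ars_sample(lambda x: -0.5 * x * x, lambda x: -x, -3.0, 3.0, 5_000)
```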


Lecture


Summary

Rejection sampling makes it possible to draw samples from complicated probability distributions.


