Assume that we want to estimate an unobserved population parameter θ on the basis of observations x. Let f be the sampling distribution of x, so that f(x|θ) is the probability of x when the underlying population parameter is θ. Then the function

\[ \theta \mapsto f(x \mid \theta) \]

is known as the likelihood function, and the estimate

\[ \hat{\theta}_{\mathrm{MLE}}(x) = \underset{\theta}{\operatorname{arg\,max}}\ f(x \mid \theta) \]

is the maximum likelihood estimate of θ.
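For instance, in the simple illustrative case (not discussed above) of n independent Bernoulli(θ) trials of which k are successes, the likelihood of the observed sequence is

\[ f(x \mid \theta) = \theta^{k} (1 - \theta)^{n-k}, \]

and maximizing over θ gives the sample proportion, \(\hat{\theta}_{\mathrm{MLE}} = k/n\).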
Now assume that a prior distribution g over θ exists. This allows us to treat θ as a random variable, as in Bayesian statistics. The posterior distribution of θ is then

\[ \theta \mapsto f(\theta \mid x) = \frac{f(x \mid \theta)\, g(\theta)}{\displaystyle\int_{\Theta} f(x \mid \vartheta)\, g(\vartheta)\, d\vartheta}, \]

where g is the density function of θ and Θ is the domain of g. This is a straightforward application of Bayes' theorem. The method of maximum a posteriori estimation then estimates θ as the mode of the posterior distribution of this random variable:

\[ \hat{\theta}_{\mathrm{MAP}}(x) = \underset{\theta}{\operatorname{arg\,max}}\ \frac{f(x \mid \theta)\, g(\theta)}{\displaystyle\int_{\Theta} f(x \mid \vartheta)\, g(\vartheta)\, d\vartheta} = \underset{\theta}{\operatorname{arg\,max}}\ f(x \mid \theta)\, g(\theta). \]
The denominator of the posterior distribution (the so-called partition function) does not depend on θ and therefore plays no role in the optimization; this justifies the second equality above.
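As a concrete illustration (a standard conjugate example, not part of the text above): suppose x = (x₁, …, xₙ) are independent N(μ, σ_v²) observations with known variance σ_v², and the prior on μ is N(μ₀, σ_m²). The posterior for μ is again Gaussian, so its mode coincides with its mean and the MAP estimate is

\[ \hat{\mu}_{\mathrm{MAP}} = \frac{\sigma_v^{2}\,\mu_0 + n\,\sigma_m^{2}\,\bar{x}}{\sigma_v^{2} + n\,\sigma_m^{2}}, \]

a weighted average of the prior mean μ₀ and the sample mean x̄ that approaches the maximum likelihood estimate x̄ as n grows or as the prior becomes flat (σ_m → ∞).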
Computation
MAP estimates can be computed in several ways:
- Analytically, when the mode(s) of the posterior distribution can be given in closed form. This is the case when conjugate priors are used.
- Via numerical optimization such as the conjugate gradient method or Newton's method. This usually requires first or second derivatives, which have to be evaluated analytically or numerically (a sketch of this approach follows the list).
- Via a modification of an expectation-maximization (EM) algorithm. This does not require derivatives of the posterior density.
- Via a Monte Carlo method using simulated annealing.
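The following minimal sketch illustrates the numerical-optimization approach. It is not taken from the text above: it assumes NumPy and SciPy are available, uses made-up observations, and adopts the conjugate Gaussian-mean model mentioned earlier, minimizing the negative log posterior (the constant denominator is dropped, as noted above) and comparing against the closed-form answer.

```python
# A sketch of MAP estimation by numerical optimization (assumptions: NumPy/SciPy,
# illustrative data, Gaussian data with known variance and a Gaussian prior on the mean).

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

x = np.array([2.1, 1.7, 2.4, 1.9, 2.2])   # observed data (illustrative values)
sigma_v = 0.5                              # known observation standard deviation
mu_0, sigma_m = 0.0, 1.0                   # prior mean and standard deviation

def neg_log_posterior(mu):
    # -log f(x | mu) - log g(mu), up to the additive constant from the denominator
    log_likelihood = norm.logpdf(x, loc=mu, scale=sigma_v).sum()
    log_prior = norm.logpdf(mu, loc=mu_0, scale=sigma_m)
    return -(log_likelihood + log_prior)

# Minimize the negative log posterior; its minimizer is the MAP estimate.
result = minimize_scalar(neg_log_posterior)
mu_map = result.x

# For this conjugate model the posterior mode is available in closed form,
# which serves as a check on the numerical answer.
n = len(x)
mu_map_closed = (sigma_v**2 * mu_0 + n * sigma_m**2 * x.mean()) / (sigma_v**2 + n * sigma_m**2)
print(mu_map, mu_map_closed)
```

In this conjugate case the analytic route in the first bullet would of course suffice; the numerical route is shown because it carries over unchanged to posteriors without a closed-form mode.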