
Maximum likelihood

In statistics, the method of maximum likelihood, pioneered by geneticist and statistician Sir Ronald A. Fisher, is a method of point estimation that estimates the unobservable population parameter(s) by the value(s) that maximize the likelihood function.

Let \mathbf{\theta} denote the unobservable population parameter(s) to be estimated, and let \mathbf{X} denote the observed random variable (in general, a random vector) with probability density function (pdf) p(\mathbf{x} \mid \mathbf{\theta}). The likelihood function is this pdf regarded as a function of the parameter(s) \mathbf{\theta}, with the observed value \mathbf{x} held fixed. The maximum likelihood estimator maximizes the likelihood function

\hat{\mathbf{\theta}} = \operatorname{argmax}_\theta\ p(\mathbf{x} \mid \mathbf{\theta} )\,

or the log-likelihood function

\hat{\mathbf{\theta}} = \operatorname{argmax}_\theta\, \ln (p(\mathbf{x} \mid \mathbf{\theta} ))\,

as the log-likelihood function may be easier to maximize than just the likelihood function. (The log-likelihood is closely related to information entropy and Fisher information.)

This maximum can be found with calculus (setting the first derivative to zero) or by using non-linear optimization techniques for more complex likelihood functions.
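As a minimal sketch of the numerical route, the following maximizes a log-likelihood with a golden-section search. The exponential model, the data values, the search bracket, and the helper names (golden_section_max, log_likelihood) are all assumptions chosen for illustration. The exponential model is convenient because calculus gives the answer in closed form (the number of observations divided by their sum), so the numerical result can be checked.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-7):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

# Hypothetical observations, assumed to follow an exponential distribution.
data = [0.5, 1.2, 0.3, 2.0, 1.0]

def log_likelihood(rate):
    """Log-likelihood of the exponential rate: n*ln(rate) - rate*sum(x)."""
    return len(data) * math.log(rate) - rate * sum(data)

rate_hat = golden_section_max(log_likelihood, 1e-6, 10.0)
closed_form = len(data) / sum(data)  # result of setting the derivative to zero
print(rate_hat, closed_form)
```

Golden-section search needs only function values, not derivatives, but it assumes the log-likelihood has a single peak inside the bracket; likelihoods with several local maxima need a more careful method.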

Maximum-likelihood estimators are sometimes better than unbiased estimators, for example in the sense of having a smaller mean squared error. They also have a property called "functional invariance" that unbiased estimators lack: for any injective function f, the maximum-likelihood estimator of f(θ) is f(T), where T is the maximum-likelihood estimator of θ.

However, the bias of maximum-likelihood estimators can be substantial. Consider a case where n tickets numbered from 1 to n are placed in a box and one is selected at random (see uniform distribution). If n is unknown, then the maximum-likelihood estimator of n is the number on the drawn ticket, even though the expected value of that number is only \frac{n+1}{2}. All we can be certain of is that n is greater than or equal to the drawn ticket number.
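The ticket example can be checked with a short sketch. The drawn ticket number (37), the true box size (99), and the search cap n_max are hypothetical values chosen for illustration; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def likelihood(n, ticket):
    """Probability of drawing `ticket` from a box holding tickets 1..n."""
    return Fraction(1, n) if n >= ticket else Fraction(0)

def mle_of_n(ticket, n_max=1000):
    """Value of n in 1..n_max that maximizes the likelihood.

    n_max is an arbitrary cap on the search, assumed larger than the ticket.
    """
    return max(range(1, n_max + 1), key=lambda n: likelihood(n, ticket))

def expected_ticket(n):
    """Expected value of the drawn ticket when the box holds tickets 1..n."""
    return Fraction(sum(range(1, n + 1)), n)

print(mle_of_n(37))         # 37: the estimate is the drawn ticket itself
print(expected_ticket(99))  # 50, which equals (99 + 1) / 2
```

The likelihood is zero for any n below the drawn number and falls off as 1/n above it, which is why the maximum sits exactly at the drawn number and the estimator can never overshoot the true n.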

Invariance principle/property

If \hat{\theta} is the maximum likelihood estimator of θ and g is a one-to-one function, then the maximum likelihood (ML) estimator of α = g(θ) is \hat{\alpha} = g(\hat{\theta}).
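A brief sketch of why the property holds is given here. The symbol L^{*}, denoting the likelihood re-expressed in terms of α, is notation introduced only for this illustration.

```latex
% Likelihood written in terms of \alpha = g(\theta), with g one-to-one
L^{*}(\alpha) = L\bigl(g^{-1}(\alpha)\bigr)

% L is largest at \hat{\theta}, so for every \alpha
L^{*}(\alpha) = L\bigl(g^{-1}(\alpha)\bigr) \le L(\hat{\theta}) = L^{*}\bigl(g(\hat{\theta})\bigr)

% hence the maximizer of L^{*} is
\hat{\alpha} = g(\hat{\theta})
```

For instance, if \hat{\sigma} is the ML estimator of a standard deviation σ > 0, then the ML estimator of the variance σ^2 is \hat{\sigma}^2, because squaring is one-to-one on the positive numbers.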

An example: estimating the parameter of a binomial distribution

In a large population of voters, the proportion p who will vote "yes" is unobservable, and is to be estimated based on a political opinion poll. A sample of n = 10 voters is chosen at random, and it is observed that k = 3 of those n voters will vote "yes". Then the likelihood function (based on the binomial distribution in this case) is:

L(p)={n \choose k} p^k (1-p)^{n-k}.

Graphing this equation for different values of p, you can see that the likelihood is maximized near p = 0.3.
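In place of a graph, the likelihood can be tabulated on a grid of candidate values. This is a minimal sketch; the grid spacing of 0.01 and the names likelihood, grid, and best_p are choices made for illustration.

```python
from math import comb

def likelihood(p, n=10, k=3):
    """Binomial likelihood L(p) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Candidate values p = 0.00, 0.01, ..., 1.00
grid = [i / 100 for i in range(101)]

# Grid point with the largest likelihood
best_p = max(grid, key=likelihood)
print(best_p, likelihood(best_p))

# A few rows of the table, to show the rise and fall around the peak
for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"p = {p:.1f}  L(p) = {likelihood(p):.4f}")
```

A grid only locates the maximum to within its spacing, so it suggests where the peak is rather than proving it.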


We can use the first derivative of the logarithm of the likelihood function (with respect to p) and set it to zero to analytically find the maximum:

\frac{d \ln(L(p))}{dp} = \frac{k}{p}-\frac{n-k}{1-p}=0

Solving for p gives \frac{k}{n} as the maximum-likelihood estimate, so \hat{p} = \frac{3}{10} = 0.3 for the example numbers above. The steps are:

\frac{k}{p}-\frac{n-k}{1-p}=0\,
(1-p)k - p(n-k) = 0\,
k - pk - p(n-k) = 0\,
k - pn = 0\,
p = \frac{k}{n}\,
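Setting the first derivative to zero finds a stationary point; that it is a maximum follows because the second derivative of the log-likelihood, -k/p^2 - (n-k)/(1-p)^2, is negative for every p between 0 and 1. The sketch below checks both facts numerically for the example numbers. The function names score and second_derivative are chosen here for illustration.

```python
def score(p, n=10, k=3):
    """First derivative of ln L(p): k/p - (n - k)/(1 - p)."""
    return k / p - (n - k) / (1 - p)

def second_derivative(p, n=10, k=3):
    """Second derivative of ln L(p): -k/p^2 - (n - k)/(1 - p)^2."""
    return -k / p**2 - (n - k) / (1 - p) ** 2

p_hat = 3 / 10  # the closed-form estimate k / n

print(score(p_hat))              # zero up to floating-point rounding
print(second_derivative(p_hat))  # negative, so p_hat is a maximum
print(score(0.2), score(0.4))    # positive below p_hat, negative above it
```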



The contents of this article are licensed from Wikipedia.org under the GNU Free Documentation License.

01-04-2007 01:21:04