In statistics, the method of maximum likelihood, pioneered by geneticist and statistician Sir Ronald A. Fisher, is a method of point estimation that estimates the unobservable population parameter(s) by the value that maximizes the likelihood function.
For the moment, let $\theta$ denote the unobservable population parameter(s) to be estimated from the probability density function (pdf) $f(x \mid \theta)$.
Let $X$ denote the random variable observed (which, in general, will be a random vector instead).
Then the likelihood function is the pdf considered as a function of the parameter(s), with the observed value $x$ of $X$ held fixed: $L(\theta) = f(x \mid \theta)$.
The maximum likelihood estimator $\hat{\theta}$ maximizes the likelihood function $L(\theta)$ or, equivalently, the log-likelihood function $\log L(\theta)$, as the log-likelihood function may be easier to maximize than the likelihood function itself.
(The log-likelihood is closely related to information entropy and Fisher information.)
This maximum can be found with calculus (setting the first derivative to zero) or by using non-linear optimization techniques for more complex likelihood functions.
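Both routes can be compared on a small case. The following is a minimal sketch, assuming an exponential model with an unknown rate and a made-up sample; the model, the data and the function names are illustrative assumptions, not part of the article. It maximizes the log-likelihood numerically with a golden-section search and compares the result with the closed form obtained by setting the first derivative to zero.

```python
import math

def exponential_log_likelihood(rate, sample):
    """Log-likelihood of an i.i.d. exponential sample: n*log(rate) - rate*sum(x)."""
    return len(sample) * math.log(rate) - rate * sum(sample)

def golden_section_maximize(func, low, high, tol=1e-10):
    """Maximize a unimodal function on [low, high] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = low, high
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) > func(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

sample = [0.8, 1.9, 0.4, 2.6, 1.3]  # illustrative data, not from the article

# Numerical route: search for the rate that maximizes the log-likelihood.
numeric_mle = golden_section_maximize(
    lambda rate: exponential_log_likelihood(rate, sample), 1e-6, 10.0)

# Calculus route: d/d(rate) [n*log(rate) - rate*sum(x)] = 0 gives rate = n / sum(x).
closed_form_mle = len(sample) / sum(sample)
```

The two values agree up to the accuracy of the search. Golden-section search only assumes a single peak on the interval, which is why it is used here; for likelihoods with several parameters a general-purpose optimizer would be the more usual choice.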
Maximum-likelihood estimators are sometimes better than unbiased estimators, for example in having a smaller mean squared error.
They also have a property called "functional invariance" that unbiased estimators lack: for any injective function f, the maximum-likelihood estimator of f(θ) is f(T), where T is the maximum-likelihood estimator of θ.
However, the bias of maximum-likelihood estimators can be substantial.
Consider a case where n tickets numbered from 1 to n are placed in a box and one is selected at random (see uniform distribution). If n is unknown, then the maximum-likelihood estimator of n is the value on the drawn ticket, even though the expectation of that value is only $(n+1)/2$; we can only be certain that n is greater than or equal to the drawn ticket number.
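The size of this bias can be checked by direct computation. The sketch below assumes a hypothetical box of n = 100 tickets (the value and the function name are illustrative) and computes the exact expectation of the drawn ticket number, which is the maximum-likelihood estimate, using rational arithmetic.

```python
from fractions import Fraction

def expected_ticket_mle(n):
    """Exact expectation of the MLE when one ticket is drawn uniformly from 1..n.

    The likelihood of n given a drawn ticket t is 1/n for n >= t and 0 otherwise,
    so it is largest at n = t: the estimator is the drawn ticket number itself.
    """
    return sum(Fraction(t, n) for t in range(1, n + 1))

n = 100  # hypothetical true number of tickets
expectation = expected_ticket_mle(n)  # equals (n + 1) / 2
bias = expectation - n                # equals -(n - 1) / 2
```

For n = 100 the estimator averages 101/2, so on average it underestimates n by almost half.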
Invariance principle/property
If $\hat{\theta}$ is the maximum likelihood estimator for θ, then the ML estimator for α = g(θ) (provided g is a one-to-one function) is $\hat{\alpha} = g(\hat{\theta})$.
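The invariance property can be illustrated numerically. The sketch below assumes a binomial model with 3 successes in 10 trials and takes g to be the odds transformation g(p) = p / (1 - p); the choice of g, the grid and the function names are assumptions made for illustration. It maximizes the likelihood directly in terms of the odds and compares the result with the transformed estimate of p.

```python
import math

def binomial_log_likelihood(p, k, n):
    """Binomial log-likelihood in p, dropping the constant log C(n, k)."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

def odds(p):
    """One-to-one map g from a probability in (0, 1) to odds in (0, inf)."""
    return p / (1 - p)

def probability_from_odds(alpha):
    """Inverse of g: recovers p from the odds alpha."""
    return alpha / (1 + alpha)

k, n = 3, 10  # illustrative counts: 3 successes in 10 trials

# Direct search over the odds alpha, with the likelihood rewritten in alpha.
alpha_grid = [i / 10000 for i in range(1, 50000)]  # 0.0001 ... 4.9999
alpha_hat_direct = max(
    alpha_grid,
    key=lambda a: binomial_log_likelihood(probability_from_odds(a), k, n))

# Invariance shortcut: transform the MLE of p instead of maximizing again.
p_hat = k / n
alpha_hat_invariance = odds(p_hat)  # 0.3 / 0.7
```

The grid search and the shortcut agree to within the grid spacing, which is what the invariance property predicts.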
An example: estimating the parameter of a binomial distribution
In a large population of voters, the proportion p who will vote "yes" is unobservable, and is to be estimated based on a political opinion poll.
A sample of n = 10 voters is chosen randomly, and it is observed that k = 3 of those n voters will vote "yes". Then the likelihood function (based on the binomial distribution in this case) is:
$$L(p) = \binom{n}{k} p^k (1-p)^{n-k} = \binom{10}{3} p^3 (1-p)^7$$
Graphing this equation for different values of p, you can see that the likelihood is maximized near p = 0.3.
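In place of a graph, the likelihood can be tabulated on a grid of values of p. The following is a minimal sketch using the poll numbers above; the grid spacing of 0.01 and the function name are arbitrary choices made for illustration.

```python
import math

def binomial_likelihood(p, k, n):
    """Likelihood of success probability p given k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, k = 10, 3  # the sample from the poll: 3 "yes" votes among 10 voters

# Evaluate the likelihood on the grid p = 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]
values = [binomial_likelihood(p, k, n) for p in grid]

# Grid point with the largest likelihood.
best_p = grid[values.index(max(values))]

# Print a coarse table of the likelihood around the peak.
for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"p = {p:.1f}  L(p) = {binomial_likelihood(p, k, n):.4f}")
```

On this grid the largest value occurs at p = 0.30, where the likelihood is about 0.267.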
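We can use the first derivative of the logarithm of the likelihood function (with respect to p) and set it to zero to analytically find the maximum:
$$\frac{d}{dp} \log\left[\binom{n}{k} p^k (1-p)^{n-k}\right] = \frac{k}{p} - \frac{n-k}{1-p} = 0$$
By solving for p, we obtain $\hat{p} = k/n$ as the maximum-likelihood estimate, and $\hat{p} = 3/10 = 0.3$ for the example numbers above.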
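A vanishing first derivative identifies a stationary point, so it is worth confirming that this point is a maximum. The sketch below, with illustrative function names, evaluates the first and second derivatives of the binomial log-likelihood at the estimate k/n for the poll numbers.

```python
def score(p, k, n):
    """First derivative of the binomial log-likelihood with respect to p."""
    return k / p - (n - k) / (1 - p)

def second_derivative(p, k, n):
    """Second derivative of the binomial log-likelihood with respect to p."""
    return -k / p**2 - (n - k) / (1 - p)**2

n, k = 10, 3
p_hat = k / n  # candidate estimate from solving the first-order condition

score_at_estimate = score(p_hat, k, n)                  # zero up to rounding error
curvature_at_estimate = second_derivative(p_hat, k, n)  # negative, so a maximum
```

The second derivative is negative for every p strictly between 0 and 1 whenever 0 < k < n, so the stationary point is the unique maximum in that range.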