BIGpedia.com - Birthday paradox - Encyclopedia and Dictionary Online
encyclopedia search

Birthday paradox

The birthday paradox states that if there are 23 people in a room then there is a slightly more than 50:50 chance that at least two of them will have the same birthday. For 60 or more people, the probability is greater than 99%. This is not a paradox in the sense of it leading to a logical contradiction; it is a paradox in the sense that it is a mathematical truth that contradicts common intuition. Most people estimate that the chance is much lower than 50:50. The mathematics behind this has been used to devise a well-known cryptographical attack named the birthday attack.

Contents

Estimating the probability

Calculating this probability (and related ones) is the birthday problem. It is commonly taught in elementary statistics and probability courses at universities. The theory behind it was described in the American Mathematical Monthly in 1938 in Zoe Emily Schnabel 's The estimation of the total fish population of a lake, under the name of capture-recapture statistics.

The key to understanding the birthday paradox is to realize that there are many possible pairs of people whose birthdays could match. Specifically, among 23 people, there are 23 × 22/2 = 253 pairs, each of which being a potential candidate for a match. To emphasize the point, consider a different scenario: if you enter a room with 22 people, the chance that somebody there has the same birthday as you is not 50:50, but much lower. This is because now there are only 22 possible pairs that could yield a match. The actual birthday problem is asking if any of the 23 people have matching birthdays between any of the others.

To compute the approximate probability that in a room of n people, at least two have the same birthday, we disregard distribution variations, such as leap years, twins, seasonal or weekday variations, and assume that 365 possible birthdays are equally likely. Real-life birthday distributions are not uniform since not all dates are equally likely.

The trick is to first calculate the probability that the n birthdays are different. This probability is given by

p = \frac{364}{365} \cdot \frac{363}{365} \cdot \frac{362}{365} \cdots \frac{365-n+1}{365}

because the second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as the first two (363/365), etc. Using factorial notation, this expression can also be written as

p = { 365! \over 365^n (365-n)! }

for n ≤ 365, and 0 for n > 365.

Now, 1 − p is the probability that at least two persons have the same birthday. For n = 23, one obtains a probability of about 0.507. Some of these probabilities, numerically computed on Mathematica® using the formula above, are shown to state how the reality diverges from common sense, are below:

np
10 12%
20 41%
30 70%
50 97%
100 99.99996%
200 99.9999999999999999999999999998%
300 1 − (7 × 10−73)
350 1 − (3 × 10−131)
≥366 100%

Note that neither of the two people is chosen in advance: by way of contrast, the probability q(n) that someone in a room of n other people has the same birthday as a particular person (for example, you) is given by

1- \left( \frac{364}{365} \right)^n

which for n = 22 gives only about 0.059, or slightly better than 1 chance in 17. For a greater than 50:50 chance that one person in a roomful of people has the same birthday as you, n would need to be at least 253 .


The birthday paradox in its more generic sense applies to hash functions: the number of N-bit hashes that can be generated before probably getting a collision is not 2N, but rather only 2N/2. This is exploited by birthday attacks on cryptographic systems (see cryptographic hash function).

A mathematical (as opposed to numerical) view

In his autobiography, Paul Halmos deplored the fact that the birthday paradox is often presented in terms of mere numerical computation rather than conceptual mathematics. The argument given below is adapted from Halmos' argument, which contained a small error. In the product

\prod_{k=0}^{n-1}\left(1-{k \over 365}\right)

the first factor is equal to 1, and may therefore be discarded. Then we have

\sqrt[n-1]{\prod_{k=1}^{n-1}\left(1-{k \over 365}\right)} <{1 \over n-1}\sum_{k=1}^{n-1}\left(1-{k \over 365}\right)

because of the inequality of arithmetic and geometric means. Therefore:

\prod_{k=1}^{n-1}\left(1-{k \over 365}\right) <\left({1 \over n-1}\sum_{k=1}^{n-1}\left(1-{k \over 365}\right)\right)^{n-1}
=\left(1-{n \over 730}\right)^{n-1}<\left(e^{-n/730}\right)^{n-1}=e^{-(n^2-n)/730}.

The second inequality comes from 1 − x < e−x.

The value of the last expression is less than 1/2 if and only if

n^2-n>730\log_e 2\cong 505.997\dots

where "loge" means the natural logarithm. This falls just barely short of 506, and as luck would have it, 506 is one of the values of n2 − n, attained when n = 23.

Of this argument, Halmos wrote:

"The reasoning is based on important tools that all students of mathematics should have ready access to. The birthday problem used to be a splendid illustration of the advantages of pure thought over mechanical manipulation; the inequalities can be obtained in a minute or two, whereas the multiplications would take much longer, and be much more subject to error, whether the instrument is a pencil or an old-fashioned desk computer. What calculators do not yield is understanding, or mathematical facility, or a solid basis for more advanced, generalized theories."

Generalization and approximation

The birthday paradox may be generalized: there are n persons, each one of them randomly chooses a number between 1 and fixed N (where N may be, e.g., 365 or any other integer > 0 ).

What is the probability p(n) that at least two persons choose the same number?

An approximation to the answer is given by

p(n)\sim 1-1/\exp(n^2/(2N)).\,

Results for n := 365


Reverse problem

An alternate question may be:

For fixed probability p ...
... find the greatest n(p) for which the probability p(n) is smaller than the given p, or
... find the smallest n(p) for which the probability p(n) is greater than the given p.

An approximation for this is given by:

n(p)\sim \left(2n\ln\left({1 \over 1-p}\right)\right)^{1/2}.

Example

approximationcomputation for N := 365
pn generalizedn for N := 365 n↓p(n↓)n↑p(n↑)
0.01
0.14178 √N
2.70864
20.002743
0.00820
0.050.32029 √N 6.1191660.0404670.05624
0.1
0.45904 √N
8.77002
80.074349
0.09462
0.2
0.66805 √N
12.76302
120.1670213
0.19441
0.30.84460 √N16.13607160.28360170.31501
0.51.17741 √N22.49439220.47570230.50730
0.71.55176 √N29.64625290.68097300.70632
0.81.79412 √N34.27666340.79532350.81438
0.92.14597 √N40.99862400.89123410.90315
0.952.44775 √N46.76414460.94825470.95477
0.99
3.03485 √N
57.98081
57
0.99012
580.99166

Notify: some values are coloured showing the approximation does not always give the exact result.

Empirical test

The birthday paradox can be simulated empirically using computer code.

The following command can be used with the Ruby programming language.

ruby -e 'puts (1..30).collect {rand(365)+1}.uniq.length'

where "30" is the number of people in the group and "365" is the number of days in a year. If the output is the same as the number of people in the group, there were no repeating birthdays. If it is not the same, then there were repeating birthdays.

This Bash script does the same thing:

for i in `seq 1 30`; do echo $(($RANDOM%365+1)); done | sort -u | wc -l

The following code is in Perl. It returns numbers found to be repeated on series generated. It is not unusual to get repetition for the 23 sample random values generated.

perl -e 'for (1..22) {$h{int(rand(365)+1)}++};
for (keys %h) {print $_, ": ", $h{$_}, " times\n" if $h{$_} > 1}'

This Objective CAML script prints the point at which the probability of finding shared birthdays exceeds 50%. For the original problem as framed, it prints the answer 23:

#!/usr/bin/ocamlrun /usr/bin/ocaml
let size = 365.;;
let rec loop p i =
  let p' = (size -. (float (i-1))) *. p /. size in
  if p' < 0.5 then Printf.printf "answer = %d\n" i else loop p' (i+1);;
loop 1.0 2

External links



The contents of this article are licensed from Wikipedia.org under the GNU Free Documentation License.
How to see transparent copy

01-04-2007 01:21:04