The Binomial Distribution
In this lesson, you will learn about modeling the probability of #x# successes out of #n# independent repetitions of an experiment with #2# possible outcomes.
#\text{}#
Many random variables can be modeled with a named distribution model. There are numerous such distribution models, for both discrete and continuous random variables. We will focus here on the models that are relevant for statistics, and refer to other models only for examples and exercises. A full probability course would cover many other models.
#\text{}#
We begin with the most important model among discrete random variables: the Binomial Distribution.
Bernoulli Trial
Consider an experiment with exactly #2# possible outcomes (e.g., flipping a coin). We can arbitrarily label one of the outcomes “success” and the other “failure”. The probability of a success for this experiment is denoted #p#. Such an experiment is called a Bernoulli Trial.
Mean and Variance of the Bernoulli Distribution
Let #X# denote the outcome of a Bernoulli Trial, where #X = 1# if the trial results in a success, and #X = 0# if the trial results in a failure.
So, the probability mass function of #X# is:
\[p(x) = \begin{cases}
p& \text{if}\,\,\,\, x = 1\\\\
1 - p& \text{if}\,\,\,\, x = 0
\end{cases}\]
The mean of the Bernoulli distribution is easy to compute:
\[\mu_X = (0)(1 - p) + (1)(p) = p\]
The variance is also easy to compute, using the shortcut formula #\sigma^2_X = \mathbb{E}(X^2) - \mu_X^2#:
\[\sigma^2_X = (0)^2(1 - p) + (1)^2(p) - p^2 = p - p^2 = p(1 - p)\]
For example, suppose you are escaping from a castle, and you come to four doors. A wizard tells you that one of the doors leads to freedom, and the other three doors lead to doom.
You will select a door randomly, so the probability of success (i.e., freedom) is #0.25#.
If we define #X# to be equal #1# if you choose the door to freedom and to equal #0# otherwise, then #X# has a Bernoulli distribution with:
\[\begin{array}{rcl}
\mu_X &=& 0.25\\\\
\sigma^2_X &=& (0.25)(1 - 0.25) = 0.1875
\end{array}\]
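As a quick numerical check, we can simulate a large number of Bernoulli trials with #p = 0.25# in #\mathrm{R}# (the #\mathtt{rbinom}# function is explained at the end of this lesson; the sample size of #100000# is an arbitrary choice) and compare the sample mean and variance to the values above:
> x = rbinom(100000, 1, 0.25)
> mean(x)   # should be close to 0.25
> var(x)    # should be close to 0.1875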
#\text{}#
Binomial Distribution
Now suppose we have a situation in which we have #n# independent repetitions of a Bernoulli trial (e.g., we are going to flip a coin 10 times).
Let #X# denote the total number of successes observed among the #n# trials. Note that #X# can equal one of the values in the set #\{0,1,...,n\}#.
We then say that #X# has the Binomial distribution with parameters #n# and #p#.
We can write this in symbols as #X \sim B(n,p)# (if we have only one repetition, then #X \sim B(1,p)# and we say that #X# has the Bernoulli distribution).
Binomial Probability Mass Function
We want the probability mass function for #X# in this situation, i.e., a general formula for #\mathbb{P}(X = x)#.
Out of #n# repetitions, there are #\binom{n}{x}# ways to choose which #x# of them are successes (and thus which #n-x# are failures).
Since the trials are independent, any one particular sequence of #x# successes and #n-x# failures has probability:
\[p^x(1 - p)^{n - x}\]
Putting it all together, we have a formula for the pmf of #X#:
\[\mathbb{P}(X = x) = \binom{n}{x}p^x(1 - p)^{n-x}\]
for #x = 0,1,...,n#.
For example, if #X \sim B(5, 0.25)#, then:
\[\begin{array}{rcl}
\mathbb{P}(X = 3) &=& \displaystyle\binom{5}{3}(0.25)^3(1 - 0.25)^{5 - 3} \\\\
&=& \cfrac{5!}{3!2!}(0.25)^3(0.75)^2 \approx 0.0879
\end{array}\]
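This calculation is easy to reproduce with #\mathrm{R}#'s built-in #\mathtt{choose}# function, which computes the binomial coefficient #\binom{n}{x}#:
> choose(5, 3) * 0.25^3 * 0.75^2
This returns approximately #0.0879#, matching the computation above.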
#\text{}#
Binomial Cumulative Distribution Function
The cumulative distribution function #F(x) = \mathbb{P}(X \leq x)# for the binomial distribution is a summation of the probability mass function from #0# to the largest integer in the set #\{0,1,\ldots,n\}# that is smaller than or equal to #x#.
If #x \lt 0# then every integer in that set is larger than #x#, so #F(x)=0#.
If #x \geq n# then #F(x) = 1#, since we would sum the pmf over all the possible values of #X#.
That is, if #X \sim B(n,p)# then:
\[F(x) = \begin{cases}
0& \text{if}\,\,\,\, x \lt 0\\\\
\displaystyle\sum_{i=0}^k \binom{n}{i}p^i(1 - p)^{n - i}& \text{if}\,\,\,\, x \geq 0
\end{cases}\] where #k# is the largest integer in the set #\{0,1,\ldots,n\}# such that #k \leq x#.
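As an illustration of this summation, here is #F(3.2)# for #X \sim B(5, 0.25)# computed in #\mathrm{R}# (an arbitrary example; here #k = 3#, since #3# is the largest integer in #\{0,1,\ldots,5\}# that does not exceed #3.2#):
> sum(choose(5, 0:3) * 0.25^(0:3) * 0.75^(5 - 0:3))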
#\text{}#
Now suppose #X \sim B(n,p)#. What are the mean and variance of #X#?
Since #X# is the number of successes in #n# independent Bernoulli Trials, we can write
\[X = X_1 + X_2 + \cdots + X_n\]
where #X_1# is the number of successes (#0# or #1#) in the first trial, #X_2# is the number of successes (#0# or #1#) in the second trial, …, and #X_n# is the number of successes (#0# or #1#) in the #n#th trial.
Thus #X_i \sim B(1,p)# for each #i#. Then:
\[\begin{array}{rcl}
\mu_X &=& \mu_{X_1} + \mu_{X_2} + \cdots + \mu_{X_n} \\\\
&=& p + p + \cdots + p \\\\
&=& np
\end{array}\]
And because #X_1# through #X_n# are independent:
\[\begin{array}{rcl}
\sigma^2_X &=& \sigma^2_{X_1} + \sigma^2_{X_2} + \cdots + \sigma^2_{X_n} \\\\
&=& p(1 - p) + p(1 - p) + \cdots + p(1 - p) \\\\
&=& np(1 - p)
\end{array}\]
Mean and Variance of the Binomial Probability Distribution
In summary, if #X \sim B(n,p)# then:
\[\begin{array}{rcl}
\mu_X &=& np \\\\
\sigma_X^2 &=& np(1 - p)
\end{array}\]
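As with the Bernoulli case, a quick simulation in #\mathrm{R}# illustrates these formulas; the parameters #n = 12# and #p = 0.7# here are arbitrary choices:
> x = rbinom(100000, 12, 0.7)
> mean(x)   # should be close to np = 8.4
> var(x)    # should be close to np(1-p) = 2.52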
A system consists of #4# identical components, each of which works independently from the others. Each component has a probability of failure of #0.04#.
If one or more components fail, then the whole system will shut down. What is the probability that the whole system will shut down?
Let #X# denote the number of components out of #4# that fail. Then #X \sim B(4, 0.04)#.
The whole system will shut down if #X \geq 1#. Thus we need:
\[\begin{array}{rcl}
\mathbb{P}(X \geq 1) &=& 1 - \mathbb{P}(X = 0) = 1 - \displaystyle\binom{4}{0}(0.04)^0(0.96)^4\\\\
&=& 1 - (0.96)^4 \approx 0.1507
\end{array}\]
For the previous question, what is the mean of #X#, the number of components out of #4# that fail?
Since #X \sim B(4, 0.04)#:
\[\mu_X = np = (4)(0.04) = 0.16\]
For the previous question, what is the variance of #X#, the number of components out of #4# that fail?
Since #X \sim B(4, 0.04)#:
\[\sigma^2_X = np(1-p) = (4)(0.04)(1-0.04) = 0.1536\]
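All three answers for this system can be verified in #\mathrm{R}# (the #\mathtt{dbinom}# function used in the first line is explained at the end of this lesson):
> 1 - dbinom(0, 4, 0.04)   # P(X >= 1), approximately 0.1507
> 4 * 0.04                 # mean: np = 0.16
> 4 * 0.04 * 0.96          # variance: np(1-p) = 0.1536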
#\text{}#
Consider this situation: you have a population. A proportion #p# of its elements have some characteristic of interest. You will randomly select one element from the population and note whether it has the characteristic of interest (a success) or not (a failure). Let #X = 0# if the outcome is a failure and #X = 1# if it is a success. The probability that the outcome will be a success is the same as the proportion of successes in the population.
If #80\%# of the elements in the population are green, then the probability that a randomly-selected element will be green is #0.80#. Thus we can conclude that #X \sim B(1, 0.80)#.
Now suppose you will randomly select #n# elements from the population, without replacement, and count the number of successes #X# out of the #n# selected elements. Can we say that #X \sim B(n, 0.80)#?
No. Why not? Because once the first element is selected, the proportion of successes remaining in the population changes (it either increases or decreases). Once the second element is selected, the proportion changes again, and so on. So we cannot think of this as #n# identical repetitions of the same Bernoulli trial. It would be incorrect to model this using the binomial distribution (the hypergeometric distribution, which we will not discuss in this course, is the correct model to use).
However, if the population is very large relative to #n#, then the amount by which #p# changes after each element is selected may be so small that we may ignore this issue.
Binomial Distribution and Sampling
If the population you are sampling from is relatively large compared to the sample size #n#, then the probability of a success #p# may be assumed to be constant, and the number of successes #X# out of the #n# selected elements may be assumed to be binomially distributed with parameters #n# and #p#.
If you randomly sample #16# people from a population of #200,000# people, of which #37.6\%# are carriers of a specific disease, then it is considered acceptable to model the number #X# of people in your sample who are carriers of the disease as #X \sim B(16, 0.376)#.
It is not a perfectly correct model, but it is accurate enough in practice, and it is indeed what we do in statistics quite regularly, as you will see.
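To see how accurate the approximation is in this example, we can compare the exact hypergeometric probabilities with their binomial counterparts in #\mathrm{R}#, using the built-in #\mathtt{dhyper}# function (its arguments are the observed number of successes, the numbers of success and failure elements in the population, and the sample size). With #37.6\%# of #200,000# people being carriers, the population contains #75,200# carriers and #124,800# non-carriers:
> dhyper(5, 75200, 124800, 16)   # exact probability of 5 carriers in the sample
> dbinom(5, 16, 0.376)           # binomial approximation
The two results should agree to several decimal places, which is why the binomial model is considered acceptable here.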
#\text{}#
Using R
Binomial Probability Mass Function
Suppose #X# is modeled with a binomial distribution with parameters #n# and #p#. In #\mathrm{R}#, if you want to compute #\mathbb{P}(X = x)# for #x = 0,1,...,n#, you can do so without using the formula given for the pmf of the binomial distribution. Instead, use:
> dbinom(x,n,p)
For example, if #X \sim B(8,0.45)#, then #\mathbb{P}(X = 5)# can be found quickly using:
> dbinom(5,8,0.45)
Binomial Cumulative Distribution Function
If you want #\mathbb{P}(X \leq x)# for #x = 0,1,...,n#, you can also do this with a single command in #\mathrm{R}#:
> pbinom(x,n,p)
For example, if #X \sim B(8, 0.45)#, then #\mathbb{P}(X \leq 5)# is found using:
> pbinom(5,8,0.45)
And #\mathbb{P}(X \geq 2)# is found using:
> 1 - pbinom(1,8,0.45)
While #\mathbb{P}(3 \leq X < 7) = \mathbb{P}(X \leq 6) - \mathbb{P}(X \leq 2)# is found using:
> pbinom(6,8,0.45) - pbinom(2,8,0.45)
Generating a Random Sample from a Binomial Distribution
Now suppose you want to generate a random sample of size #N# from a binomial distribution with parameters #n# and #p#. This can be performed in #\mathrm{R}# using:
> rbinom(N,n,p)
For example, to generate a sample of #100# values from a #B(12,0.7)# distribution, and save it as a vector named #\mathtt{MySpace}#, use:
> MySpace = rbinom(100,12,0.7)
Visit omptest.org if you are taking an OMPT exam.