
Hypergeometric Distribution Explained With Python

November 6, 2019

In a math class, the probabilities you need for a problem are either given to you or are relatively easy to compute in a straightforward manner.

But in reality, this is not the case. You need to compute the probability yourself based on the situation. That is where probability distributions can help.

Today we are going to explore the hypergeometric probability distribution by:

Explaining what situations it is useful for.

Listing the information that you need to apply this distribution.

Coding some of the computations from scratch using Python.

Applying our code to problems.

The hypergeometric distribution is a discrete probability distribution. It is used when you want to determine the probability of obtaining a certain number of successes in a fixed number of draws made without replacement from a finite population. This is similar to the binomial distribution, but because drawn items are not put back, the chance of success changes from draw to draw, so you are not given a single fixed probability of success. Some example situations where this distribution applies are:

The probability of getting 3 spades in a 5-card hand in poker.

The probability of getting 4 to 5 non-land cards in an opening hand in Magic: The Gathering for a standard 60-card deck.

The probability that a freshman class drawn at random from a mixed-gender applicant pool in a charter school admissions lottery ends up 60% boys.
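To make "without replacement" concrete, here is a minimal simulation sketch of the first example. It is only an illustration, not the from-scratch code this post builds: the function name `estimate_three_spades` and the way the deck is encoded as integers are assumptions. `random.sample` returns distinct cards, so no card can show up twice in the same hand.

```python
import random

def estimate_three_spades(trials=100_000, seed=0):
    """Estimate P(exactly 3 spades in a 5-card hand) by simulation."""
    rng = random.Random(seed)
    # Hypothetical deck encoding: cards 0..51, with 0..12 standing for the spades.
    deck = range(52)
    hits = 0
    for _ in range(trials):
        hand = rng.sample(deck, 5)  # 5 distinct cards: sampling without replacement
        spades = sum(1 for card in hand if card < 13)
        if spades == 3:
            hits += 1
    return hits / trials

estimate = estimate_three_spades()
print(estimate)
```

With enough trials the estimate should settle near the exact hypergeometric probability, which is about 0.082.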

To evaluate the probability mass function of a hypergeometric distribution (that is, the probability of one specific outcome), we need:

a) The total number of items we are drawing from (called N).

b) The total number of desired items in N (called A).

c) The number of draws from N we will make (called n).

d) The number of desired items in our draw of n items (called x).
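Not every combination of these four numbers can occur. For instance, x can never exceed n or A. As a small sketch (the helper name `possible_x_values` is hypothetical and not from the original post), the achievable values of x for given N, A, and n could be listed like this:

```python
def possible_x_values(N, A, n):
    """Return the range of x values that can actually occur."""
    if not (0 <= A <= N and 0 <= n <= N):
        raise ValueError("need 0 <= A <= N and 0 <= n <= N")
    # If we draw more items than there are undesired ones (N - A),
    # the extra draws must be desired items, which sets the lower bound.
    lowest = max(0, n - (N - A))
    # We cannot get more desired items than we draw, or than exist.
    highest = min(n, A)
    return range(lowest, highest + 1)

# Spades in a 5-card poker hand: N=52 cards, A=13 spades, n=5 draws.
print(list(possible_x_values(52, 13, 5)))  # [0, 1, 2, 3, 4, 5]
```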

Different tutorials use different letters for these variables. I am using the letters from the video I posted below, where I first learned about the hypergeometric distribution.

Recall that the probability mass function (PMF) is what allows us to compute the probability of a single outcome. In our case, that is a specific value of x from the list above. The hypergeometric distribution PMF is below.
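For reference, in the notation above the standard textbook form of this PMF is P(X = x) = C(A, x) * C(N - A, n - x) / C(N, n), where C(a, b) is the number of ways to choose b items from a. A minimal sketch of that formula follows. It assumes Python 3.8 or newer for `math.comb`, and the function name `hypergeom_pmf` is an illustrative choice rather than necessarily the code this post develops.

```python
from math import comb

def hypergeom_pmf(N, A, n, x):
    """P(exactly x desired items in n draws without replacement).

    Assumes 0 <= A <= N and 0 <= n <= N.
    """
    if x < 0 or n - x < 0:
        return 0.0
    # math.comb(a, b) returns 0 when b > a, so impossible x values give 0.
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)

# 3 spades in a 5-card hand: N=52 cards, A=13 spades, n=5 draws, x=3.
p = hypergeom_pmf(N=52, A=13, n=5, x=3)
print(round(p, 4))  # 0.0815
```

The numerator counts the hands with exactly x desired cards and n - x undesired ones, and the denominator counts every possible hand of n cards.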