Know what a joint distribution of two or more random variables is (continuous or discrete).
Know how to compute probabilities, as well as marginals and conditional distributions of joint distributions.
Understand what it means for two random variables to be independent and how to test it.
Introduction
In real life, you will most often be interested in data that has multiple values of interest in an experiment. In probabilistic language, this means you are interested in the outcome of two or more random variables at the same time. The goal of this is often to deduce some kind of relationship (or lack thereof) between the data (this can be measured using the notions of covariance and correlation which we will discuss in the next lecture).
For example, you might be interested in:
Number of Coronavirus deaths and the average age of a population
The number of posts per day and the number of followers of Instagram users
The frequency of exercise and the academic performance of college students
The salary and gender of a population in a country
Question
How do you expect that the two variables in the above examples are related? Can you think of any other pairs of random variables that you might be interested in?
Another situation where we are often interested in multiple random variables is when conducting repeated experiments that have some variability. In this case (assuming the experiment is done correctly) the outcomes of each experiment should be independent, in that the outcome of one experiment shouldn't affect the outcome of the next (we will discuss in more detail what it means for two random variables to be independent later on in these notes). In fact, we have been dealing with multiple independent random variables already in the setting of the law of large numbers and the central limit theorem (something that we will come back to later).
Joint Distributions
When dealing with multiple random variables, say $X$ and $Y$, it is important to understand how they are distributed together. Since it is possible that $Y$ can depend on $X$, the statistics of the pair $(X,Y)$ can reveal behavior that simply studying the statistics of $X$ or $Y$ alone cannot.
Discrete Case
Let's start with the case when the random variables are discrete. We first introduce the idea of a joint probability distribution, which describes how both $X$ and $Y$ are distributed.
Joint probability distribution
Let $X$ and $Y$ be two discrete random variables, then the joint probability distribution of $X$ and $Y$ is given by
$p(x,y) = P(X = x, Y = y).$
Since $X$ and $Y$ are discrete (and can therefore only take on a finite or countable number of values), the joint distribution is only non-zero at a countable number of values.
Note
Knowing the distributions of $X$ and $Y$ separately is generally not enough to determine the joint distribution of $X$ and $Y$ (unless they are independent, as we will see later). As we will see, the joint distribution contains information about how the random variables $X$ and $Y$ depend on one another.
Suppose that $X$ takes values $x_1, x_2, \ldots, x_n$ and $Y$ takes values $y_1, y_2, \ldots, y_m$. Then the pair $(X,Y)$ takes values $(x_i, y_j)$. The joint distribution can be organized into a joint probability table.
Properties
Joint probability distributions must satisfy the following properties:
1. $0 \le p(x,y) \le 1$ for all $(x,y)$.
2. All the probabilities add up to one,
$\sum_{(x,y)} p(x,y) = 1,$
where the sum is over all values $(x,y)$ such that $p(x,y) > 0$.
Note: The above summation is over a discrete set of values since $X$ and $Y$ are discrete. It can be an infinite sum.
Let's consider some examples:
Example 1 (Roll two fair dice)
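Recall from the notes on discrete probability the problem of rolling two fair dice. Let
$X$ = outcome of the first die, $Y$ = outcome of the second die,
then the probability table is given by:
X \ Y   1      2      3      4      5      6
1       1/36   1/36   1/36   1/36   1/36   1/36
2       1/36   1/36   1/36   1/36   1/36   1/36
3       1/36   1/36   1/36   1/36   1/36   1/36
4       1/36   1/36   1/36   1/36   1/36   1/36
5       1/36   1/36   1/36   1/36   1/36   1/36
6       1/36   1/36   1/36   1/36   1/36   1/36
Since there are $36$ cells in the table, each with probability $1/36$, the total probability clearly sums to one.
This table is pretty boring since it does not indicate an interesting relationship between the random variables $X$ and $Y$. In fact $X$ and $Y$ are independent.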
As a more interesting case, consider another random variable
$Z = 7 - X,$
which takes whatever $X$ gives and "flips" it around 3.5. Then $X$ and $Z$ are clearly dependent, since the value of $X$ completely determines the value of $Z$. The probability table becomes:
X \ Z   1     2     3     4     5     6
1       0     0     0     0     0     1/6
2       0     0     0     0     1/6   0
3       0     0     0     1/6   0     0
4       0     0     1/6   0     0     0
5       0     1/6   0     0     0     0
6       1/6   0     0     0     0     0
If you were to look at the distribution of just $Z$, you wouldn't be able to tell that $Z$ was "copying" off of $X$. This can only be recognized by considering the joint distribution.
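The two tables in Example 1 can be rebuilt in a few lines of Python as a sanity check. This is only a sketch, not part of the original notes: it assumes the second variable is $Z = 7 - X$ (the rule read off from the second table), and the names `p_xy`, `p_xz` and `is_valid_pmf` are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Joint pmf of (X, Y): two fair dice, every pair has probability 1/36.
p_xy = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

# Joint pmf of (X, Z), assuming Z = 7 - X: only the pairs (x, 7 - x) carry mass.
p_xz = {(x, z): (Fraction(1, 6) if z == 7 - x else Fraction(0))
        for x, z in product(faces, faces)}

def is_valid_pmf(p):
    """Check both properties: entries lie in [0, 1] and they sum to one."""
    return all(0 <= v <= 1 for v in p.values()) and sum(p.values()) == 1

print(is_valid_pmf(p_xy), is_valid_pmf(p_xz))
```

Using `Fraction` keeps the arithmetic exact, so the "sums to one" check is an equality rather than a floating-point comparison.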
We can also consider more complicated events.
Example 2
Let and be as in the previous example. What is the probability that ?
The event can be described explicitly as the set of pairs
This can be visualized as a triangular subset of the probability table considered in Example 1, colored salmon below:
[Figure: the probability table from Example 1, with the cells belonging to the event shaded.]
Since there are such elements, we see that
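The concept of joint probability can easily be generalized to the case of $n$ random variables.
Multivariate joint probability function
Let $X_1, X_2, \ldots, X_n$ be discrete random variables, then the joint probability distribution of $X_1, X_2, \ldots, X_n$ is given by
$p(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n),$
where $(x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$. Again, since $X_1, X_2, \ldots, X_n$ are discrete, $p$ can only be non-zero on at most a discrete set of values.
Of course we must still have $0 \le p(x_1, x_2, \ldots, x_n) \le 1$,
and
$\sum p(x_1, x_2, \ldots, x_n) = 1,$
where the sum is over all values of $(x_1, x_2, \ldots, x_n)$ such that $p(x_1, x_2, \ldots, x_n) > 0$.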
Continuous Case
To treat two continuous random variables $X$ and $Y$, we start by analogy with the single variable case. Namely, instead of evaluating probabilities at specific values, we must evaluate probabilities on subsets of $\mathbb{R}^2$. In this case, we can think of the probability distribution as being given by the infinitesimal probability associated to some joint probability density (or joint pdf) $f(x,y)$:
$f(x,y)\,dx\,dy \approx$ probability of $(X,Y)$ in the "rectangle" $[x, x+dx] \times [y, y+dy]$.
In general we can describe probabilities for $X$ and $Y$ using multivariable calculus and area integrals.
Joint probability density
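Let $X$, $Y$ be two continuous random variables, then the joint probability density function $f(x,y)$ of $X$ and $Y$ is defined by
$P((X,Y) \in A) = \iint_A f(x,y)\,dx\,dy.$
The probability of the region $A$ is the volume under the joint density $f(x,y)$ over $A$.
The integral in the definition is an area integral and can be approximated by a sum
$\sum_{i,j} f(x_i, y_j)\,\Delta x\,\Delta y$
of volumes of rectangular prisms, each with volume $f(x_i, y_j)\,\Delta x\,\Delta y$. We can interpret $f(x_i, y_j)\,\Delta x\,\Delta y$ as a joint probability distribution function associated to the rectangle centered around $(x_i, y_j)$.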
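The approximation by rectangular prisms can be tried out numerically. The sketch below uses a hypothetical density $f(x,y) = x + y$ on the unit square, which is an assumption chosen for illustration and is not one of the densities from these notes; the helper name `riemann_sum` is also illustrative.

```python
def f(x, y):
    # Hypothetical joint density on [0, 1] x [0, 1] (assumption for illustration).
    return x + y

def riemann_sum(f, a, b, c, d, n=200):
    """Midpoint-rule approximation of the integral of f over [a, b] x [c, d]."""
    dx = (b - a) / n
    dy = (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx            # center of the i-th strip in x
        for j in range(n):
            y = c + (j + 0.5) * dy        # center of the j-th strip in y
            total += f(x, y) * dx * dy    # volume of one rectangular prism
    return total

total_mass = riemann_sum(f, 0, 1, 0, 1)      # total probability, close to 1
prob = riemann_sum(f, 0, 0.5, 0, 0.5)        # P(X <= 1/2, Y <= 1/2), exact value 1/8
print(total_mass, prob)
```

For this particular density the exact values are $1$ and $1/8$, so the printed numbers give a quick check that the sum of prism volumes behaves like the area integral.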
Just as with the single variable case, a joint probability density must satisfy non-negativity and normality.
Properties
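A joint probability density $f(x,y)$ for two continuous random variables $X$ and $Y$ must satisfy
1. $f(x,y) \ge 0$ for all $(x,y)$.
2. The total integral is 1,
$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1.$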
Computing Double Integrals (a review)
The area integral in the definition, $\iint_A f(x,y)\,dx\,dy$, can be calculated by iterated integrals if $A$ can be cleanly described in certain coordinates.
Rectangular coordinates: If $A$ is the rectangle $[a,b] \times [c,d]$, then we have
$\iint_A f(x,y)\,dx\,dy = \int_a^b \int_c^d f(x,y)\,dy\,dx.$
Depending on $f$, the integral can then be computed by first freezing $x$, integrating in $y$, and then integrating in $x$ (or vice-versa). See the iterated integral section of Paul's Online Notes for a refresher on how to do this as well as some examples.
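Regions bounded by functions: If $A$ is a more general region bounded between two curves $y = g_1(x)$ and $y = g_2(x)$, so that
$A = \{(x,y) : a \le x \le b,\ g_1(x) \le y \le g_2(x)\},$
then
$\iint_A f(x,y)\,dx\,dy = \int_a^b \int_{g_1(x)}^{g_2(x)} f(x,y)\,dy\,dx.$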
Example 3
Suppose that $X$ and $Y$ are continuous random variables with values in and joint density given by
What is the value of that makes this a valid joint pdf? For this value of , what is the probability that and ?
Clearly , if , and therefore we simply need to check the normality condition 2. We can do this using iterated partial integrals:
Therefore we need for this to be a valid joint pdf.
To find the probability that and we note that
Therefore the iterated integral can be set up as
Let's consider a more complicated joint density defined over a differently shaped region.
Example 4 (Triangular Region)
This problem is taken from Example 5.4 in the text.
Suppose that and are continuous random variables with values in such that and the joint density given by
What is the value that makes a valid joint density function? What is the probability that and ?
Let's check the normality condition. First, we should recognize that the domain of is a triangular region (shown in the figure below)
This is a region bounded between two curves, and the integral can be set up as
Therefore we must have .
To compute the probability that and we need to carefully consider the region we are integrating over. Note, for instance, that if , then since , it's not possible for . We need to consider how the region overlaps with the triangular domain . This is illustrated in the following figure.
As we can see, the event of interest is
The associated integral is
Of course this can all be extended in a straightforward way to the case of $n$ jointly distributed random variables. However, this is beyond the scope of this course.
Multivariate joint probability density (optional)
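Let $X_1, X_2, \ldots, X_n$ be continuous random variables, then the joint pdf, $f(x_1, x_2, \ldots, x_n)$, of $X_1, X_2, \ldots, X_n$ is defined in terms of the $n$-dimensional volume integral
$P((X_1, X_2, \ldots, X_n) \in A) = \int_A f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n,$
where $A$ is any subregion of $n$-dimensional space. In this setting, it is useful to think of
$\mathbf{X} = (X_1, X_2, \ldots, X_n)$
as a random $n$-dimensional vector in $\mathbb{R}^n$.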
Joint Cumulative Distribution
As was the case with single random variables, the distribution of probability between two discrete or continuous random variables can be characterized by the joint cumulative distribution function (or joint cdf)
Joint cumulative distribution
For any two random variables $X$, $Y$, the joint cumulative distribution function $F(x,y)$ is defined by
$F(x,y) = P(X \le x, Y \le y).$
Note that this is defined regardless of whether $X$, $Y$ are discrete or continuous.
This can be written in terms of the joint distribution or joint density in the discrete or continuous cases.
Discrete case
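If $X$ and $Y$ are discrete random variables with joint distribution function $p(x,y)$, then the joint cdf is just given by
$F(x,y) = \sum_{x_i \le x}\ \sum_{y_j \le y} p(x_i, y_j),$
where the sum is over $x_i$ and $y_j$ such that $x_i \le x$ and $y_j \le y$.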
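The discrete joint cdf is just a restricted sum over the probability table, which is easy to sketch in code. The example below uses the two fair dice of Example 1; the function name `joint_cdf` and the evaluation point $(2,3)$ are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Joint pmf of two fair dice: each of the 36 pairs has probability 1/36.
p = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def joint_cdf(p, x, y):
    """F(x, y) = P(X <= x, Y <= y): sum the pmf over pairs (u, v) with u <= x, v <= y."""
    return sum(prob for (u, v), prob in p.items() if u <= x and v <= y)

print(joint_cdf(p, 2, 3))   # 2 * 3 = 6 of the 36 cells contribute
```

Here `joint_cdf(p, 2, 3)` counts the $2 \times 3 = 6$ cells in the upper-left block of the table, giving $6/36 = 1/6$.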
Example 5
Let $X$ and $Y$ be as in Example 1. What is ?
is just the event , we can visualize this shading cells in the probability table, colored salmon below:
[Figure: the probability table from Example 1, with the cells belonging to the event shaded.]
Since there are such elements, we see that
Continuous case
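If $X$ and $Y$ are continuous random variables with joint pdf $f(x,y)$, then the joint cdf is just given by
$F(x,y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u,v)\,dv\,du.$
We can also recover the joint pdf from the joint cdf using partial derivatives,
$f(x,y) = \frac{\partial^2 F}{\partial x\,\partial y}(x,y).$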
Example 6
Let and be as in Example 4. What is the joint cdf?
The event and depends on whether or . Both cases are illustrated below
If , then the region of integration is a triangle
On the other hand, if then the region of integration can be split up into a triangle and rectangle
Putting this together gives
As was the case in one dimension, the joint cdf has various properties
Properties of the joint cdf
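If $X$ and $Y$ are any random variables with joint cdf $F(x,y)$, then the following properties hold:
$F(x,y)$ is non-decreasing, meaning that if $x$ or $y$ increases, then $F(x,y)$ can't decrease.
If you send either variable to $-\infty$, then $F(x,y)$ goes to zero, i.e. $\lim_{x \to -\infty} F(x,y) = \lim_{y \to -\infty} F(x,y) = 0$.
If you send both variables to $+\infty$, then $F(x,y)$ approaches $1$, i.e. $\lim_{x,y \to \infty} F(x,y) = 1$.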
Using the cdf to calculate probabilities of rectangles
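Just like in the single variable case, the cdf can be used to calculate probabilities on rectangles. This can be stated more precisely as: if $a_1 < b_1$ and $a_2 < b_2$, then the following rule holds.
$P(a_1 < X \le b_1,\ a_2 < Y \le b_2) = F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2)$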
This is illustrated in the following animation using the additive and subtractive properties of probability. Positive contributions are colored red and negative contributions are colored blue.
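The rectangle rule can be checked against a direct count. The following sketch again uses the two fair dice of Example 1; the corner names `a1, b1, a2, b2` and the specific rectangle are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def F(x, y):
    """Joint cdf of the two dice: P(X <= x, Y <= y)."""
    return sum(prob for (u, v), prob in p.items() if u <= x and v <= y)

def rect_prob_from_cdf(a1, b1, a2, b2):
    """P(a1 < X <= b1, a2 < Y <= b2) via two positive and two negative cdf terms."""
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

def rect_prob_direct(a1, b1, a2, b2):
    """Same probability, computed by summing the pmf over the rectangle."""
    return sum(prob for (u, v), prob in p.items()
               if a1 < u <= b1 and a2 < v <= b2)

print(rect_prob_from_cdf(1, 4, 2, 5), rect_prob_direct(1, 4, 2, 5))
```

For the rectangle $1 < X \le 4$, $2 < Y \le 5$ both computations count the same $3 \times 3 = 9$ cells, so both give $9/36 = 1/4$.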
Marginal distributions
If two random variables $X$ and $Y$ are jointly distributed, it is often the case that you want to understand the distribution of just one of them, ignoring any relationship between the two. Such a distribution is called a marginal distribution.
Discrete case
Let's begin by considering an example:
Example 7
Recall the die tossing experiment from Example 1. Suppose we consider the event . We know that this corresponds to the following pairs
This corresponds to row in the table.
[Figure: the probability table from Example 1, with the row corresponding to the event highlighted.]
Using the additive law of probability, we can calculate by summing along the rows of the table
Of course we already knew this!
Motivated by the example, we define marginal distributions
Marginal probability distribution
Suppose that $X$ and $Y$ are discrete random variables with joint distribution function $p(x,y)$. The marginal distribution functions $p_X(x)$ and $p_Y(y)$ for $X$ and $Y$ respectively are given by summing the joint distribution over the other variable,
$p_X(x) = \sum_{y} p(x,y), \qquad p_Y(y) = \sum_{x} p(x,y),$
where the sums are over all values such that $p(x,y) > 0$.
The marginals can be visualized in the following extension of the joint probability table, where the sums of each row and column are located on the right and bottom margins.
Note
The marginal distributions say very little about a joint distribution. Two random variables can have the same marginal distribution and be very dependent on one another. For instance, if $X$ is the outcome of a fair die roll, then $Z = 7 - X$ has the same marginal distribution as $X$ even though $Z$ is completely determined by the outcome of $X$.
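The point of this note can be seen concretely by computing row and column sums. The sketch below compares the two tables of Example 1, assuming the dependent variable is $Z = 7 - X$ as in the second table; the function name `marginals` is illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Independent dice (X, Y) and the fully dependent pair (X, Z) with Z = 7 - X.
p_xy = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}
p_xz = {(x, z): (Fraction(1, 6) if z == 7 - x else Fraction(0))
        for x, z in product(faces, faces)}

def marginals(p):
    """Return (row sums, column sums) of a joint probability table."""
    row, col = {}, {}
    for (x, y), prob in p.items():
        row[x] = row.get(x, 0) + prob   # sum over the second variable
        col[y] = col.get(y, 0) + prob   # sum over the first variable
    return row, col

print(marginals(p_xy) == marginals(p_xz))   # same marginals
print(p_xy == p_xz)                         # different joint distributions
```

Both tables have uniform marginals with every entry equal to $1/6$, yet the joint tables differ, so the marginals alone cannot distinguish the independent pair from the dependent one.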
Continuous Case
The continuous case is similar to the discrete case, where sums are replaced with integrals.
Marginal probability density
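Suppose that $X$ and $Y$ are continuous random variables with joint pdf $f(x,y)$. The marginal pdfs $f_X(x)$ and $f_Y(y)$ for $X$ and $Y$ respectively are given by integrating out the other variable,
$f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx.$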
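"Integrating out" a variable can also be done numerically. The sketch below uses a hypothetical density $f(x,y) = x + y$ on the unit square (an assumption for illustration, not a density from these notes), whose $X$ marginal works out by hand to $x + 1/2$ on $[0,1]$; the function name `marginal_x` is illustrative.

```python
def f(x, y):
    # Hypothetical joint density: x + y on the unit square, zero elsewhere.
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def marginal_x(x, n=1000):
    """Approximate f_X(x) by integrating f(x, y) over y in [0, 1] (midpoint rule)."""
    dy = 1.0 / n
    return sum(f(x, (j + 0.5) * dy) for j in range(n)) * dy

for x in (0.0, 0.3, 1.0):
    # Numerical marginal next to the closed form x + 1/2 for this density.
    print(x, marginal_x(x), x + 0.5)
```

The integration limits are $0$ and $1$ here only because this density vanishes outside the unit square; for a density supported on another region the limits would need to change accordingly.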
Example 8
Consider the joint pdf from Example 3
Find the $X$ and $Y$ marginals.
To find $f_X(x)$ we integrate out the $y$ variable, and to find $f_Y(y)$ we integrate out the $x$ variable.
Example 9
Consider the more complicated pdf of Example 4
Find the $X$ and $Y$ marginals.
Taking into account the triangular region,
Marginal cdf
Finding the marginal cdf from a given joint cdf is actually pretty easy. No integration or summation is necessary.
Marginal cdf
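Let $X$ and $Y$ be two random variables with joint cdf $F(x,y)$, then the marginal cdfs are given by
$F_X(x) = \lim_{y \to \infty} F(x,y), \qquad F_Y(y) = \lim_{x \to \infty} F(x,y).$
Note that if $X$ and $Y$ only take values in a rectangle $[a,b] \times [c,d]$, then this can be simplified to evaluating $y$ or $x$ at the right end point of its respective domain,
$F_X(x) = F(x,d), \qquad F_Y(y) = F(b,y).$
Let's illustrate these ideas with a more complicated example.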
Example 10
Let's consider the cdf we calculated in Example 6.
Find the $X$ and $Y$ marginal cdfs.
Evaluating at the and gives
Note that $\frac{d}{dx}F_X(x) = f_X(x)$ and $\frac{d}{dy}F_Y(y) = f_Y(y)$, which is consistent with the answers to Example 9.
Conditional Distributions
In addition to marginal distributions, one can also define conditional distributions that describe the distribution of one random variable given the value of another.
Discrete case
Recall the definition of conditional probability of $A$ given $B$,
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$
We can apply this formula to the distribution of two discrete random variables $X$ and $Y$, using the random variables to define events like $\{X = x\}$ and $\{Y = y\}$. Then we can consider the probability of $X$ taking a certain value given that $Y$ takes a different value by
$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}.$
This motivates the following definition.
Conditional distribution function
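Let $X$ and $Y$ be two random variables with joint distribution $p(x,y)$, then the conditional distribution function of $X$ given $Y$ is
$p_{X|Y}(x \mid y) = \frac{p(x,y)}{p_Y(y)},$
provided that $p_Y(y) > 0$. A similar definition holds for $p_{Y|X}(y \mid x)$,
$p_{Y|X}(y \mid x) = \frac{p(x,y)}{p_X(x)}.$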
Note
$p_{X|Y}(x \mid y)$ is undefined if $p_Y(y) = 0$. This is because it doesn't make sense to condition on a probability zero event.
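In a probability table, conditioning amounts to picking out one column and rescaling it so that it sums to one. The sketch below does this for the dependent pair from Example 1, assuming $Z = 7 - X$; the function name `conditional_given_second` is illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Joint pmf of (X, Z), assuming Z = 7 - X as in the second table of Example 1.
p_xz = {(x, z): (Fraction(1, 6) if z == 7 - x else Fraction(0))
        for x, z in product(faces, faces)}

def conditional_given_second(p, y):
    """Return the conditional pmf x -> p(x, y) / p_Y(y) for a fixed value y."""
    p_y = sum(prob for (_, v), prob in p.items() if v == y)   # marginal of the second variable
    if p_y == 0:
        raise ValueError("cannot condition on a probability-zero event")
    return {u: prob / p_y for (u, v), prob in p.items() if v == y}

cond = conditional_given_second(p_xz, 2)
print(cond)   # all the mass sits on x = 5, since Z = 2 forces X = 5
```

The explicit check on `p_y` mirrors the note above: the conditional distribution is left undefined when the conditioning event has probability zero.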
Continuous case
The continuous case is much more subtle. For instance, it is hard to define
$P(X \le x \mid Y = y)$
since the probability that $Y = y$ is zero.
However, one of the remarkable features of continuous probability distributions is that it is possible to make sense of conditioning on $Y = y$ by taking a limit. For instance, we can define a conditional cdf by shrinking the interval $[y, y+\epsilon]$ to the point $y$ and defining
$F_{X|Y}(x \mid y) = \lim_{\epsilon \to 0} P(X \le x \mid y \le Y \le y + \epsilon).$
The next result shows that this limit exists and gives an explicit formula for it.
Conditional cdf
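Let $X$ and $Y$ be two continuous random variables with joint pdf $f(x,y)$ and joint cdf $F(x,y)$, then the conditional cdf of $X$ given $Y = y$ is defined by
$F_{X|Y}(x \mid y) = \frac{1}{f_Y(y)} \int_{-\infty}^{x} f(u,y)\,du,$
provided that $f_Y(y) > 0$.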
Proof:
The expression is well defined since for some values of .
We can calculate this limit explicitly since
where in the last line we multiplied and divided by . Using the fact that
as , completes the proof.
QED
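For reference, the limit in the argument above can be written out as a short derivation. This is a sketch under the assumptions that $f$ is continuous near $y$ and $f_Y(y) > 0$; the symbols $\epsilon$, $F_Y$ and $f_Y$ are the notation assumed here and may differ from the original display.

```latex
\begin{align*}
P(X \le x \mid y \le Y \le y + \epsilon)
  &= \frac{P(X \le x,\ y \le Y \le y + \epsilon)}{P(y \le Y \le y + \epsilon)} \\
  &= \frac{F(x, y + \epsilon) - F(x, y)}{F_Y(y + \epsilon) - F_Y(y)} \\
  &= \frac{\bigl(F(x, y + \epsilon) - F(x, y)\bigr)/\epsilon}
          {\bigl(F_Y(y + \epsilon) - F_Y(y)\bigr)/\epsilon} \\
  &\xrightarrow[\epsilon \to 0]{}
     \frac{\partial_y F(x, y)}{f_Y(y)}
   = \frac{1}{f_Y(y)} \int_{-\infty}^{x} f(u, y)\,du .
\end{align*}
```

The third line is the "multiply and divide" step: both difference quotients converge to partial derivatives, and $\partial_y F(x,y) = \int_{-\infty}^{x} f(u,y)\,du$ follows from the integral formula for the joint cdf.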
This motivates the following definition of the conditional pdf by taking the derivative of the conditional cdf.
Conditional pdf
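Let $X$ and $Y$ be two continuous random variables with joint pdf $f(x,y)$, then the conditional pdf of $X$ given $Y = y$ is
$f_{X|Y}(x \mid y) = \frac{f(x,y)}{f_Y(y)},$
provided that $f_Y(y) > 0$. An analogous definition holds for $f_{Y|X}(y \mid x)$.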
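A conditional pdf should itself be a pdf in $x$ for each fixed $y$, which can be checked numerically. The sketch below uses a hypothetical density $f(x,y) = x + y$ on the unit square with marginal $f_Y(y) = y + 1/2$ (both assumptions for illustration, not taken from the examples in these notes); the helper names are illustrative.

```python
def f(x, y):
    # Hypothetical joint density: x + y on the unit square, zero elsewhere.
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def f_Y(y):
    # Marginal of Y for this density: integral of (x + y) over x in [0, 1].
    return y + 0.5 if 0 <= y <= 1 else 0.0

def f_X_given_Y(x, y):
    """Conditional pdf f(x, y) / f_Y(y); undefined where f_Y(y) = 0."""
    if f_Y(y) == 0:
        raise ValueError("conditional pdf undefined where f_Y(y) = 0")
    return f(x, y) / f_Y(y)

def integrate(g, a, b, n=1000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

y0 = 0.5
total = integrate(lambda x: f_X_given_Y(x, y0), 0, 1)     # should be close to 1
prob = integrate(lambda x: f_X_given_Y(x, y0), 0, 0.5)    # P(X <= 1/2 | Y = 1/2)
print(total, prob)
```

For this density, $P(X \le 1/2 \mid Y = 1/2) = \int_0^{1/2} (x + 1/2)\,dx = 3/8$, which is what the second number approximates.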
Example 11
Let's consider the pdf from Example 6.
Find and use it to calculate .
From example 9, we found that
Therefore
To calculate we note that
Therefore
Independence of Random Variables
We can now give a precise definition of independence of two random variables.
Recall that two events $A$ and $B$ are independent if and only if
$P(A \cap B) = P(A)\,P(B).$
Of course we can take $A$ and $B$ to be events defined by two random variables $X$ and $Y$, for instance $A = \{X \le x\}$ and $B = \{Y \le y\}$. We then say that two random variables $X$ and $Y$ are independent if any event defined by $X$ and any event defined by $Y$ are independent. As it turns out, it's good enough to consider events of the form $\{X \le x\}$ and $\{Y \le y\}$: for all $x$ and $y$, the condition
$P(X \le x, Y \le y) = P(X \le x)\,P(Y \le y)$
guarantees independence.
Independence
Two random variables $X$ and $Y$ with joint cdf $F(x,y)$ and marginal cdfs $F_X(x)$ and $F_Y(y)$ are independent if and only if
$F(x,y) = F_X(x)\,F_Y(y)$ for all $x$ and $y$.
Otherwise $X$ and $Y$ are said to be dependent.
In the discrete case, the definition can be reduced to:
Independence (discrete case)
Two discrete random variables $X$ and $Y$ with joint distribution function $p(x,y)$ and marginal distributions $p_X(x)$ and $p_Y(y)$ are independent if and only if
$p(x,y) = p_X(x)\,p_Y(y)$ for all $x$ and $y$.
Otherwise $X$ and $Y$ are said to be dependent.
In the discrete case, independence means the probability in a cell of the probability table must be the product of the marginal probabilities of its row and column. This is considered in the next example.
Example 12
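Let's consider the die rolling experiment from Example 1. Of course we know that $X$ and $Y$ are independent, but let's check that our notion of independence is correct.
The probability table, with the marginals in the right and bottom margins, is
X \ Y     1      2      3      4      5      6      p_X(x)
1         1/36   1/36   1/36   1/36   1/36   1/36   1/6
2         1/36   1/36   1/36   1/36   1/36   1/36   1/6
3         1/36   1/36   1/36   1/36   1/36   1/36   1/6
4         1/36   1/36   1/36   1/36   1/36   1/36   1/6
5         1/36   1/36   1/36   1/36   1/36   1/36   1/6
6         1/36   1/36   1/36   1/36   1/36   1/36   1/6
p_Y(y)    1/6    1/6    1/6    1/6    1/6    1/6    1
Since each marginal has probability $1/6$ and each cell has probability $1/36 = 1/6 \cdot 1/6$ (the product of the marginals), we can see that $X$ and $Y$ are independent.
On the other hand, let's consider $X$ and $Z = 7 - X$. The probability table was given by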
X \ Z     1     2     3     4     5     6     p_X(x)
1         0     0     0     0     0     1/6   1/6
2         0     0     0     0     1/6   0     1/6
3         0     0     0     1/6   0     0     1/6
4         0     0     1/6   0     0     0     1/6
5         0     1/6   0     0     0     0     1/6
6         1/6   0     0     0     0     0     1/6
p_Z(z)    1/6   1/6   1/6   1/6   1/6   1/6   1
In this table, we can see that many of the probabilities are not the product of two marginal probabilities, and so $X$ and $Z$ are dependent. Indeed, since none of the marginal probabilities are zero, none of the cells with zero probability can be a product of marginals.
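The cell-by-cell check in this example is mechanical enough to automate. The sketch below tests whether every cell equals the product of its row and column marginals, for both tables; it assumes $Z = 7 - X$, and the function name `is_independent` is illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p_xy = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}
p_xz = {(x, z): (Fraction(1, 6) if z == 7 - x else Fraction(0))
        for x, z in product(faces, faces)}

def is_independent(p):
    """True if every cell equals the product of its row and column marginals."""
    row, col = {}, {}
    for (x, y), prob in p.items():
        row[x] = row.get(x, 0) + prob   # marginal of the first variable
        col[y] = col.get(y, 0) + prob   # marginal of the second variable
    return all(prob == row[x] * col[y] for (x, y), prob in p.items())

print(is_independent(p_xy))   # True: 1/36 = 1/6 * 1/6 in every cell
print(is_independent(p_xz))   # False: zero cells cannot equal 1/6 * 1/6
```

A single failing cell is enough to conclude dependence, which is why the second check returns `False` as soon as it meets a zero entry.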
In the continuous case (by taking partial derivatives), this can be reduced to a statement on the pdfs:
Independence (continuous case)
Two continuous random variables $X$ and $Y$ with joint pdf $f(x,y)$ and marginal pdfs $f_X(x)$ and $f_Y(y)$ are independent if and only if
$f(x,y) = f_X(x)\,f_Y(y)$ for all $x$ and $y$.
Otherwise $X$ and $Y$ are said to be dependent.
In many ways the continuous case is easier to check than the discrete case, since it can be reduced to showing that the density can be factored into a product of two densities.
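The factorization criterion can be probed numerically by comparing $f(x,y)$ with the product of its marginals on a grid. The sketch below uses two hypothetical densities on the unit square, $4xy$ and $x + y$, which are assumptions chosen for illustration and not densities from these notes. A grid check like this can expose dependence but does not prove independence.

```python
def integrate(g, a, b, n=400):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def factorizes(f, grid, tol=1e-9):
    """Return True if f(x, y) matches f_X(x) * f_Y(y) at every grid point.

    The marginals are computed numerically over [0, 1], so this assumes
    the density is supported on the unit square.
    """
    for x in grid:
        fx = integrate(lambda v: f(x, v), 0, 1)        # marginal f_X(x)
        for y in grid:
            fy = integrate(lambda u: f(u, y), 0, 1)    # marginal f_Y(y)
            if abs(f(x, y) - fx * fy) > tol:
                return False
    return True

grid = [0.1, 0.3, 0.5, 0.7, 0.9]
print(factorizes(lambda x, y: 4 * x * y, grid))   # product density 2x * 2y
print(factorizes(lambda x, y: x + y, grid))       # does not factor
```

For $4xy$ the marginals are $2x$ and $2y$, whose product recovers the joint density; for $x + y$ the marginals are $x + 1/2$ and $y + 1/2$, whose product differs from $x + y$ already at $(0.1, 0.1)$.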
Example 13
Let and have joint density
Are $X$ and $Y$ independent?
In this case, since for and and , we can easily see that for ,
Example 14
How about the pdf from Example 6? Are $X$ and $Y$ independent?
In this case, we know that for and we have
So that for all
This is no where close to if not for the simple fact that when while .