Notes
APMA 1650 - Spring 2020


Multivariate Distributions



Learning Goals

  • Know what a joint distribution of two or more random variables is (continuous or discrete).
  • Know how to compute probabilities, as well as marginals and conditional distributions of joint distributions.
  • Understand what it means for two random variables to be independent and how to test it.

Introduction

In real life, you will most often be interested in data that has multiple values of interest in an experiment. In probabilistic language, this means you are interested in the outcome of two or more random variables at the same time. The goal of this is often to deduce some kind of relationship (or lack thereof) between the data (this can be measured using the notions of covariance and correlation which we will discuss in the next lecture).

For example, you might be interested in:

  • Number of Coronavirus deaths and the average age of a population
  • The number of posts per day and the number of followers of Instagram users
  • The frequency of exercise and the academic performance of college students
  • The salary and gender of a population in a country

Question

How do you expect that the two variables in the above examples are related? Can you think of any other pairs of random variables that you might be interested in?

Another situation where we are often interested in multiple random variables is when conducting repeated experiments that have some variability. In this case (assuming the experiment is done correctly) the outcomes of each experiment should be independent, in that the outcome of one experiment shouldn't affect the outcome of the next (we will discuss in more detail what it means for two random variables to be independent later on in these notes). In fact, we have been dealing with multiple independent random variables already in the setting of the law of large numbers and the central limit theorem (something that we will come back to later).

Joint Distributions

When dealing with multiple random variables, say X and Y, it is important to understand how they are distributed together. Since X can depend on Y, the statistics of the pair (X,Y) can reveal behavior that studying the statistics of X or Y alone cannot.

Discrete Case

Let's start with the case when the random variables are discrete. We first introduce the idea of a joint probability distribution, which describes how both X and Y are distributed.

Joint probability distribution
Let X and Y be two discrete random variables. The joint probability distribution of X and Y is given by
$$p(x,y) := P(X = x,\, Y = y), \quad \text{where } x, y \in \mathbb{R}.$$

Since X and Y are discrete (and can therefore only take on a finite or countable number of values), the joint distribution p(x,y) is only non-zero at a countable number of values.

Note

Knowing the distributions of X and Y separately is generally not enough to determine the joint distribution of X and Y (unless they are independent as we will see later). As we will see, the joint distribution p(x,y) contains information about how the random variables X and Y depend on one another.

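Suppose that X takes values $\{x_1, x_2, \ldots, x_n\}$ and Y takes values $\{y_1, y_2, \ldots, y_m\}$. Then the pair (X,Y) takes values $\{(x_1,y_1), (x_1,y_2), \ldots, (x_n,y_m)\}$, that is, every combination $(x_i, y_j)$. The joint distribution $p(x_i,y_j)$ can be organized into a joint probability table.

X \ Y    y_1            y_2            ...    y_m
x_1      p(x_1,y_1)     p(x_1,y_2)     ...    p(x_1,y_m)
x_2      p(x_2,y_1)     p(x_2,y_2)     ...    p(x_2,y_m)
...      ...            ...            ...    ...
x_n      p(x_n,y_1)     p(x_n,y_2)     ...    p(x_n,y_m)
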
Properties

Joint probability distributions must satisfy the following properties:

  1. $0 \le p(x,y) \le 1$ for all $x, y$.
  2. All the probabilities add up to one: $\sum_x \sum_y p(x,y) = 1$, where the sum is over all values $x, y$ such that $p(x,y) \neq 0$.

Note: The above summation is over a discrete set of values since X and Y are discrete. It can be an infinite sum.


Let's consider some examples:

Example 1 (Roll two fair dice)

Recall from the notes on discrete probability the problem of rolling two fair dice. Let

$$X = \text{outcome of die 1}, \qquad Y = \text{outcome of die 2},$$

then the probability table is given by:

X \ Y    1       2       3       4       5       6
1        1/36    1/36    1/36    1/36    1/36    1/36
2        1/36    1/36    1/36    1/36    1/36    1/36
3        1/36    1/36    1/36    1/36    1/36    1/36
4        1/36    1/36    1/36    1/36    1/36    1/36
5        1/36    1/36    1/36    1/36    1/36    1/36
6        1/36    1/36    1/36    1/36    1/36    1/36

Since there are 36 cells in the table, the total probability clearly sums up to one.

This table is pretty boring since it does not indicate an interesting relationship between the random variables X and Y. In fact X and Y are independent.

As a more interesting case, consider another random variable $Z = 7 - X$, which takes whatever X gives and "flips" it around 3.5. Then X and Z are clearly dependent, since the value of X completely determines the value of Z. The probability table becomes:

X \ Z    1      2      3      4      5      6
1        0      0      0      0      0      1/6
2        0      0      0      0      1/6    0
3        0      0      0      1/6    0      0
4        0      0      1/6    0      0      0
5        0      1/6    0      0      0      0
6        1/6    0      0      0      0      0

If you were to look at the distribution of just Z, you wouldn't be able to tell that Z was "copying" off of X. This can only be recognized by considering the joint distribution.
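
As a small illustration, the two tables of Example 1 can be stored as Python dictionaries and checked against the two properties of a joint distribution. This is only a sketch: the names `p_xy`, `p_xz`, and `is_valid_joint_pmf` are made up for this illustration and are not part of the notes.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Joint pmf of two independent fair dice: every pair (x, y) has probability 1/36.
p_xy = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

# Joint pmf of X and Z = 7 - X: only the pairs (x, 7 - x) carry probability.
p_xz = {(x, z): (Fraction(1, 6) if z == 7 - x else Fraction(0))
        for x, z in product(faces, faces)}

def is_valid_joint_pmf(p):
    """Check 0 <= p(x, y) <= 1 for every cell and that the cells sum to 1."""
    return all(0 <= v <= 1 for v in p.values()) and sum(p.values()) == 1

print(is_valid_joint_pmf(p_xy), is_valid_joint_pmf(p_xz))  # True True
```

Exact fractions are used instead of floats so that the check "sums to 1" is not affected by rounding.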


We can also consider more complicated events.

Example 2

Let X and Y be as in the previous example. What is the probability that X<Y?


The event {X<Y} can be described explicitly as the set of (x,y) pairs

$$\{X<Y\} = \{(1,2),(1,3),(1,4),(1,5),(1,6),(2,3),(2,4),(2,5),(2,6),(3,4),(3,5),(3,6),(4,5),(4,6),(5,6)\}.$$

This can be visualized as a triangular subset of the probability table considered in Example 1, marked with an asterisk below:

X \ Y    1       2       3       4       5       6
1        1/36    1/36*   1/36*   1/36*   1/36*   1/36*
2        1/36    1/36    1/36*   1/36*   1/36*   1/36*
3        1/36    1/36    1/36    1/36*   1/36*   1/36*
4        1/36    1/36    1/36    1/36    1/36*   1/36*
5        1/36    1/36    1/36    1/36    1/36    1/36*
6        1/36    1/36    1/36    1/36    1/36    1/36

Since there are 15 such elements, we see that $P(X<Y) = \frac{15}{36} = \frac{5}{12}$.
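
The count above can be double-checked by brute force: sum the joint distribution over every pair in the event. The sketch below assumes the dice table of Example 1; the helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def prob(event, pmf):
    """Sum the joint pmf over all pairs (x, y) for which event(x, y) is true."""
    return sum(v for (x, y), v in pmf.items() if event(x, y))

p_less = prob(lambda x, y: x < y, p)
print(p_less)  # 5/12, i.e. 15/36
```

The same helper works for any event that can be written as a condition on the pair (x, y), for instance `lambda x, y: x + y == 7`.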


The concept of joint probability can easily be generalized to the case of n random variables.

Multivariate joint probability function

Let $X_1, X_2, \ldots, X_n$ be discrete random variables. The joint probability distribution of $X_1, X_2, \ldots, X_n$ is given by

$$p(x_1, x_2, \ldots, x_n) := P(X_1 = x_1,\, X_2 = x_2,\, \ldots,\, X_n = x_n),$$
where $(x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$. Again, since $X_1, X_2, \ldots, X_n$ are discrete, $p(x_1, x_2, \ldots, x_n)$ can only be non-zero on at most a discrete set of values.

Of course we must still have $0 \le p(x_1, x_2, \ldots, x_n) \le 1$ and $\sum_{x_1, x_2, \ldots, x_n} p(x_1, x_2, \ldots, x_n) = 1$, where the sum is over all values of $(x_1, x_2, \ldots, x_n)$ such that $p(x_1, x_2, \ldots, x_n) \neq 0$.

Continuous Case

To treat two continuous random variables X and Y, we start by analogy with the single variable case. Namely, instead of evaluating probabilities at specific values, we must evaluate probabilities on subsets of $\mathbb{R}^2$. In this case, we can think of the probability distribution as being given by the infinitesimal probability associated to some joint probability density $f(x,y)$ (or joint pdf):

$$f(x,y)\,dx\,dy = \text{probability of } (X,Y) \text{ in the "rectangle" } [dx] \times [dy].$$

In general we can describe probabilities for X and Y using multivariable calculus and area integrals.

Joint probability density
Let X and Y be two continuous random variables. The joint probability density function $f(x,y)$ of X and Y is defined by
$$P((X,Y) \in R) = \iint_R f(x,y)\,dA \qquad (1)$$
for every region $R \subseteq \mathbb{R}^2$.


The probability of the region R is the volume under the joint density f(x,y).

The integral in (1) is an area integral and can be approximated by a sum $\iint_R f(x,y)\,dA \approx \sum_i \sum_j f(x_i,y_j)\,\Delta x_i\,\Delta y_j$ of volumes of rectangular prisms, each with volume $f(x_i,y_j)\,\Delta x_i\,\Delta y_j$. We can interpret $p(i,j) = f(x_i,y_j)\,\Delta x_i\,\Delta y_j$ as a joint probability distribution function associated to the rectangle centered around $(x_i,y_j)$.

Just as with the single variable case, a joint probability density must satisfy non-negativity and normality.

Properties

A joint probability density $f(x,y)$ for two continuous random variables X and Y must satisfy

  1. $0 \le f(x,y)$ for all $x, y \in \mathbb{R}$.
  2. The total integral is 1: $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$.

Computing Double Integrals (a review)

The area integral in the definition (1) can be calculated by iterated integrals if R can be cleanly described in certain coordinates.

Rectangular coordinates: If R is the rectangle $R = [a,b] \times [c,d]$, then we have

$$P((X,Y) \in R) = \int_a^b \left( \int_c^d f(x,y)\,dy \right) dx.$$

Depending on $f(x,y)$, the integral can then be computed by first freezing x, integrating in y, and then integrating in x (or vice versa). See the iterated integral section of Paul's Online Notes for a refresher on how to do this as well as some examples.

Regions bounded by functions: If R is a more general region bounded between two curves $g_1(x) \le g_2(x)$, so that

$$R = \{(x,y) \in \mathbb{R}^2 : a \le x \le b,\ g_1(x) \le y \le g_2(x)\},$$
then we can write the iterated integral as
$$P((X,Y) \in R) = \int_a^b \int_{g_1(x)}^{g_2(x)} f(x,y)\,dy\,dx.$$
See the Integration over General Regions section of Paul's Online Notes for a refresher on how to do this as well as some examples.

Polar coordinates: As another example, if the region R is the unit disk $R = \{x^2 + y^2 \le 1\}$, then the area integral can be written as an iterated polar integral

$$P((X,Y) \in R) = \int_0^{2\pi} \left( \int_0^1 f(r\cos\theta,\, r\sin\theta)\, r\,dr \right) d\theta.$$
See the Integrals in Polar Coordinates section of Paul's Online Notes for a refresher on how to do this as well as some examples.


Example 3

Suppose that X and Y are continuous random variables with values in [0,1] and joint density given by
$$f(x,y) = \begin{cases} cxy, & \text{if } 0 \le x, y \le 1, \\ 0, & \text{otherwise.} \end{cases}$$

What is the value of c that makes this a valid joint pdf? For this value of c, what is the probability that X>1/2 and Y<1/2?


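Clearly $f(x,y) \ge 0$ if $c \ge 0$, and therefore we simply need to check the normality condition 2. We can do this using iterated partial integrals:

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} cxy\,dx\,dy = \int_0^1 \int_0^1 cxy\,dx\,dy = \int_0^1 \left[ \frac{c}{2} x^2 y \right]_{x=0}^{x=1} dy = \int_0^1 \frac{c}{2}\, y\,dy = \left[ \frac{c}{4} y^2 \right]_0^1 = \frac{c}{4}.$$

Therefore we need $c = 4$ for this to be a valid joint pdf.

To find the probability that $X > 1/2$ and $Y < 1/2$ we note that

$$\{X > 1/2,\ Y < 1/2\} = \{(X,Y) \in [1/2, 1] \times [0, 1/2]\}.$$
Therefore the iterated integral can be set up as
$$P\left(X > \tfrac12,\ Y < \tfrac12\right) = \int_0^{1/2} \int_{1/2}^{1} 4xy\,dx\,dy = \int_0^{1/2} \left[ 2y x^2 \right]_{x=1/2}^{x=1} dy = \int_0^{1/2} 2y \left( 1 - \frac14 \right) dy = \frac14 \left( \frac34 \right) = \frac{3}{16}.$$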
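
Both answers in Example 3 can be sanity-checked numerically with the Riemann-sum approximation of the area integral described earlier. The following is a minimal sketch using a midpoint rule; the function name `midpoint_double_integral` and the grid size `n` are choices made for this illustration only.

```python
def midpoint_double_integral(f, a, b, c, d, n=200):
    """Approximate the integral of f over [a, b] x [c, d] with an n-by-n midpoint rule."""
    dx = (b - a) / n
    dy = (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx          # midpoint of the i-th x-strip
        for j in range(n):
            y = c + (j + 0.5) * dy      # midpoint of the j-th y-strip
            total += f(x, y) * dx * dy  # volume of one rectangular prism
    return total

def f(x, y):
    """Joint density of Example 3 with c = 4."""
    return 4 * x * y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

total_mass = midpoint_double_integral(f, 0, 1, 0, 1)
prob = midpoint_double_integral(f, 0.5, 1, 0, 0.5)
print(round(total_mass, 6), round(prob, 6))  # 1.0 0.1875
```

For this particular density the midpoint rule happens to be exact up to rounding, because the integrand is linear in each variable; for a general density the result is only an approximation that improves as `n` grows.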


Let's consider a more complicated joint density defined over a differently shaped region.


Example 4 (Triangular Region)

This problem is taken from Example 5.4 in the text.

Suppose that $Y_1$ and $Y_2$ are continuous random variables with values in [0,1] such that $Y_2 \le Y_1$, with joint density given by
$$f(y_1,y_2) = \begin{cases} c\,y_1, & \text{if } 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{otherwise.} \end{cases}$$

What is the value of c that makes $f(y_1,y_2)$ a valid joint density function? What is the probability that $0 \le Y_1 \le 1/2$ and $Y_2 > 1/4$?


Let's check the normality condition. First, we should recognize that the domain of $f(y_1,y_2)$ is a triangular region (shown in the figure below)

$$D = \{(y_1,y_2) \in \mathbb{R}^2 : 0 \le y_1 \le 1,\ 0 \le y_2 \le y_1\}.$$

This is a region bounded between two curves, and the integral can be set up as

$$\int_0^1 \int_0^{y_1} c\,y_1\,dy_2\,dy_1 = \int_0^1 c\,y_1 \left[ y_2 \right]_0^{y_1} dy_1 = \int_0^1 c\,y_1^2\,dy_1 = \frac{c}{3} = 1.$$

Therefore we must have $c = 3$.

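To compute the probability that $0 \le Y_1 \le 1/2$ and $Y_2 > 1/4$ we need to carefully consider the region we are integrating over. Note, for instance, that if $Y_1 < 1/4$, then since $Y_2 \le Y_1$, it's not possible for $Y_2 > 1/4$. We need to consider how the region $\{0 \le y_1 \le 1/2,\ y_2 > 1/4\}$ overlaps with the triangular domain $D = \{0 \le y_2 \le y_1 \le 1\}$. This is illustrated in the following figure.

[Figure: triangular domain]
As we can see, the event of interest is
$$E = \left\{ (y_1,y_2) : \tfrac14 \le y_1 \le \tfrac12,\ \tfrac14 \le y_2 \le y_1 \right\}.$$

The associated integral is

$$P((Y_1,Y_2) \in E) = \int_{1/4}^{1/2} \int_{1/4}^{y_1} 3y_1\,dy_2\,dy_1 = \int_{1/4}^{1/2} 3y_1 \left( y_1 - \frac14 \right) dy_1 = \left[ y_1^3 - \frac38 y_1^2 \right]_{1/4}^{1/2} = \frac{5}{128}.$$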
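
The two integrals of Example 4 are of the "region bounded by functions" type, so they can be checked numerically by an iterated midpoint rule in which the inner limits depend on the outer variable. This is a sketch under that assumption; `iterated_integral` and its arguments `g1`, `g2` are hypothetical names mirroring the notation of the review section.

```python
def iterated_integral(f, a, b, g1, g2, n=400):
    """Midpoint approximation of the integral of f(x, y) over a <= x <= b, g1(x) <= y <= g2(x)."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        lo, hi = g1(x), g2(x)            # inner limits depend on x
        dy = (hi - lo) / n
        inner = sum(f(x, lo + (j + 0.5) * dy) for j in range(n)) * dy
        total += inner * dx
    return total

def density(y1, y2):
    """Joint density of Example 4 with c = 3; the caller keeps (y1, y2) inside D."""
    return 3 * y1

# Normalization over the triangle 0 <= y2 <= y1 <= 1.
total_mass = iterated_integral(density, 0, 1, lambda y1: 0.0, lambda y1: y1)
# Probability of E = {1/4 <= y1 <= 1/2, 1/4 <= y2 <= y1}.
prob = iterated_integral(density, 0.25, 0.5, lambda y1: 0.25, lambda y1: y1)
print(round(total_mass, 4), round(prob, 4))  # 1.0 0.0391
```

The value 0.0391 agrees with 5/128 = 0.0390625 to the displayed precision.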

Of course this can all be extended in a straightforward way to the case of n jointly distributed random variables. However, this is beyond the scope of this course.

Multivariate joint probability density (optional)

Let $X_1, X_2, \ldots, X_n$ be continuous random variables. The joint pdf, $f(x_1, x_2, \ldots, x_n)$, of $X_1, X_2, \ldots, X_n$ is defined in terms of the n-dimensional volume integral

$$P((X_1, X_2, \ldots, X_n) \in R) = \int_R f(x_1, \ldots, x_n)\,dV,$$

where $R \subseteq \mathbb{R}^n$ is any subregion of n-dimensional space. In this setting, it is useful to think of $X = (X_1, X_2, \ldots, X_n)$ as a random n-dimensional vector in $\mathbb{R}^n$.


Joint Cumulative Distribution

As was the case with single random variables, the distribution of probability between two discrete or continuous random variables can be characterized by the joint cumulative distribution function (or joint cdf).

Joint cumulative distribution
For any two random variables X and Y, the joint cumulative distribution function is defined by $F(x,y) := P(X \le x,\, Y \le y)$. Note that this is defined regardless of whether X and Y are discrete or continuous.

This can be written in terms of the joint distribution or joint density in the discrete or continuous cases.

Discrete case

If X and Y are discrete random variables with joint distribution function $p(x,y)$, then the joint cdf is given by

$$F(x,y) = \sum_{u \le x} \sum_{v \le y} p(u,v),$$

where the sum is over u and v such that $p(u,v) \neq 0$.


Example 5

Let X and Y be as in Example 1. What is $F(3.5, 4)$?


$F(3.5, 4)$ is the probability of the event $\{X \le 3.5,\ Y \le 4\}$. We can visualize this event by marking cells in the probability table, shown with an asterisk below:

X \ Y    1       2       3       4       5       6
1        1/36*   1/36*   1/36*   1/36*   1/36    1/36
2        1/36*   1/36*   1/36*   1/36*   1/36    1/36
3        1/36*   1/36*   1/36*   1/36*   1/36    1/36
4        1/36    1/36    1/36    1/36    1/36    1/36
5        1/36    1/36    1/36    1/36    1/36    1/36
6        1/36    1/36    1/36    1/36    1/36    1/36

Since there are 12 such elements, we see that $F(3.5, 4) = \frac{12}{36} = \frac13$.
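
The discrete joint cdf is just a filtered sum over the table, which makes it easy to compute directly. A minimal sketch, assuming the dice table of Example 1 (the function name `joint_cdf` is illustrative):

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def joint_cdf(x, y, pmf):
    """F(x, y) = sum of p(u, v) over all support points with u <= x and v <= y."""
    return sum(prob for (u, v), prob in pmf.items() if u <= x and v <= y)

print(joint_cdf(3.5, 4, p))  # 1/3, i.e. 12/36
```

Note that the arguments of the cdf do not have to be values that X and Y can actually take; 3.5 is a perfectly valid input.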


Continuous case

If X and Y are continuous random variables with joint pdf $f(x,y)$, then the joint cdf is given by

$$F(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u,v)\,dv\,du.$$

We can also recover the joint pdf from the joint cdf using partial derivatives: $f(x,y) = \frac{\partial^2 F}{\partial x\,\partial y}(x,y)$.


Example 6

Let Y1 and Y2 be as in Example 4. What is the joint cdf?


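The event $\{Y_1 \le y_1,\ Y_2 \le y_2\}$ depends on whether $y_1 \le y_2$ or $y_2 \le y_1$. Both cases are illustrated below.

[Figures: the region of integration over the triangular domain in each of the two cases]

If $0 \le y_1 \le y_2 \le 1$, then the region of integration is a triangle:

$$F(y_1,y_2) = \int_0^{y_1} \int_0^{t_1} 3t_1\,dt_2\,dt_1 = \int_0^{y_1} 3t_1^2\,dt_1 = y_1^3.$$

On the other hand, if $0 \le y_2 \le y_1 \le 1$, then the region of integration can be split up into a triangle and a rectangle:

$$F(y_1,y_2) = \int_0^{y_2} \int_0^{t_1} 3t_1\,dt_2\,dt_1 + \int_{y_2}^{y_1} \int_0^{y_2} 3t_1\,dt_2\,dt_1 = y_2^3 + \int_{y_2}^{y_1} 3t_1 y_2\,dt_1 = y_2^3 + \frac32 \left( y_1^2 - y_2^2 \right) y_2 = -\frac12 y_2^3 + \frac32 y_1^2 y_2.$$

Putting this together gives

$$F(y_1,y_2) = \begin{cases} y_1^3, & \text{if } 0 \le y_1 \le y_2 \le 1, \\ -\frac12 y_2^3 + \frac32 y_1^2 y_2, & \text{if } 0 \le y_2 \le y_1 \le 1, \\ 1, & \text{if } y_1 \ge 1 \text{ and } y_2 \ge 1, \\ 0, & \text{otherwise.} \end{cases}$$

Here "otherwise" should be read as $y_1 < 0$ or $y_2 < 0$; if exactly one of the coordinates exceeds 1, replace that coordinate by 1 in the formulas above.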

As was the case in one dimension, the joint cdf has various properties.

Properties of the joint cdf

If X and Y are any random variables with joint cdf $F(x,y)$, then the following properties hold:

  1. $F(x,y)$ is non-decreasing, meaning that if x or y increases, then $F(x,y)$ can't decrease.
  2. If you send any variable to $-\infty$, then $F(x,y)$ goes to zero, i.e. $F(-\infty,-\infty) = F(-\infty,y) = F(x,-\infty) = 0$.
  3. If you send both variables to $+\infty$, then $F(x,y)$ approaches 1, i.e. $F(\infty,\infty) = 1$.

Using the cdf to calculate probabilities of rectangles

Just like in the single variable case, the cdf can be used to calculate probabilities on rectangles. More precisely, if $x_1 \le x_2$ and $y_1 \le y_2$, then the following rule holds:

$$P(x_1 < X \le x_2,\ y_1 < Y \le y_2) = F(x_2,y_2) - F(x_2,y_1) - F(x_1,y_2) + F(x_1,y_1).$$

For continuous random variables the strict inequalities can be replaced by $\le$, since the boundary of the rectangle has probability zero.

This is illustrated in the following animation using the additive and subtractive properties of probability. Positive contributions are colored red and negative contributions are colored blue.

[Animation: cdf rectangle rule]
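
The rectangle rule can be checked on the dice of Example 1 by comparing it against a direct sum over the cells of the rectangle. This sketch uses the convention that the lower endpoints are excluded, P(x1 < X <= x2, y1 < Y <= y2), which is the form that holds for discrete variables; the names `F` and `rectangle_prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def F(x, y):
    """Joint cdf of the two dice: P(X <= x, Y <= y)."""
    return sum(prob for (u, v), prob in p.items() if u <= x and v <= y)

def rectangle_prob(x1, x2, y1, y2):
    """P(x1 < X <= x2, y1 < Y <= y2) via inclusion-exclusion on the cdf."""
    return F(x2, y2) - F(x2, y1) - F(x1, y2) + F(x1, y1)

# Direct sum over the cells with 1 < x <= 3 and 2 < y <= 5.
direct = sum(prob for (u, v), prob in p.items() if 1 < u <= 3 and 2 < v <= 5)
print(rectangle_prob(1, 3, 2, 5), direct)  # 1/6 1/6
```

Here F(3,5) = 15/36, F(3,2) = 6/36, F(1,5) = 5/36 and F(1,1... 2) = 2/36, so the rule gives (15 - 6 - 5 + 2)/36 = 6/36.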

Marginal distributions

If two random variables X and Y are jointly distributed, it is often the case that you want to understand the distribution of just one of them, ignoring any relationship between the two. Such a distribution is called a marginal distribution.

Discrete case

Let's begin by considering an example:

Example 7

Recall the die tossing experiment from Example 1. Suppose we consider the event $\{X = 3\}$. We know that this corresponds to the following pairs:

$$\{X = 3\} = \{(3,1),(3,2),(3,3),(3,4),(3,5),(3,6)\}.$$
This corresponds to row 3 in the table, marked with an asterisk below.

X \ Y    1       2       3       4       5       6
1        1/36    1/36    1/36    1/36    1/36    1/36
2        1/36    1/36    1/36    1/36    1/36    1/36
3        1/36*   1/36*   1/36*   1/36*   1/36*   1/36*
4        1/36    1/36    1/36    1/36    1/36    1/36
5        1/36    1/36    1/36    1/36    1/36    1/36
6        1/36    1/36    1/36    1/36    1/36    1/36

Using the additive law of probability, we can calculate $P(X = 3)$ by summing along row 3 of the table:

$$P(X = 3) = p(3,1) + p(3,2) + p(3,3) + p(3,4) + p(3,5) + p(3,6) = \frac{6}{36} = \frac16.$$

Of course we already knew this!


Motivated by the example, we define marginal distributions.


Marginal probability distribution

Suppose that X and Y are discrete random variables with joint distribution function $p(x,y)$. The marginal distribution functions $p_X(x)$ and $p_Y(y)$ for X and Y respectively are given by summing the joint distribution over the other variable:

$$p_X(x) := \sum_y p(x,y), \qquad p_Y(y) := \sum_x p(x,y),$$
where the sums are over all values such that $p(x,y) \neq 0$.

The marginals can be visualized in the following extension of the joint probability table, where the sums of each row and column are located on the right and bottom margins.

X \ Y     y_1            y_2            ...    y_m            p_X(x)
x_1       p(x_1,y_1)     p(x_1,y_2)     ...    p(x_1,y_m)     p_X(x_1)
x_2       p(x_2,y_1)     p(x_2,y_2)     ...    p(x_2,y_m)     p_X(x_2)
...       ...            ...            ...    ...            ...
x_n       p(x_n,y_1)     p(x_n,y_2)     ...    p(x_n,y_m)     p_X(x_n)
p_Y(y)    p_Y(y_1)       p_Y(y_2)       ...    p_Y(y_m)       1

Note

The marginal distributions say very little about a joint distribution. Two random variables can have the same marginal distribution and be very dependent on one another. For instance, if X is the outcome of a fair die roll, then $Z = 7 - X$ has the same marginal distribution as X even though Z is completely determined by the outcome of X.
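
Computing marginals from a table amounts to taking row sums and column sums. The sketch below does this for the pair (X, Z) with Z = 7 - X and confirms that the two marginals coincide even though the variables are completely dependent. The function name `marginals` is made up for this illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Joint pmf of X (a fair die) and Z = 7 - X, stored only on its support.
p_xz = {(x, 7 - x): Fraction(1, 6) for x in range(1, 7)}

def marginals(pmf):
    """Return (p_first, p_second): the row sums and column sums of the joint table."""
    p_first, p_second = defaultdict(Fraction), defaultdict(Fraction)
    for (a, b), prob in pmf.items():
        p_first[a] += prob    # sum over the second variable
        p_second[b] += prob   # sum over the first variable
    return dict(p_first), dict(p_second)

p_x, p_z = marginals(p_xz)
print(p_x == p_z)  # True: both marginals are uniform on {1, ..., 6}
```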

Continuous Case

The continuous case is similar to the discrete case, where sums are replaced with integrals.

Marginal probability density

Suppose that X and Y are continuous random variables with joint pdf $f(x,y)$. The marginal pdfs $f_X(x)$ and $f_Y(y)$ for X and Y respectively are given by integrating out the other variable:

$$f_X(x) := \int_{-\infty}^{\infty} f(x,y)\,dy, \qquad f_Y(y) := \int_{-\infty}^{\infty} f(x,y)\,dx.$$

Example 8

Consider the joint pdf from Example 3

$$f(x,y) = \begin{cases} 4xy, & \text{if } 0 \le x, y \le 1, \\ 0, & \text{otherwise.} \end{cases}$$

Find the X and Y marginals.


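To find $f_X(x)$ we integrate out the y variable, and to find $f_Y(y)$ we integrate out the x variable. For $0 \le x \le 1$ and $0 \le y \le 1$,
$$f_X(x) = \int_0^1 4xy\,dy = \left[ 2xy^2 \right]_{y=0}^{y=1} = 2x,$$
$$f_Y(y) = \int_0^1 4xy\,dx = \left[ 2x^2 y \right]_{x=0}^{x=1} = 2y.$$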

Example 9

Consider the more complicated pdf of Example 4

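$$f(y_1,y_2) = \begin{cases} 3y_1, & \text{if } 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{otherwise.} \end{cases}$$

Find the $Y_1$ and $Y_2$ marginals.


Taking into account the triangular region, for $0 \le y_1 \le 1$ and $0 \le y_2 \le 1$ we have

$$f_{Y_1}(y_1) = \int_0^{y_1} 3y_1\,dy_2 = 3y_1^2,$$
$$f_{Y_2}(y_2) = \int_{y_2}^{1} 3y_1\,dy_1 = \frac32 \left( 1 - y_2^2 \right).$$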
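
The effect of the triangular region on the limits of integration can be seen numerically: integrating the joint pdf over the whole interval [0,1] automatically picks up the correct limits because the density vanishes outside the triangle. This is a rough sketch with a one-dimensional midpoint rule; `integrate`, `marginal_y1`, and `marginal_y2` are illustrative names, and the results are approximations.

```python
def joint_pdf(y1, y2):
    """Joint density of Example 4: 3*y1 on the triangle 0 <= y2 <= y1 <= 1, else 0."""
    return 3 * y1 if 0 <= y2 <= y1 <= 1 else 0.0

def integrate(g, a, b, n=10_000):
    """One-dimensional midpoint rule on [a, b]."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

def marginal_y1(y1):
    """Integrate out y2 for a fixed y1."""
    return integrate(lambda y2: joint_pdf(y1, y2), 0, 1)

def marginal_y2(y2):
    """Integrate out y1 for a fixed y2."""
    return integrate(lambda y1: joint_pdf(y1, y2), 0, 1)

# Compare with 3*y1**2 = 0.75 and 1.5*(1 - y2**2) = 1.125 at the point 0.5.
print(round(marginal_y1(0.5), 3), round(marginal_y2(0.5), 3))  # 0.75 1.125
```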

Marginal cdf

Finding the marginal cdf from a given joint cdf is actually pretty easy. No integration or summation is necessary.

Marginal cdf

Let X and Y be two random variables with joint cdf $F(x,y)$. Then the marginal cdfs are given by

$$F_X(x) = \lim_{y \to \infty} F(x,y), \qquad F_Y(y) = \lim_{x \to \infty} F(x,y).$$

Note that if X and Y only take values in a rectangle $[a,b] \times [c,d]$, then this can be simplified to evaluating x or y at the right endpoint of its respective domain:

$$F_X(x) = F(x,d), \qquad F_Y(y) = F(b,y).$$

Let's illustrate these ideas with a more complicated example.

Example 10

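Let's consider the cdf we calculated in Example 6.

$$F(y_1,y_2) = \begin{cases} y_1^3, & \text{if } 0 \le y_1 \le y_2 \le 1, \\ -\frac12 y_2^3 + \frac32 y_1^2 y_2, & \text{if } 0 \le y_2 \le y_1 \le 1, \\ 1, & \text{if } y_1 \ge 1 \text{ and } y_2 \ge 1, \\ 0, & \text{otherwise.} \end{cases}$$

Find the $Y_1$ and $Y_2$ marginal cdfs.


Evaluating at $y_2 = 1$ and at $y_1 = 1$ respectively gives, for $0 \le y_1 \le 1$ and $0 \le y_2 \le 1$,

$$F_{Y_1}(y_1) = F(y_1, 1) = y_1^3,$$
$$F_{Y_2}(y_2) = F(1, y_2) = \frac12 \left( 3y_2 - y_2^3 \right).$$

Note that $F_{Y_1}'(y_1) = 3y_1^2$ and $F_{Y_2}'(y_2) = \frac32 \left( 1 - y_2^2 \right)$, which is consistent with the answers to Example 9.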


Conditional Distributions

In addition to marginal distributions, one can also define conditional distributions that describe the distribution of one random variable given the value of another.

Discrete case

Recall the definition of the conditional probability of A given B:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$

We can apply this formula to the distribution of two discrete random variables X and Y, using the random variables to define events like $\{X = 1\}$ and $\{Y = 2\}$. Then we can consider the probability of X taking a certain value given that Y takes a certain value:

$$P(X = 1 \mid Y = 2) = \frac{P(X = 1,\, Y = 2)}{P(Y = 2)}.$$

This motivates the following definition.

Conditional distribution function

Let X and Y be two discrete random variables with joint distribution $p(x,y)$. The conditional distribution function of X given Y is

$$p(x|y) := \frac{p(x,y)}{p_Y(y)},$$
provided that $p_Y(y) \neq 0$. A similar definition holds for $p(y|x)$:
$$p(y|x) := \frac{p(x,y)}{p_X(x)},$$
provided that $p_X(x) \neq 0$.

Note

p(x|y) is undefined if pY(y)=0. This is because it doesn't make sense to condition on a probability zero event.
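
The definition translates directly into code: pick out the column of the table for the given value, then divide by the column total. The sketch below conditions X on Z = 7 - X; `conditional_given_second` is a hypothetical helper name, and raising an error for a probability-zero condition is one possible way to reflect that p(x|y) is undefined there.

```python
from fractions import Fraction

# Joint pmf of X (a fair die) and Z = 7 - X, stored only on its support.
p_xz = {(x, 7 - x): Fraction(1, 6) for x in range(1, 7)}

def conditional_given_second(pmf, b):
    """Return the pmf a -> p(a | b) = p(a, b) / p_second(b)."""
    p_b = sum(prob for (_, bb), prob in pmf.items() if bb == b)  # marginal at b
    if p_b == 0:
        raise ValueError("cannot condition on a probability-zero event")
    return {a: prob / p_b for (a, bb), prob in pmf.items() if bb == b}

print(conditional_given_second(p_xz, 2))  # {5: Fraction(1, 1)}
```

Given Z = 2, the value of X is 5 with probability one, which is exactly what "Z is completely determined by X" means.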

Continuous case

The continuous case is much more subtle. For instance, it is hard to define $P(X \le x \mid Y = y)$, since the event $\{Y = y\}$ has probability zero.

However, one of the remarkable features of continuous probability distributions is that it is possible to make sense of conditioning on $\{Y = y\}$ by taking a limit. For instance, we can define a conditional cdf by shrinking the interval $[y, y+h]$ to the point y and defining

$$P(X \le x \mid Y = y) := \lim_{h \to 0} P(X \le x \mid y \le Y \le y+h).$$

The next result shows that this limit exists and gives a formula for it.

Conditional cdf

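Let X and Y be two continuous random variables with joint pdf $f(x,y)$ and joint cdf $F(x,y)$. The conditional cdf of X given Y is defined by

$$F(x|y) := \lim_{h \to 0} P(X \le x \mid y \le Y \le y+h) = \frac{\partial_y F(x,y)}{f_Y(y)} = \frac{\int_{-\infty}^{x} f(u,y)\,du}{f_Y(y)},$$
provided that $f_Y(y) \neq 0$.
Proof:

The expression $P(X \le x \mid y \le Y \le y+h)$ is well defined whenever $P(y \le Y \le y+h) \neq 0$.

We can calculate this limit explicitly since

$$P(X \le x \mid y \le Y \le y+h) = \frac{P(X \le x,\ y \le Y \le y+h)}{P(y \le Y \le y+h)} = \frac{F(x,y+h) - F(x,y)}{F_Y(y+h) - F_Y(y)} = \frac{\frac1h \left( F(x,y+h) - F(x,y) \right)}{\frac1h \left( F_Y(y+h) - F_Y(y) \right)},$$

where in the last step we multiplied and divided by $\frac1h$. Using the fact that

$$\frac1h \left( F(x,y+h) - F(x,y) \right) \to \partial_y F(x,y) = \int_{-\infty}^{x} f(u,y)\,du, \qquad \frac1h \left( F_Y(y+h) - F_Y(y) \right) \to F_Y'(y) = f_Y(y),$$

as $h \to 0$, completes the proof.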

QED


This motivates the following definition of the conditional pdf by taking the derivative of the conditional cdf.

Conditional pdf

Let X and Y be two continuous random variables with joint pdf $f(x,y)$. The conditional pdf of X given Y is

$$f(x|y) := \frac{\partial}{\partial x} F(x|y) = \frac{f(x,y)}{f_Y(y)},$$
provided that $f_Y(y) \neq 0$. An analogous definition holds for $f(y|x)$.

Example 11

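Let's consider the pdf from Example 4.

$$f(y_1,y_2) = \begin{cases} 3y_1, & \text{if } 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{otherwise.} \end{cases}$$

Find $f(y_1|y_2)$ and use it to calculate $P(Y_1 \le 3/4 \mid Y_2 = 1/2)$.


In Example 9, we found that $f_{Y_2}(y_2) = \frac32 \left( 1 - y_2^2 \right)$ for $0 \le y_2 \le 1$. Therefore, for $0 \le y_2 < 1$,

$$f(y_1|y_2) = \begin{cases} \dfrac{2y_1}{1 - y_2^2}, & \text{if } 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{otherwise.} \end{cases}$$

To calculate $P(Y_1 \le 3/4 \mid Y_2 = 1/2)$ we note that

$$f(y_1 \mid 1/2) = \begin{cases} \frac83 y_1, & \text{if } \frac12 \le y_1 \le 1, \\ 0, & \text{otherwise.} \end{cases}$$

Therefore

$$P(Y_1 \le 3/4 \mid Y_2 = 1/2) = \int_0^{3/4} f(y_1 \mid 1/2)\,dy_1 = \int_{1/2}^{3/4} \frac83 y_1\,dy_1 = \frac43 \left( \frac{9}{16} - \frac14 \right) = \frac{5}{12}.$$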
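
As a quick numerical check of Example 11, the conditional pdf should integrate to 1 over its support, and integrating it up to 3/4 should reproduce 5/12. The sketch below uses a one-dimensional midpoint rule; `cond_pdf` and `integrate` are names chosen for this illustration.

```python
def cond_pdf(y1, y2):
    """f(y1 | y2) = 2*y1 / (1 - y2**2) on 0 <= y2 <= y1 <= 1 (requires y2 < 1), else 0."""
    return 2 * y1 / (1 - y2 ** 2) if 0 <= y2 <= y1 <= 1 else 0.0

def integrate(g, a, b, n=1000):
    """One-dimensional midpoint rule on [a, b]."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

# The conditional density given Y2 = 1/2 lives on [1/2, 1].
total = integrate(lambda y1: cond_pdf(y1, 0.5), 0.5, 1.0)
prob = integrate(lambda y1: cond_pdf(y1, 0.5), 0.5, 0.75)
print(round(total, 6), round(prob, 6))  # 1.0 0.416667
```

The value 0.416667 is 5/12 rounded to six decimals.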

Independence of Random Variables

We can now give a precise definition of independence of two random variables.

Recall that two events A and B are independent if and only if $P(A \cap B) = P(A)P(B)$. Of course we can take A and B to be events defined by two random variables X and Y, for instance $A = \{X \le 3\}$ and $B = \{Y \le 1\}$. We then say that two random variables X and Y are independent if any event defined by X and any event defined by Y are independent. As it turns out, it's good enough to consider events of the form $\{X \le x\}$ and $\{Y \le y\}$: if the condition $P(X \le x,\, Y \le y) = P(X \le x)P(Y \le y)$ holds for all x and y, then X and Y are independent.

Independence

Two random variables X and Y with joint cdf $F(x,y)$ and marginal cdfs $F_X(x)$ and $F_Y(y)$ are independent if and only if

$$F(x,y) = F_X(x)\,F_Y(y) \quad \text{for all } x, y.$$

Otherwise X and Y are said to be dependent.


In the discrete case, the definition can be reduced to:

Independence (discrete case)

Two discrete random variables X and Y with joint distribution function $p(x,y)$ and marginal distributions $p_X(x)$ and $p_Y(y)$ are independent if and only if

$$p(x,y) = p_X(x)\,p_Y(y) \quad \text{for all } x, y.$$

Otherwise X and Y are said to be dependent.


In the discrete case, independence means the probability in a cell of the probability table must be the product of the marginal probabilities of its row and column. This is considered in the next example.

Example 12

Let's consider the die rolling experiment from Example 1. Of course we know that X and Y are independent, but let's check that our notion of independence is correct.

The probability table is

X \ Y     1       2       3       4       5       6       p_X(x)
1         1/36    1/36    1/36    1/36    1/36    1/36    1/6
2         1/36    1/36    1/36    1/36    1/36    1/36    1/6
3         1/36    1/36    1/36    1/36    1/36    1/36    1/6
4         1/36    1/36    1/36    1/36    1/36    1/36    1/6
5         1/36    1/36    1/36    1/36    1/36    1/36    1/6
6         1/36    1/36    1/36    1/36    1/36    1/36    1/6
p_Y(y)    1/6     1/6     1/6     1/6     1/6     1/6     1

Since each marginal has probability 1/6 and each cell has probability 1/36 (the product of the marginals), we can see that X and Y are independent.

On the other hand, let's consider X and $Z = 7 - X$. The probability table was given by

X \ Z     1      2      3      4      5      6      p_X(x)
1         0      0      0      0      0      1/6    1/6
2         0      0      0      0      1/6    0      1/6
3         0      0      0      1/6    0      0      1/6
4         0      0      1/6    0      0      0      1/6
5         0      1/6    0      0      0      0      1/6
6         1/6    0      0      0      0      0      1/6
p_Z(z)    1/6    1/6    1/6    1/6    1/6    1/6    1

In this table, we can see that many of the probabilities are not the product of two marginal probabilities and so X and Z are dependent. Indeed, since none of the marginal probabilities are zero, then none of the cells with zero probability can be a product of marginals.
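
The cell-by-cell test described in this example is easy to automate. The sketch below computes the row and column marginals of a table and checks whether every cell is the product of its marginals; the function name `is_independent` is hypothetical, and cells missing from the dictionary are treated as having probability 0.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p_xy = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}
p_xz = {(x, 7 - x): Fraction(1, 6) for x in faces}  # cells not listed have probability 0

def is_independent(pmf):
    """True if every cell of the table equals the product of its row and column marginals."""
    rows = {a for a, _ in pmf}
    cols = {b for _, b in pmf}
    p_row = {a: sum(pmf.get((a, b), 0) for b in cols) for a in rows}
    p_col = {b: sum(pmf.get((a, b), 0) for a in rows) for b in cols}
    return all(pmf.get((a, b), 0) == p_row[a] * p_col[b]
               for a in rows for b in cols)

print(is_independent(p_xy), is_independent(p_xz))  # True False
```

Exact fractions matter here: with floating-point probabilities the equality test would need a tolerance.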


In the continuous case (by taking partial derivatives), this can be reduced to a statement on the pdfs:

Independence (continuous case)

Two continuous random variables X and Y with joint pdf $f(x,y)$ and marginal pdfs $f_X(x)$ and $f_Y(y)$ are independent if and only if

$$f(x,y) = f_X(x)\,f_Y(y) \quad \text{for all } x, y.$$

Otherwise X and Y are said to be dependent.


In many ways the continuous case is easier to check than the discrete case, since it can be reduced to showing that the density can be factored into a product of two densities.

Example 13
Let X and Y have joint density $$f(x,y) = \begin{cases} 4xy, & \text{if } 0 \le x, y \le 1, \\ 0, & \text{otherwise.} \end{cases}$$ Are X and Y independent?

In this case, since $f_X(x) = 2x$ for $0 \le x \le 1$ and $f_Y(y) = 2y$ for $0 \le y \le 1$, we can easily see that for $0 \le x, y \le 1$, $f_X(x)\,f_Y(y) = 4xy = f(x,y)$. Hence X and Y are independent.


Example 14

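How about the pdf from Example 4? Are $Y_1$ and $Y_2$ independent?


In this case, we know that for $0 \le y_1 \le 1$ and $0 \le y_2 \le 1$ we have

$$f_{Y_1}(y_1) = 3y_1^2, \qquad f_{Y_2}(y_2) = \frac32 \left( 1 - y_2^2 \right),$$

so that for all $0 \le y_1, y_2 \le 1$

$$f_{Y_1}(y_1)\,f_{Y_2}(y_2) = \frac92\, y_1^2 \left( 1 - y_2^2 \right).$$

This is nowhere close to $f(y_1,y_2)$, if only for the simple fact that $f_{Y_1}(y_1)\,f_{Y_2}(y_2) \neq 0$ when $0 < y_1 < y_2 < 1$ while $f(y_1,y_2) = 0$ there. Hence $Y_1$ and $Y_2$ are dependent.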
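
The contrast between Examples 13 and 14 can be spot-checked by evaluating the joint density and the product of the marginals at a few points. This is only a sketch: agreement on a finite grid does not prove independence, but a single point of disagreement does show dependence. All function names below are made up for this illustration.

```python
def f13(x, y):
    """Joint density of Example 13."""
    return 4 * x * y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def f13_product(x, y):
    """Product of the marginals 2x and 2y from Example 8."""
    fx = 2 * x if 0 <= x <= 1 else 0.0
    fy = 2 * y if 0 <= y <= 1 else 0.0
    return fx * fy

def f14(y1, y2):
    """Joint density of Example 14 (the density from Example 4)."""
    return 3 * y1 if 0 <= y2 <= y1 <= 1 else 0.0

def f14_product(y1, y2):
    """Product of the marginals 3*y1**2 and 1.5*(1 - y2**2) from Example 9."""
    f1 = 3 * y1 ** 2 if 0 <= y1 <= 1 else 0.0
    f2 = 1.5 * (1 - y2 ** 2) if 0 <= y2 <= 1 else 0.0
    return f1 * f2

grid = [i / 10 for i in range(11)]
factorizes_13 = all(abs(f13(x, y) - f13_product(x, y)) < 1e-12
                    for x in grid for y in grid)
print(factorizes_13)                            # True
print(f14(0.25, 0.5), f14_product(0.25, 0.5))   # 0.0 0.2109375
```

At the point (0.25, 0.5), which lies outside the triangle, the joint density of Example 14 is zero while the product of the marginals is not.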