Joint and Marginal Distributions
You know what always bugs me while reading research papers? The distributions! When a paper talks about joint distributions, I go into delirium! (I don’t know whether that is grammatically correct, but I wanted to use the word!)
So the best thing to keep in mind about joint probability is the word “joint”: it is the probability of two events, A and B, happening together!
If A and B are two events, P(A and B) is their joint probability!
Joint Probability Distribution:
This distribution gives the probability of every combination of values of two random variables, X and Y. The formal definition is
f(x,y) = P(X=x,Y=y).
So what’s the point of a joint probability distribution? Why do we need it?
It captures the relationship between two variables: how likely each pair of values is to occur together!
If we want the probability that X = 3 and Y = 2 together, it is 1/6. In the notation above, f(3,2) = P(X=3,Y=2) = 1/6.
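To make the definition concrete, here is a minimal Python sketch of a joint distribution stored as a lookup table. The numbers are made up for illustration (they are not necessarily the table this post is based on) and were chosen only so that f(3,2) comes out as 1/6. The names `joint` and `f` are just for this example.

```python
from fractions import Fraction as F

# Hypothetical joint table: keys are (x, y) pairs, values are P(X=x, Y=y).
joint = {
    (1, 1): F(1, 12), (1, 2): F(1, 12),
    (2, 1): F(1, 6),  (2, 2): F(1, 6),
    (3, 1): F(1, 3),  (3, 2): F(1, 6),
}

def f(x, y):
    """Joint probability f(x, y) = P(X=x, Y=y); 0 for pairs not in the table."""
    return joint.get((x, y), F(0))

# A valid joint distribution must sum to 1 over all (x, y) pairs.
total = sum(joint.values())

print(f(3, 2))   # 1/6
print(total)     # 1
```

Using `Fraction` keeps the arithmetic exact, so the "sums to 1" check does not suffer from floating-point rounding.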
WHY IS IT DIFFICULT & INTRACTABLE FOR MANY ML MODELS?
The first thing is that we cannot enumerate all X values and all Y values. They are usually not discrete values like the ones shown above; they are continuous almost every time.
The second thing is that there are many variables! For 2 variables X and Y that take discrete values, we needed a table of that size. If X could take 1000 values and Y another 1000, we would need 1000*1000 - 1 probability values (minus one because they must all sum to 1), and the table multiplies again with every variable we add.
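The count above is easy to check, and it shows how quickly things blow up with more variables. This is a rough sketch, assuming every variable takes the same number of discrete values; the helper name `table_size` is made up for this example.

```python
def table_size(values_per_variable, num_variables):
    """Free probability values needed for a full joint table.

    One entry per combination of values, minus one because the
    probabilities must sum to 1.
    """
    return values_per_variable ** num_variables - 1

print(table_size(1000, 2))   # 999999

# The table grows exponentially in the number of variables.
for n in (2, 3, 5, 10):
    print(n, "variables:", table_size(1000, n))
```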
For continuous variables, the function that describes how likely every possible combination of X and Y is, is called a joint PDF.
Probability Density Functions:
A probability density function (PDF) maps each value of a random variable to a density, not directly to a probability. Its shape is set by the parameters of the distribution we are taking into account (for example, the mean and variance), and the probability that the variable falls in a range is the area under the curve over that range!
What about the probability that a person weighs exactly 180 lbs, P(Y=180)?
Your answer might be 50%, but no! To get a probability we need an area under the curve. Exactly 180 is just a line, and a line has no area, so the probability is zero!
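Here is a small Python sketch of the "a line has no area" argument. It assumes, purely for illustration, that weights follow a normal distribution with mean 180 lbs and standard deviation 30 lbs (these numbers are not from any data), and uses `statistics.NormalDist` from the standard library (Python 3.8+). The helper `prob_between` is a made-up name.

```python
from statistics import NormalDist

# Assumption for illustration only: weights ~ Normal(mean=180 lbs, sd=30 lbs).
weights = NormalDist(mu=180, sigma=30)

def prob_between(lo, hi):
    """P(lo <= Y <= hi) = area under the PDF between lo and hi."""
    return weights.cdf(hi) - weights.cdf(lo)

# Shrink the interval around 180: the area (probability) shrinks toward 0.
for half_width in (10, 1, 0.1, 0.001):
    p = prob_between(180 - half_width, 180 + half_width)
    print(f"P({180 - half_width} <= Y <= {180 + half_width}) = {p:.6f}")

# Zero width means zero area, even though the density at 180 is not zero.
print(prob_between(180, 180))   # 0.0
print(weights.pdf(180) > 0)     # True
```

So the density at 180 tells us how tall the curve is there, but we only get a probability once we give it some width.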
MARGINAL DISTRIBUTION
It is a simple thing. If f(x,y) is the joint probability distribution function, then
g(x) = Σ_y f(x,y) and h(y) = Σ_x f(x,y) are the marginal distributions of X and Y.
In other words, summing P(X=x, Y=y) over all y gives the marginal distribution of X!
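The two sums can be sketched in a few lines of Python. The joint table below is a made-up example (not a table from this post), and `marginals` is a hypothetical helper name; the only point is to show "sum over the other variable" in code.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint table f(x, y) = P(X=x, Y=y), made up for this example.
joint = {
    (1, 1): F(1, 12), (1, 2): F(1, 12),
    (2, 1): F(1, 6),  (2, 2): F(1, 6),
    (3, 1): F(1, 3),  (3, 2): F(1, 6),
}

def marginals(joint):
    """Return (g, h) with g[x] = sum over y of f(x, y), h[y] = sum over x of f(x, y)."""
    g, h = defaultdict(F), defaultdict(F)
    for (x, y), p in joint.items():
        g[x] += p   # add up every y for this x
        h[y] += p   # add up every x for this y
    return dict(g), dict(h)

g, h = marginals(joint)
for x, p in g.items():
    print(f"g({x}) = {p}")   # g(1) = 1/6, g(2) = 1/3, g(3) = 1/2
for y, p in h.items():
    print(f"h({y}) = {p}")   # h(1) = 7/12, h(2) = 5/12
```

Each marginal is itself a valid distribution, so the values of g (and of h) add up to 1.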