Lecture 35
Qi Wang, Department of Statistics
Nov 14, 2018
Heights of Pokémon are Normally Distributed with a mean of 59 inches and a standard deviation of 17 inches.
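If a Binomial distribution has a large enough combination of $n$ and $p$ (a common rule of thumb is that both $np$ and $n(1-p)$ are at least 10), it behaves much like a Normal distribution with mean $np$ and standard deviation $\sqrt{np(1-p)}$. This means we can use the Normal distribution to approximate the original Binomial distribution.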
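You may notice that the Binomial is discrete and the Normal is continuous. This means the approximation comes at a cost in accuracy that we must try to correct. When we use the approximation, we need to perform a continuity correction: treat each integer value $k$ of the Binomial as the interval from $k - 0.5$ to $k + 0.5$ under the Normal curve, so that, for example, $P(X \le k)$ is approximated by $P(Y \le k + 0.5)$.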
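As a rough illustration of the continuity correction, the sketch below compares an exact Binomial probability with its Normal approximation, with and without the 0.5 adjustment. The values `n = 100`, `p = 0.3`, and `k = 25` are hypothetical and chosen only for illustration; the helper names `normal_cdf` and `binomial_cdf` are likewise made up for this example.

```python
from math import comb, erf, sqrt

def normal_cdf(x, mean, sd):
    """P(Y <= x) for Y ~ Normal(mean, sd), computed from the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical values, for illustration only.
n, p, k = 100, 0.3, 25
mean = n * p
sd = sqrt(n * p * (1 - p))

exact = binomial_cdf(k, n, p)
without_cc = normal_cdf(k, mean, sd)      # ignores the discrete/continuous gap
with_cc = normal_cdf(k + 0.5, mean, sd)   # continuity correction: k becomes k + 0.5

print(f"exact={exact:.4f}  no correction={without_cc:.4f}  corrected={with_cc:.4f}")
```

With these values the corrected approximation should land noticeably closer to the exact probability than the uncorrected one.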
Examples
Sometimes one or both variables are quantitative, but we classify them into categories for data collection and/or analysis. For example, suppose our variables are years of college education and income. We decide to group years of education into four classes: none, some college, Bachelor’s degree, and post-graduate. We also decide to classify annual income in dollars into four classes: less than 10,000; 10,000 to 30,000; 30,001 to 50,000; and more than 50,000.
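The grouping of a quantitative variable into classes can be sketched in a few lines. This is only an illustration: the function name `income_class` and the sample incomes are hypothetical, and incomes are assumed to be whole dollars so that the class boundaries from the text do not leave gaps.

```python
def income_class(income):
    """Map an annual income (whole dollars) to one of the four classes in the text."""
    if income < 10_000:
        return "<10,000"
    elif income <= 30_000:
        return "10,000-30,000"
    elif income <= 50_000:
        return "30,001-50,000"
    else:
        return ">50,000"

# Hypothetical incomes, for illustration only.
incomes = [8_500, 10_000, 30_000, 30_001, 72_000]
classes = [income_class(x) for x in incomes]
print(classes)
```

Once every observation carries a class label, the two variables can be cross-tabulated just like any other pair of categorical variables.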
An instructor taught four sections of a large statistics course and had the following distribution of grades when the semester was finished. The columns One through Four are the four sections, referred to below as Class Time.
Grade | One | Two | Three | Four | Total |
---|---|---|---|---|---|
A | 12 | 18 | 10 | 12 | |
B | 26 | 26 | 16 | 16 | |
C | 28 | 20 | 24 | 18 | |
D | 6 | 8 | 20 | 18 | |
F | 4 | 4 | 8 | 12 | |
Total | | | | | |
The joint distribution of the two categorical variables gives, for each cell, the proportion of total cases that fall in that cell: $$\text{Joint Probability} = \frac{\text{Total in Cell}}{\text{Overall Total}}$$ All the joint probabilities should add to 1 (or 100%). For example, 18/306 = 0.0588, or 5.88%, is the joint probability for people with a grade of A AND Class Time Two. Joint distributions use or imply “AND” (i.e., intersection).
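The joint proportions can be checked with a short script. This is a minimal sketch using only the counts from the grade table; the variable names (`counts`, `sections`, `joint`) are chosen here for illustration.

```python
# Counts from the grade table: one list per grade, ordered by Class Time.
sections = ["One", "Two", "Three", "Four"]
counts = {
    "A": [12, 18, 10, 12],
    "B": [26, 26, 16, 16],
    "C": [28, 20, 24, 18],
    "D": [6, 8, 20, 18],
    "F": [4, 4, 8, 12],
}

# Overall total: the sum of every cell in the table.
overall_total = sum(sum(row) for row in counts.values())

# Joint proportion for each cell = cell count / overall total.
joint = {grade: [cell / overall_total for cell in row] for grade, row in counts.items()}

for grade, row in joint.items():
    print(grade, "  ".join(f"{prob:6.2%}" for prob in row))

# The joint proportions should add to 1 (up to rounding error).
total_prob = sum(sum(row) for row in joint.values())
print(f"overall total = {overall_total}, sum of joint proportions = {total_prob:.4f}")
```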
Fill in the table of Joint distributions:
Grade | One | Two | Three | Four | Total |
---|---|---|---|---|---|
A | | 5.88% | | 3.92% | 16.99% |
B | 8.50% | | 5.23% | 5.23% | 27.45% |
C | 9.15% | 6.54% | 7.84% | | 29.41% |
D | | 2.61% | 6.54% | 5.88% | 16.99% |
F | 1.31% | 1.31% | 2.61% | | 9.15% |
Total | 24.84% | 24.84% | 25.49% | 24.84% | 100% |
The marginal distribution allows us to study one variable at a time. The marginal distributions of each categorical variable are obtained from the row and column totals; in effect, we examine the distribution of a single variable in the two-way table. Marginal distributions allow us to compare the relative frequencies among the levels of a single categorical variable.
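One way to see how marginal distributions come from the row and column totals is the sketch below. It uses only the counts from the grade table; the variable names are illustrative.

```python
# Counts from the grade table: one list per grade, ordered by Class Time.
sections = ["One", "Two", "Three", "Four"]
counts = {
    "A": [12, 18, 10, 12],
    "B": [26, 26, 16, 16],
    "C": [28, 20, 24, 18],
    "D": [6, 8, 20, 18],
    "F": [4, 4, 8, 12],
}

# Row totals give the marginal counts for Letter Grade.
grade_totals = {grade: sum(row) for grade, row in counts.items()}

# Column totals give the marginal counts for Class Time.
section_totals = {
    name: sum(row[j] for row in counts.values()) for j, name in enumerate(sections)
}

overall_total = sum(grade_totals.values())

# Marginal proportions = marginal count / overall total.
grade_marginal = {g: t / overall_total for g, t in grade_totals.items()}
section_marginal = {s: t / overall_total for s, t in section_totals.items()}

for g in grade_totals:
    print(f"Grade {g}: {grade_totals[g]:3d}  {grade_marginal[g]:.2%}")
for s in section_totals:
    print(f"Class Time {s}: {section_totals[s]:3d}  {section_marginal[s]:.2%}")
```

Each marginal distribution should sum to 100%, which is a quick check on the arithmetic.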
Find the marginal distribution of Class Time for Example 2.
Class Time | One | Two | Three | Four |
---|---|---|---|---|
Counts | | | | |
Percents | | | | |
Find the marginal distribution of Letter Grade for Example 2.
Grade | A | B | C | D | F |
---|---|---|---|---|---|
Counts | | | | | |
Percents | | | | | |
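In conditional distributions, we find the distribution of one categorical variable given a common level of another categorical variable. Each conditional proportion is the cell count divided by the total for the given level, not by the overall total. Look for key words that signal a conditional, such as “given” or “knowing”.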
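A conditional distribution restricts attention to one row or one column and divides by that row or column total. The sketch below is one possible way to compute it from the grade table; the function names `conditional_on_section` and `conditional_on_grade` are hypothetical helpers written for this example.

```python
# Counts from the grade table: one list per grade, ordered by Class Time.
sections = ["One", "Two", "Three", "Four"]
counts = {
    "A": [12, 18, 10, 12],
    "B": [26, 26, 16, 16],
    "C": [28, 20, 24, 18],
    "D": [6, 8, 20, 18],
    "F": [4, 4, 8, 12],
}

def conditional_on_section(counts, sections, section):
    """Distribution of Letter Grade given one Class Time (divide by the column total)."""
    j = sections.index(section)
    column_total = sum(row[j] for row in counts.values())
    return {grade: row[j] / column_total for grade, row in counts.items()}

def conditional_on_grade(counts, sections, grade):
    """Distribution of Class Time given one Letter Grade (divide by the row total)."""
    row = counts[grade]
    row_total = sum(row)
    return {name: cell / row_total for name, cell in zip(sections, row)}

grade_given_one = conditional_on_section(counts, sections, "One")
section_given_c = conditional_on_grade(counts, sections, "C")

print({g: f"{prob:.2%}" for g, prob in grade_given_one.items()})
print({s: f"{prob:.2%}" for s, prob in section_given_c.items()})
```

Note that the denominators here are 76 (the Class Time One total) and 90 (the grade C total), not the overall total of 306 used for joint and marginal proportions.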
Find the conditional distribution of Letter Grade for Class Time One.
Grade | A | B | C | D | F |
---|---|---|---|---|---|
Counts | | | | | |
Percents | | | | | |
Find the conditional distribution of Class Time for Letter Grade C.
Class Time | One | Two | Three | Four |
---|---|---|---|---|
Counts | | | | |
Percents | | | | |