Lecture 35

Qi Wang, Department of Statistics

Nov 14, 2018

Heights of Pokémon are Normally Distributed with a mean of 59 inches and a standard deviation of 17 inches.

- What is the standardized height (z-score) for Blastoise, who is 63 inches tall?
- Using the Normal table, find the probability that a randomly selected Pokémon is taller than Blastoise.
- Knowing that a Pokémon is taller than Blastoise, what is the probability that it is taller than 70 inches?
- What height corresponds to the top 10% of Pokémon heights?
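The four questions above can be checked numerically. The following is a minimal sketch using `NormalDist` from Python's standard library, assuming only the mean of 59 and standard deviation of 17 given in the problem. The variable names are illustrative. Answers read from a printed Normal table may differ slightly, because the table rounds the z-score to two decimal places.

```python
from statistics import NormalDist

# Heights modeled as Normal with mean 59 and sd 17 (from the problem statement).
heights = NormalDist(mu=59, sigma=17)

# z-score for Blastoise (63 inches): (x - mean) / sd
z_blastoise = (63 - heights.mean) / heights.stdev

# P(X > 63): the upper-tail area beyond Blastoise's height
p_taller_63 = 1 - heights.cdf(63)

# P(X > 70 | X > 63) = P(X > 70) / P(X > 63),
# because the event {X > 70} lies entirely inside {X > 63}.
p_taller_70 = 1 - heights.cdf(70)
p_cond = p_taller_70 / p_taller_63

# Height that cuts off the top 10%: the 90th percentile
top10_cutoff = heights.inv_cdf(0.90)

print(f"z-score for Blastoise:  {z_blastoise:.4f}")
print(f"P(X > 63):              {p_taller_63:.4f}")
print(f"P(X > 70 | X > 63):     {p_cond:.4f}")
print(f"Top 10% cutoff (in):    {top10_cutoff:.2f}")
```

The conditional probability reduces to a ratio of two tail areas, which is the same calculation done by hand with the Normal table.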

If $n$ is large enough for the given $p$, a Binomial distribution behaves much like a Normal distribution, so we can use the Normal distribution to approximate the original Binomial distribution.

- If $X \sim \text{Bin}(n, p)$ with $np > 5$ and $n(1 - p) > 5$,
- then we can use $X^\star \sim N(\mu = np, \sigma = \sqrt{np(1-p)})$ to approximate $X$.

You may notice that the Binomial distribution is discrete, while the Normal distribution is continuous. The approximation therefore loses some accuracy, which we try to recover. Whenever we use the approximation, we need to perform a continuity correction:

- If you’re looking for: $P(a \le X \le b)$
- Use $P(a - 0.5 < X^\star < b + 0.5)$
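To see what the continuity correction buys, the sketch below compares the exact Binomial probability with the Normal approximation, both with and without the correction. The values `n = 50`, `p = 0.3`, `a = 12`, and `b = 18` are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical example values (assumptions, not from the lecture).
n, p = 50, 0.3
a, b = 12, 18

# Rule of thumb from the notes: np > 5 and n(1 - p) > 5
rule_ok = n * p > 5 and n * (1 - p) > 5

# Exact P(a <= X <= b) by summing the Binomial pmf
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, b + 1))

# Approximating Normal: mean np, standard deviation sqrt(np(1 - p))
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# With continuity correction: P(a - 0.5 < X* < b + 0.5)
with_cc = approx.cdf(b + 0.5) - approx.cdf(a - 0.5)

# Without continuity correction: P(a < X* < b)
without_cc = approx.cdf(b) - approx.cdf(a)

print(f"Rule of thumb satisfied: {rule_ok}")
print(f"Exact Binomial:          {exact:.4f}")
print(f"Normal, with correction: {with_cc:.4f}")
print(f"Normal, no correction:   {without_cc:.4f}")
```

Widening the interval by 0.5 on each side accounts for the full bar of probability that the discrete distribution places on the endpoints $a$ and $b$.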

A two-way table:

- Describes the relationship between two categorical variables.
- Represents a table of counts (can include percentages).

Examples

- Gender versus major
- Political party versus voting status

Sometimes one or both variables are quantitative, but we classify them into categories for data collection and/or analysis. For example, suppose our variables are years of college education and income. We decide to group years of education into four classes: none, some college, Bachelor’s degree, and post-graduate. We also decide to classify annual income in dollars into four classes: less than 10,000; 10,000 to 30,000; 30,001 to 50,000; and more than 50,000.

An instructor taught four sections of a large statistics course and had the following distribution of grades when the semester was finished.

Grade | One | Two | Three | Four | Total |
---|---|---|---|---|---|
A | 12 | 18 | 10 | 12 | |
B | 26 | 26 | 16 | 16 | |
C | 28 | 20 | 24 | 18 | |
D | 6 | 8 | 20 | 18 | |
F | 4 | 4 | 8 | 12 | |
Total | | | | | |

The **joint distribution** of the two categorical variables gives, for each cell, the proportion of all cases that fall in that cell:
$$\text{Joint probability} = \frac{\text{Total in cell}}{\text{Overall total}}$$
All the joint proportions should add to 1 (or 100%).
For example, 18/306 = 0.0588, or 5.88%, is the joint proportion for students with a grade of A AND Class Time Two. Joint distributions use or imply “AND” (i.e., an intersection).
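The joint proportions can be computed directly from the counts. A minimal sketch in plain Python, using the counts from the grade table above; the variable names are illustrative.

```python
# Counts from the grade table (rows: letter grades, columns: class times).
class_times = ["One", "Two", "Three", "Four"]
counts = {
    "A": [12, 18, 10, 12],
    "B": [26, 26, 16, 16],
    "C": [28, 20, 24, 18],
    "D": [6, 8, 20, 18],
    "F": [4, 4, 8, 12],
}

# Overall total: the sum of every cell in the table
overall_total = sum(sum(row) for row in counts.values())

# Joint proportion for each cell = cell count / overall total
joint = {
    grade: {t: c / overall_total for t, c in zip(class_times, row)}
    for grade, row in counts.items()
}

print(f"Overall total: {overall_total}")
for grade, row in joint.items():
    cells = "  ".join(f"{t}: {prop:.2%}" for t, prop in row.items())
    print(f"{grade}  {cells}")

# All joint proportions together should add to 1
total_prop = sum(prop for row in joint.values() for prop in row.values())
print(f"Sum of joint proportions: {total_prop:.4f}")
```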

Fill in the table of Joint distributions:

Grade | One | Two | Three | Four | Total |
---|---|---|---|---|---|
A | | 5.88% | | 3.92% | 16.99% |
B | 8.50% | | 5.23% | 5.23% | 27.45% |
C | 9.15% | 6.54% | 7.84% | | 29.41% |
D | | 2.61% | 6.54% | 5.88% | 16.99% |
F | 1.31% | 1.31% | 2.61% | | 9.15% |
Total | 24.84% | 24.84% | 25.49% | 24.84% | 100% |

The **marginal distribution** allows us to study one variable at a time. The marginal distribution of each categorical variable is obtained from the row or column totals of the two-way table, so we are examining the distribution of a single variable.
Marginal distributions allow us to compare the relative frequencies among the levels of a single categorical variable.

- The marginals for the row variable should add to 1 (or 100%).
- The marginals for the column variable should add to 1 (or 100%).
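As a sketch of how the marginals come out of the row and column totals, the following plain Python uses the counts from the grade table in this lecture (rows are letter grades, columns are class times). The variable names are illustrative.

```python
# Counts from the grade table (rows: letter grades, columns: class times).
class_times = ["One", "Two", "Three", "Four"]
counts = {
    "A": [12, 18, 10, 12],
    "B": [26, 26, 16, 16],
    "C": [28, 20, 24, 18],
    "D": [6, 8, 20, 18],
    "F": [4, 4, 8, 12],
}

overall_total = sum(sum(row) for row in counts.values())

# Row totals give the marginal distribution of Letter Grade
grade_totals = {grade: sum(row) for grade, row in counts.items()}
grade_marginal = {g: t / overall_total for g, t in grade_totals.items()}

# Column totals give the marginal distribution of Class Time
time_totals = {
    t: sum(row[i] for row in counts.values())
    for i, t in enumerate(class_times)
}
time_marginal = {t: total / overall_total for t, total in time_totals.items()}

for g in grade_totals:
    print(f"Grade {g}: count {grade_totals[g]}, {grade_marginal[g]:.2%}")
for t in time_totals:
    print(f"Class Time {t}: count {time_totals[t]}, {time_marginal[t]:.2%}")
```

Each set of marginal proportions adds to 1, which is a quick check on the arithmetic.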

Find the marginal distribution of Class Time for Example 2

Class Time | One | Two | Three | Four |
---|---|---|---|---|
Counts | | | | |
Percents | | | | |

Find the marginal distribution of Letter Grade for Example 2

Grade | A | B | C | D | F |
---|---|---|---|---|---|
Counts | | | | | |
Percents | | | | | |

In **conditional distributions**, we find the distribution of one categorical variable given a common level of another categorical variable. Look for key words to indicate a conditional—“given”, “knowing”, etc.
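A conditional distribution restricts attention to one row or one column and divides by that row or column total rather than the overall total. A minimal sketch in plain Python, using the counts from the grade table in this lecture; the variable names are illustrative.

```python
# Counts from the grade table (rows: letter grades, columns: class times).
class_times = ["One", "Two", "Three", "Four"]
counts = {
    "A": [12, 18, 10, 12],
    "B": [26, 26, 16, 16],
    "C": [28, 20, 24, 18],
    "D": [6, 8, 20, 18],
    "F": [4, 4, 8, 12],
}

# Letter Grade given Class Time One:
# take the "One" column and divide each cell by the column total.
col = class_times.index("One")
col_total = sum(row[col] for row in counts.values())
grade_given_one = {grade: row[col] / col_total for grade, row in counts.items()}

# Class Time given Letter Grade C:
# take the "C" row and divide each cell by the row total.
row_c = counts["C"]
row_total = sum(row_c)
time_given_c = {t: c / row_total for t, c in zip(class_times, row_c)}

print(f"Students in Class Time One: {col_total}")
for grade, prop in grade_given_one.items():
    print(f"  P(grade {grade} | Class Time One) = {prop:.2%}")

print(f"Students earning a C: {row_total}")
for t, prop in time_given_c.items():
    print(f"  P(Class Time {t} | grade C) = {prop:.2%}")
```

The denominator is what separates a conditional proportion from a joint one: 76 or 90 here, instead of the overall total of 306.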

Find the conditional distribution of Letter Grade for Class Time One

Grade | A | B | C | D | F |
---|---|---|---|---|---|
Counts | | | | | |
Percents | | | | | |

Find the conditional distribution of Class Time for Letter Grade C.

Class Time | One | Two | Three | Four |
---|---|---|---|---|
Counts | | | | |
Percents | | | | |

- What percent of students in Class Time Four earned a B? Is this joint, conditional, or marginal?
- Of all students earning a B, what proportion were in Class Time Four? Is this joint, conditional, or marginal?
- What percent of students were enrolled in Class Time Three? Is this joint, conditional, or marginal?
- What proportion of students earned a B and were in Class Time Two? Is this joint, conditional, or marginal?