## Introduction to Probability Models

Lecture 41

Qi Wang, Department of Statistics

Dec 5, 2018

### Reminders

• The final exam will be from 10:30am to 12:30pm, Dec 12, 2018 in CL50
• All review session will be at UNIV 101.
Monday, Dec 1, 2018 Tuesday, Dec 2, 2018
4:00pm - 5:30pm Will 10:30pm - 12:00pm Ce-Ce
5:30pm - 7:00pm Tim 12:00pm - 1:30pm Jiapeng
1:30pm - 3:00pm Yan
3:00pm - 4:30pm Qi
4:30pm - 6:00pm Will

### Final Exam

• Cumulative, about 70% will be on the material covered after Exam2
• 2-Hour Exam, 125 points
• You are allowed the following aids
• 2 one-page 8.5” x 11” HANDWRITTEN cheat sheets
• Scientific (non-graphing) calculator (in accordance with the syllabus)
• Pencils, pens, erasers

### Material Covered after Exam2

• Normal Distribution: definition, parameter, PDF, CDF, expected value, variance, standard Normal, Z-score, emprical rules, approximation to Binomial
• Five Number Summary and Boxplot
• Types of Data, summarizing Data and graphs
• Contigency Table and $\chi^2$ test
• Scatterplot, correlation and linear regression

### Normal Random Variable

• Parameter:
• $\mu$: the mean of the random variable, determines the center of the distribution
• $\sigma$: the standard deviation of the random variable, determines the shape of the distribution
• The standard normal distribution is the normal distribution with $\mu = 0, \sigma = 1$, namely, $X\sim N(\mu = 0, \sigma = 1)$
• The CDF of standard normal distribution is denoted as $\Phi(x)$
• You convert $X \sim N(\mu, \sigma)$ to $Z \sim N(\mu = 0, \sigma = 1)$, where $Z$ has the standard Normal distribution. Convert/standardize using: $$Z = \frac{X - \mu}{\sigma}$$ This standardized value is called a Z-score
• Remember that your table gives you the probability $$P(Z \le z) = \Phi(z)$$
• Steps to finding the sample score if you are given a probability and know $X \sim N(\mu, \sigma)$

1. Set up your problem as follows $P(Z \le z_0) = probability$(Note: adjust $>$to $\le$ if necessary by using “1-probability”.)
2. Find the z-score by looking up the probability in the body of normal table
3. If you have a two-sided probability, use $$P(-z_0 < Z \le z_0) = 2P(Z \le z_0) - 1 = 2 \Phi(z_0) - 1$$
4. Convert the z-score to x using $$z = \frac{x - \mu}{\sigma}$$

### Boxplot

Boxplot is a graphic depiction of the 5 number summary

1. Draw a horizontal or vertical axis that is evenly spaced and well-labeled(make sure it covers the full range of the data)
2. Locate $Q_1$ and $Q_3$. There are the "ends" of your box. Draw the box.
3. With the box, locate the Median and mark it
4. Locate and mark the Minimum and Maximum. Extend a line("whisker") from each end of the box to the Max or Min

To draw a modified boxplot, Step 1, 2, 3 are the same, BUT we indicate the outliers with a $o$ or a $\star$. Then draw the line from the ends of the box ot the highest or lowest data point that is NOT an outlier. Most software generate boxplots are modified boxplots.

### Contingency Table

• Describes the relationship between two categorical variables, represents a table of counts (can include percentages).
• Calculate joint, conditional marginal probability
• Test if there is a relationship between two qualitative (categorical) variables via Chi-Square($\chi^2$) Hypothesis test

1. State the Null and Alternative hypothesis
2. Determine the confidence level and the significance level $\alpha$
3. Find the test statistic $$\chi^2 = \sum{\frac{(\textit{observed cell count} - \textit{expected cell count})^2}{\textit{expected cell count} }}$$
4. Determine the degrees of freedom needed to use the $\chi^2$ table
5. Find the $\chi^2$ critical value from the $\chi^2$ table. Compare critical value from the table to the calculated $\chi^2$ value.
6. State the conclusion in terms of the problem

### Least-Squares Regression

• Minimizes $\sum_i^n e_i^2$
• Equation of the line is: $\hat{y} = b_0 + b_1 x$
• Slope of the line is:$b_1$, where the slope measures the amount of change caused in the response variable when the explanatory variable is increased by one unit.
• Intercept of the line is:$b_0$, where the intercept is the value of the response variable when the explanatory variable = 0. (i.e. value where line intersects the y-axis)
• Used for Prediction: using the line to find y-values corresponding to x-values that are within the range of your data x-values
• Using values outside range of the collected data can lead to extrapolation
• Coefficient of Determination: Denoted by $r^2$, it gives the proportion of the variance of the response variable that is predicted by the explanatory variable. So when $r^2$ is high, close to 1 or 100%, you have explained most of the variability. Also, it equals to the equare of the correlation between $x$ and $y$, $r^2 = r_{xy}^2$
• Residuals: the difference between the observed value of the response variable ($y$) and the predicted value ($\hat{y}$): residuals = observed y - predicted y, $$e = y - \hat{y}$$

### Material Covered before Exam2

• Refer to Lecture 15 for a summary of materials before Exam 1
• Refer to Lecture 21 for a summary of discrete random variables
• Refer to Lecture 28 for a summary of materials after Exam 1
• Discrete
• Bernoulli
• Binomial
• Hypergeometric
• Poisson
• Geometric
• Negative Binomial
• Continuous
• Uniform
• Exponential
• Normal