Lecture 32

Qi Wang, Department of Statistics

Nov 7, 2018

__Probability__is the science which studies likeliness of future event is based on the underlying model or population or process__Statistics__is the science of collecting, analyzing, presenting, and interpreting data

**Dataset:**all the data collected in a particular study**Elements:**the individual entities of a data set**Variable:**a character of interest for the elements**Observation:**the set of measurements obtained for a particular element

- Qualitative Variable: has names or labels used to identify an attribute of an element, also called categorical.
Scale of measurement is
- Nomial: ranking or order is meaningless (e.g. color)
- Ordinal: inherent rank or order to the data (e.g. income categories)

- Quantitative Variable: has numeric values that indicates how much or how many of something.
Scale of measurement in
- Interval: difference of quantities are meaningful, ratios of quantities cannot be compared (e.g. temperature in degrees Fahrenheit)
- Ratio: ratios of quantities are meaningful (e.g. length)

The table shows a data set.

Year | Major | GPA | Total Credit Hours |
---|---|---|---|

Sophomore | Psychology | 3.14 | 42 |

Senior | Accounting | 3.45 | 105 |

Senior | Philosophy | 3.06 | 111 |

Freshman | Statistics | 2.89 | 17 |

Sophomore | Spanish | 3.25 | 38 |

Junior | Accounting | 2.95 | 79 |

- How many elements are in the data set?
- How many variables are in the data set?
- What is the $4_{th}$ observation in the data set?
- What type of variable is each variable in the data set?

State the type of variable for each

- Smoking status
- Income
- IQ
- Level of satisfaction
- T-shirt size (S, M, L, XL)
- Score on the Mathematics portion of the SAT

Based on how the data were collected

- Cross-sectional data: collected at the same point (or approximately the same point) in time
- Time series data: collected over several time periods

State whether the data for variable is cross-sectional or time series.

- Current GPAs of Sophomore Management students
- Your GPA during your time at Purdue
- Daily closing price of a stock for month of March
- Value of the stocks in the Dow Jones Industrial Average on March 31, 2016
- Bushels of corn harvested in each of Indiana’s counties in 2015.
- Score on the Mathematics portion of the SAT

**Mean:**arithmetic average $$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i}^n x_i$$ Example: 1, 2, 2, 3, 4, 7, 9-
**Mode:**most frequent value in a dataset Example: 2 is the mode in the previous example -
**Median:**midpoint of the data such that half of the values are smaller and half of the values are larger.

- Arrange the data in increasing order(from the smallest to the largest)
- Count the number of observations, n.
- If n is
**odd**, median is the middle ordered value: $M = (\frac{n + 1}{2})_{th}$ ordered value - If n is
**even**, median is the average of two middle ordered value: $M$ = average of $(\frac{n}{2})_{th}$ and $(\frac{n}{2} + 1)_{th}$ ordered value