Review Topics for Mid-term: Basic Concepts and Things to Know
1. Basic ideas of statistical inference
- Types of hypotheses (research, null, and alternative)
- Type I and Type II errors
- Rejection regions (ex. Reject $H_0: p = p_0$ vs. the
alternative $H_a: p < p_0$ if the sample proportion
is much smaller than $p_0$).
- P-values: definition and how they are used (for example, reject
$H_0: p = p_0$ vs. the alternative $H_a: p < p_0$ if
the p-value is less than 0.05. The p-value here is the probability,
computed assuming $H_0$ is true, of getting the sample proportion you got
or an even smaller one).
- Specification of null and alternative hypotheses for one proportion, two
proportions, three or more proportions, one mean, two means, three or more
means, in terms of symbols. Ex. Test about a population proportion:
$H_0: p = p_0$ vs. $H_a: p < p_0$; the symbol here is $p$ (for a
proportion). For two means, we would use $H_0: \mu_1 = \mu_2$ vs.
$H_a: \mu_1 \neq \mu_2$, a two-sided alternative whose rejection region is:
reject $H_0$ if the difference between the two sample means is either large
and negative or large and positive; the symbol here is $\mu$ with
subscripts to denote population means.
- Confidence Intervals: their components and how they are put together,
margin of error, confidence level
- Deciding whether the test is about proportions or means (look at the
response variable: if it is categorical, it is about proportions; if it is
numerical (quantitative), it is about means).
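The p-value bullet above can be made concrete with a short sketch. It assumes an exact binomial calculation for the lower-tailed test $H_0: p = p_0$ vs. $H_a: p < p_0$; the function name and the data (3 successes in 20 trials, $p_0 = 0.5$) are hypothetical and chosen only for illustration.

```python
from math import comb

def lower_tail_p_value(successes: int, n: int, p0: float) -> float:
    """P(X <= successes) for X ~ Binomial(n, p0), i.e. computed assuming H0: p = p0."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(successes + 1))

# Hypothetical data: 3 successes in n = 20 trials; test H0: p = 0.5 vs Ha: p < 0.5.
p_value = lower_tail_p_value(3, 20, 0.5)

# Decision rule from the bullet above: reject H0 if the p-value is less than 0.05.
reject_h0 = p_value < 0.05
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
# prints: p-value = 0.0013, reject H0: True
```

Here the sample proportion (3/20 = 0.15) is much smaller than $p_0 = 0.5$, so the p-value is tiny and the sample falls in the rejection region.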
Suggested materials to look at: Handout on Testing
Statistical Hypotheses (worked on in week 2), Cyberstats C-1, Project I.
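As a rough illustration of how the components of a confidence interval fit together (estimate, margin of error, confidence level), here is a minimal sketch. It assumes the usual large-sample normal-approximation interval for one proportion, which may differ from the method used in the handouts; the function name and the data (120 successes out of 400) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = successes / n                                # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # multiplier set by the confidence level
    margin = z * sqrt(p_hat * (1 - p_hat) / n)           # margin of error
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# Hypothetical data: 120 successes in a sample of 400.
p_hat, margin, (low, high) = proportion_ci(120, 400)
print(f"estimate = {p_hat:.3f}, margin of error = {margin:.3f}")
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The interval is simply estimate ± margin of error; raising the confidence level increases the multiplier and so widens the interval.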
2. Numerical and graphical methods for describing data
- One categorical variable: numerical description is a count and a sample
proportion. Graphical displays include pie charts and bar charts.
- Two categorical variables: numerical summaries include tallies for each
variable and a cross-tab of both (with explanatory variable forming rows
of the table). Graphical displays include side-by-side pie charts of each
variable separately, using percents.
- Numerical (quantitative) variables: numerical descriptions include the
sample mean and standard deviation, five-number summary, percentiles, and
interquartile range (IQR). Graphical displays include histograms,
stem-and-leaf diagrams, and boxplots.
- Measures of location (center): mean and median.
- Measures of variability/spread: standard deviation, range, interquartile
range, and how they are calculated.
- Measures of position: percentiles
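The numerical summaries listed above can be computed with Python's standard `statistics` module, as in the sketch below. The data set is made up, and the quartiles use the module's "inclusive" method; textbooks and software differ on quartile conventions, so hand calculations may give slightly different values.

```python
import statistics

# Hypothetical sample of a numerical (quantitative) variable.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

# Measures of location (center).
mean = statistics.mean(data)
median = statistics.median(data)

# Measures of variability (spread).
sd = statistics.stdev(data)            # sample standard deviation (n - 1 divisor)
data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                          # interquartile range

# Five-number summary: min, Q1, median, Q3, max.
five_number = (min(data), q1, median, q3, max(data))

print(f"mean = {mean:.2f}, median = {median}, sd = {sd:.2f}")
print(f"range = {data_range}, IQR = {iqr}")
print("five-number summary:", five_number)
```

The five-number summary is also what a boxplot draws, so it links the numerical and graphical descriptions.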
3. Random Variables, Probability Distributions, and Expected Values
- Definition of 'random variable'
- Definition of 'Probability Distribution' and how to answer questions
about the probability of values of the random variable. Definition of
'cumulative probability distribution' and what it gives.
- Relation between a 'population' and a 'probability distribution'
(probability distribution summarizes the values in the population).
- Mean of a population/probability distribution/random variable (three
equivalent concepts).
- How the mean of a probability distribution is calculated
[$\mu = \sum x\,p(x)$].
- How the population variance $\sigma^2$ is
calculated [$\sigma^2 = \sum x^2\,p(x) - \mu^2$];
population standard deviation $\sigma$ = square root
of population variance.
- Difference between a statistic and a parameter.
- Difference between the sample mean and the population mean, and the
difference between the sample standard deviation $s$ and the population
standard deviation $\sigma$.
- The binomial distribution: what it describes, its mean and standard
deviation.
- The normal distribution: types of random variables it often describes.
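The mean and variance formulas in this section can be checked numerically. The sketch below applies them to a binomial distribution with hypothetical values n = 10 and p = 0.3, and compares the results with the standard binomial shortcuts (mean np, standard deviation sqrt(np(1 - p))); the function and variable names are illustrative only.

```python
from math import comb, sqrt

def mean_and_variance(dist: dict) -> tuple:
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())             # mu = sum of x * p(x)
    var = sum(x**2 * p for x, p in dist.items()) - mu**2  # sigma^2 = sum of x^2 * p(x), minus mu^2
    return mu, var

# Hypothetical binomial distribution: n = 10 trials, success probability p = 0.3.
n, p = 10, 0.3
binomial = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

mu, var = mean_and_variance(binomial)
sigma = sqrt(var)  # population standard deviation = square root of the variance

print(f"from the definitions: mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"binomial shortcuts:   mu = {n * p:.4f}, sigma = {sqrt(n * p * (1 - p)):.4f}")
```

Both lines should agree up to rounding, which shows that the shortcut formulas are just the general definitions applied to the binomial distribution.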
4. Types of variables
- Categorical
- Numerical (quantitative)
It would probably be helpful to review all of the RATs,
Homework 2, and the CSL activities (especially CSL 4A, 4B, 5A, and Project I).
We will have two more working activities in the labs: one on the binomial and
normal distributions, and a second on sampling distributions (which I will not
hold you responsible for on the Mid-term).