Statistics is the science of collecting, analyzing, and using data.
Why collect data?
- To make estimates.
- To make comparisons.
- To study relationships.
Whom to study?
- Population: A collection of all possible objects (people, trees,
fish, college students, etc.). Since we're really interested in the measurements
of the objects (height, weight, smoking habits, drinking habits, etc.), a population
is sometimes defined as a collection of all possible measurements.
- Sample: A subset of a population.
How many to study?
The number of individuals in the sample is called the sample size. We'll discuss
how sample size affects conclusions throughout the course.
Data collection methods
- Census: In a census, data are collected from every member of a population.
- Experiment: In an experiment, a researcher assigns a specific procedure
to experimental units in order to examine the effect of that procedure.
- Randomized experiment: an experiment in which participants are randomly
divided into treatment groups. Most randomized experiments are "randomized
comparative experiments" in which the groups are compared, so the word "comparative"
is frequently omitted.
- Observational study: In an observational study, the investigator
compares groups that already exist, without assigning any procedure to them.
- Sample survey: In a sample survey, opinions or characteristics are
collected from a representative portion of a population.
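The random division used in a randomized experiment can be sketched in a few lines of Python. This is only an illustration: the function name randomize, the participant IDs, and the seed are made up for the example, and a real study would follow its own protocol.

```python
import random

def randomize(units, n_groups=2, seed=None):
    """Randomly divide units into n_groups groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # every ordering of the units is equally likely
    # Deal the shuffled units round-robin so group sizes differ by at most one.
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical participant IDs, used only for illustration.
participants = [f"P{i}" for i in range(1, 11)]
treatment, control = randomize(participants, n_groups=2, seed=1)
print("Treatment:", treatment)
print("Control:  ", control)
```

Because chance alone decides who lands in which group, the groups should be similar, on average, before the procedure is applied.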
Types of good (representative) samples
- Simple random sample: Individuals are randomly selected from a population.
The desired sample size is chosen, then every possible combination of that
many population members is equally likely to be the sample. Sometimes just
referred to as a random sample.
- Stratified random sample: A stratified random sample is selected
by dividing the population into groups called strata, then selecting a simple
random sample from each of the strata. The probability of selection is the
same for each individual in a given stratum, but may differ across the strata.
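Both sampling methods above can be sketched with Python's standard random module. The population, the strata names, and the sample sizes below are invented for illustration; they are assumptions, not data from the course.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Select n members so that every set of n members is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

def stratified_random_sample(strata, sizes, seed=None):
    """Select a simple random sample of sizes[name] members from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(list(members), sizes[name])
            for name, members in strata.items()}

# Hypothetical population: 100 students identified by the numbers 1 to 100.
population = list(range(1, 101))
srs = simple_random_sample(population, 10, seed=0)

# Hypothetical strata: 60 first-year students and 40 seniors.
strata = {"first_year": list(range(1, 61)), "senior": list(range(61, 101))}
sizes = {"first_year": 6, "senior": 8}
stratified = stratified_random_sample(strata, sizes, seed=0)

# Within a stratum every member has the same chance of selection,
# but the chance differs across strata: 6/60 = 0.10 versus 8/40 = 0.20.
print(srs)
print(stratified)
```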
Types of bad (unrepresentative) samples
- Biased sample: A sample that is not representative of the population,
typically because the selection method systematically favors some members over others.
- Convenience sample: Members are selected because they are easy to
reach. Typically not representative of the population.
- Volunteer sample: Members volunteer to be selected for the sample.
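A small, deliberately artificial Python example shows how an unrepresentative sample can mislead. The measurements here are just the numbers 1 to 100 stored in sorted order, and "convenient" is assumed to mean "first in the list"; neither comes from real data.

```python
from statistics import mean

# Hypothetical measurements for a population of 100, stored in sorted order.
population = list(range(1, 101))

# A convenience sample: the 10 members that are easiest to reach
# (here, simply the first 10 in the list).
convenience = population[:10]

population_mean = mean(population)
convenience_mean = mean(convenience)
print(population_mean, convenience_mean)  # 50.5 5.5
```

The convenience sample's mean is far from the population mean because the easy-to-reach members all sit at one end of the population.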