A dataset on the web contains the responses to many questions that were administered
to the students in three different sections of STAT 100 (one in spring 2004, one in spring 2005, and
one in fall 2005). The dataset is stored as a .csv file, which means that the columns are
separated by commas.
The dataset, called survey.csv, and another file describing the variables,
called survey.txt, are in the datasets directory.
Read this dataset into R using the following line:
s = read.csv("http://www.stat.psu.edu/~dhunter/220/files/datasets/survey.csv", na.strings="")
Note: The na.strings="" argument is there so that any blanks are read in as NA.
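To see what this argument changes, here is a minimal sketch on a hypothetical three-row CSV string. The values are made up for illustration and are not taken from survey.csv; in particular, the "Yes"/"No" coding of Earprc is an assumption.

```r
# Hypothetical three-row CSV, made up purely for illustration (not from survey.csv).
csv_text <- "Class,Earprc,GPA\nSP04,Yes,3.2\nSP05,,3.5\nFA05,No,2.9"

# Same text read twice: once with na.strings = "" and once with the default.
with_na    <- read.csv(text = csv_text, na.strings = "")
without_na <- read.csv(text = csv_text)

is.na(with_na$Earprc)     # the blank in row 2 is read as NA
is.na(without_na$Earprc)  # the blank is kept as an empty string, not NA
```

Without the argument, blanks in text columns would show up as an extra empty category rather than as missing values.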
Assuming this sample is representative of some population of interest, answer the following
questions about this population. Express your answers as formal hypothesis tests.
(That is, give hypotheses, calculate test statistics and p-values, and express your
conclusions in plain language.)
- Does the proportion of students with pierced ears differ between the
spring semesters (SP04 and SP05) and the fall semester (FA05)?
The relevant columns are "Class" and "Earprc".
- Is the mean GPA for nonsmokers different from that of smokers?
The relevant columns are "Cigpacks" and "GPA". The former gives the
weekly number of packs of cigarettes smoked, so you may want to
create a new TRUE/FALSE variable that tells whether each person is a smoker.
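
One possible way to set up the two tests is sketched below on a hypothetical toy data frame (toy) whose values are invented for illustration; nothing here reflects the real survey results. It assumes Earprc is coded "Yes"/"No" (check survey.txt for the actual coding) and uses prop.test and t.test, which are only one reasonable choice of procedure. With the real data, replace toy by s.

```r
# Hypothetical toy data, made up for illustration; real values come from survey.csv.
# Assumption: Earprc is coded "Yes"/"No".
toy <- data.frame(
  Class    = rep(c("SP04", "SP05", "FA05"), times = c(20, 20, 20)),
  Earprc   = rep(c("Yes", "No", "Yes", "No", "Yes", "No"),
                 times = c(14, 6, 12, 8, 9, 11)),
  Cigpacks = rep(c(0, 1, 0, 2, 0, 3), times = c(15, 5, 16, 4, 14, 6)),
  GPA      = rep(seq(2.2, 3.8, by = 0.2), length.out = 60)
)

# Question 1: H0: p_spring = p_fall versus Ha: p_spring != p_fall.
ok      <- !is.na(toy$Earprc) & !is.na(toy$Class)   # drop rows with missing values
spring  <- toy$Class[ok] %in% c("SP04", "SP05")
pierced <- toy$Earprc[ok] == "Yes"
x <- c(sum(pierced[spring]), sum(pierced[!spring]))  # pierced counts: spring, fall
n <- c(sum(spring), sum(!spring))                    # group sizes: spring, fall
ear_test <- prop.test(x, n)                          # two-sided two-sample test
ear_test$statistic
ear_test$p.value

# Question 2: H0: mu_smoker = mu_nonsmoker versus Ha: the means differ.
toy$Smoker <- toy$Cigpacks > 0                 # TRUE for anyone smoking at all
gpa_test <- t.test(GPA ~ Smoker, data = toy)   # Welch two-sample t-test (R's default)
gpa_test$statistic
gpa_test$p.value
```

By default, prop.test applies a continuity correction and t.test does not assume equal variances; whether those defaults are appropriate is a judgment to state in your answer.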