Assignment due Jan. 27
You might find the following snippet of R code helpful in carrying out this part of the assignment. Feel free to copy it into a word processor, change it as needed for your own purposes, and then copy and paste each line into R to see what happens.
## This code reads in the GSS-93 dataset and then looks at the cappun variable
setwd("220/datasets") # Don't forget to change to the right directory (yours will be different)
dir() # Just to check that the file is there
gss = read.table("GSS-93.txt", header=TRUE) # Read the entire dataset into the object gss
cap = gss[,"cappun"] # create a new object from the "cappun" column of gss
length(which(cap=="Favor")) # count number of "Favor" answers (note: double-equals is important)
# Now get 200 samples of size 60, calculate the sample proportion for each:
capprops = rep(0, 200) # Initialize a vector with 200 zeros. This will hold the answers.
for (i in 1:200) { # The variable i will count from 1 to 200 as we do the following stuff:
s = sample(cap, 60) # Select sample of size 60
favnum = length(which(s == "Favor")) # Count number in favor from sample
capprops[i] = favnum/60 # calculate sample proportion, save as ith place in capprops
}
# Now draw a histogram of the 200 sample proportions contained in the vector capprops:
hist(capprops, xlab = "Proportion", main = "In favor of capital punishment")
# Maybe I'd like to make a pdf file of the histogram to print out and hand in:
pdf(file="cappunhist.pdf") # Open the pdf file, ready to accept graphics
hist(capprops, xlab = "Proportion", main = "In favor of capital punishment")
dev.off() # This will finish the pdf file. Note that the hist command is the same as above.
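If you'd rather not write the loop by hand, here is a rough sketch of the same idea using replicate(). Since I can't assume you have the data file loaded at this point, it runs on a made-up vector called cap_demo (my invention, not the real cappun column), and the helper name one_prop is also just something I chose for illustration.

```r
# Hypothetical stand-in for the cappun column; NOT the real GSS-93 data.
set.seed(1) # make the random draws reproducible
cap_demo = sample(c("Favor", "Oppose", NA), 1606, replace = TRUE,
                  prob = c(0.7, 0.2, 0.1))

# One sample proportion: draw n responses, count "Favor", divide by n.
# sum(..., na.rm = TRUE) counts the TRUE values and skips NAs,
# which gives the same count as length(which(s == "Favor")).
one_prop = function(x, n = 60) {
  s = sample(x, n)
  sum(s == "Favor", na.rm = TRUE) / n
}

# replicate() repeats the call 200 times and collects the results in a vector.
capprops_demo = replicate(200, one_prop(cap_demo))
length(capprops_demo) # 200
summary(capprops_demo)
```

To try this on the real data, you would pass cap in place of cap_demo. The loop version is perfectly fine too; this is just a more compact way to write the same thing.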
Jan. 24 Addendum:
Norman pointed out that the code I've given above, which counts the NA values as responses (they can be drawn into a sample, and they stay in the denominator of 60), might not be the best way to proceed. We don't know much about what NA means for this survey, but I'm inclined to agree. If NA means the subject had no opinion, then the NAs should probably be included; however, if NA means the subject simply chose not to answer but may have an opinion, then they probably should not.
I don't care which way you proceed; I'll leave it up to you to decide which is more appropriate. If you want to get rid of the NA values in, say, the cap vector above, then just insert the following line immediately after the line 'cap = gss[,"cappun"]':
cap = cap[!is.na(cap)] # the exclamation point is negation, so !is.na means 'is not NA'
With this line inserted, cap will have only 1488 entries instead of 1606.
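To see what difference the choice makes, here is a small sketch on a made-up vector of eight responses (toy and toy_clean are names I invented for this illustration; the values are not from GSS-93). The count of "Favor" answers is the same either way; only the denominator changes.

```r
# Hypothetical toy responses, made up for illustration (not GSS-93 data).
toy = c("Favor", "Oppose", NA, "Favor", NA, "Oppose", "Favor", "Favor")

is.na(toy) # TRUE wherever the response is missing
toy_clean = toy[!is.na(toy)] # keep only the non-missing responses

length(toy) # 8
length(toy_clean) # 6

favnum = length(which(toy == "Favor")) # which() drops NAs, so this is 4 either way
favnum / length(toy) # 0.5: NAs counted in the denominator
favnum / length(toy_clean) # 0.6666667: NAs removed first
```

So keeping the NAs pulls the proportion in favor down, and dropping them gives the proportion among those who actually answered.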