STAT 220: Basic Statistics for Quantitative Students

Spring 2006

Assignment due Feb. 10

Type or write out your answers to the following questions and turn them in during class on Feb. 10. A couple of these questions have answers in the back of the book. Feel free to use the answers to guide you if you wish; however, keep in mind that you will be expected to know how to do all of these questions on an exam.

  1. Do Exercises 12.56, 12.57, 12.59, 12.65, 12.67, 12.71, and 12.74. For 12.71, use the unpooled variances. Finally, here are some R hints to help you:
    # Confession:  There's actually an easier way to change all the *s to NAs that I haven't
    # mentioned before now because I wanted you all to learn how to save files as text before
    # reading them into R.  Sorry.
    
    # Try the following.  Note the use of the "na.strings" argument to read.table:
    ps1=read.table("http://www.stat.psu.edu/~dhunter/220/files/datasets/ascii/pennstate1.txt",
        header=TRUE, na.strings="*")
    
    # If you want to see what other arguments read.table has, type '?read.table'
    
    # Now suppose that you want to get the mean for the "Fastest" column, which contains an NA.
    # You can use mean with na.rm=TRUE (the default is FALSE, in which case any NA makes the result NA):
    mean(ps1[,"Fastest"], na.rm=TRUE)
    # You can do the same thing with the 'var' function to find the sample variance (which may
    # then be used to find the st dev).
    sqrt(var(ps1[,"Fastest"], na.rm=TRUE))
    
    # Suppose you want the mean or st dev for just the males.  Try this:
    attach(ps1)
    mean(Fastest[Sex=="Male"], na.rm=TRUE)
    
    # Finally, don't forget about the 'table' command, which will be useful for Exercise 12.74.
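
    As a rough illustration of 'table' and of a per-group standard deviation, here is a minimal sketch. It uses a small made-up data frame, not the pennstate1 data; the name 'toy' and its columns (Sex, Seat, Speed) are assumptions chosen only for this example.

    ```r
    # Hypothetical toy data (NOT the pennstate1 data); all names and values are made up.
    toy <- data.frame(
      Sex   = c("Male", "Female", "Female", "Male", "Female", NA),
      Seat  = c("Front", "Back", "Front", "Back", "Back", "Front"),
      Speed = c(100, 85, NA, 120, 90, 95)
    )

    # One-way table: counts of each category (NA values are dropped by default)
    sex_counts <- table(toy$Sex)
    print(sex_counts)

    # Two-way table: rows are Sex, columns are Seat
    two_way <- table(toy$Sex, toy$Seat)
    print(two_way)

    # Sample standard deviation for one group, ignoring missing values
    female_sd <- sd(toy$Speed[toy$Sex == "Female"], na.rm = TRUE)
    print(female_sd)
    ```

    Here 'sd' gives the same value as taking sqrt of 'var'; which one to use is a matter of taste.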
    
  2. Read Chapters 2 and 3 of the R Companion. Exercises 3 and 4 below, which are based on exercises written by Daniel Kaplan, draw on this material.
  3. The figure shows part of a page from Francis Galton's notebooks. In the late 1880s, Galton collected and analyzed data from several families, including the heights of the father and mother and of their children.

    Rearrange the data shown in this page excerpt in a tabular format with each child being one case. As a first step, decide what variables are contained in Galton's data. Keep in mind that while some of the variables are recorded explicitly as a number, others are recorded implicitly as position in Galton's records. Note that the data are coded as the height in inches minus 60, so add 60 to each number to recover the true height. Create a text file with these data in a tabular format that could be read into R, then print out the result to turn in. You are welcome to use either commas or spaces to separate the data columns, but in either case please make sure that the top row of your file contains the variable names.

    Source for figure: James A. Hanley, McGill University. See Hanley, James A., "'Transmuting' Women into Men: Galton's Family Data on Human Stature," The American Statistician, Volume 58, Number 3, August 2004, pp. 237-243.

  4. Here are some data about baseball in a compact but non-standard format. As in the previous exercise, rearrange these data in a tabular format with each player being one case.
    Player            Team      Age  Salary
    Pitchers
    Osuna, Antonio    Dodgers   26   1050
    Pettitte, Andy    Yankees   26   5950
    Outfielders
    Dunwoody, Todd    Marlins   24   222
    Sosa, Sammy       Cubs      30   9000
    Create a text file with these data in a tabular format that could be read into R, then print out the result to turn in. If you wish, you may combine this file with the previous exercise so that you only have to print one page to turn in (however, if you do this, make sure to leave several blank lines between the separate "files" so they do not run together).
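
Before printing a file for the last two exercises, it may help to check that R reads it as intended. The sketch below shows one way to do that with 'read.table' and 'read.csv'. The contents are made-up values with hypothetical column names (Name, Group, Score), not the assignment data, and the 'text' argument stands in for a file on disk. Note that a field containing a space or a comma needs quotes around it to stay in a single column.

```r
# Hypothetical file contents (made-up values, not the assignment data).
# The first row holds the variable names; the quoted fields contain a
# comma and a space, so the quotes keep each name in one column.
space_text <- 'Name Group Score
"Doe, Jane" A 12.5
"Roe, Richard" B 9
'
space_df <- read.table(text = space_text, header = TRUE)

# The same data with commas as separators; read.csv assumes a header row
comma_text <- 'Name,Group,Score
"Doe, Jane",A,12.5
"Roe, Richard",B,9
'
comma_df <- read.csv(text = comma_text)

# Inspect the result: there should be one row per case and one column per variable
str(space_df)
print(all.equal(space_df, comma_df))
```

For a real file, replace the 'text' argument with the file name, for example read.table("myfile.txt", header=TRUE), where "myfile.txt" is a placeholder for whatever name you chose.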
As always, email me if you have questions.