CSL 9B Activity
Sampling Distributions and the Central Limit Theorem

In Cyberstats Units B-11 and B-12, you learned that statistics, which are themselves random variables, have sampling distributions. For example, the range of heights in samples of size 4 drawn from adult women has a sampling distribution. You also learned that the sample mean has a sampling distribution with very special properties:

                a.  The mean of the sampling distribution of averages in samples of size N is equal to the mean of 
                     the population from which the samples are drawn. That is to say,

$\mu_{\bar{x}} = E(\bar{X}) = \mu$

                b. The standard deviation of the distribution of the sample mean is equal to the standard deviation
                     of the population from which samples are drawn divided by the square root of N, that is 

$\sigma_{\bar{x}} = \sigma / \sqrt{N}$

                c. As the size of the samples increases the sampling distribution of xbar becomes more nearly a     
                    normal distribution.

Today's activity is intended to illustrate properties a, b, and c.
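
If you would like to check the three properties outside Minitab, the following is a minimal sketch in Python. It assumes a synthetic right-skewed population of 209 values (a stand-in only, not the Stat 200 data on your disk), and the names `population`, `sample_means`, `skewness`, and `results` are made up for illustration. It compares the mean and SD of 1000 simulated sample means with the theoretical values in a and b, and uses a simple skewness measure as a rough check on c.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical stand-in for column C1: 209 right-skewed values (NOT the class data).
population = [random.expovariate(1 / 80) for _ in range(209)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population SD (divides by 209)


def sample_means(values, n, reps=1000):
    """Means of `reps` samples of size n, each drawn with replacement."""
    return [statistics.fmean(random.choices(values, k=n)) for _ in range(reps)]


def skewness(values):
    """Third standardized moment; near 0 for a symmetric, bell-shaped distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


results = {}
for n in (4, 16, 64):
    means = sample_means(population, n)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    print(f"N={n:2d}  mean of means={results[n][0]:7.2f} (mu={mu:.2f})  "
          f"SD of means={results[n][1]:6.2f} (sigma/sqrt(N)={sigma / n ** 0.5:.2f})  "
          f"skewness={results[n][2]:.2f}")
```

The mean of the sample means should stay close to mu for every N, the SD should track sigma divided by the square root of N, and the skewness should shrink toward 0 as N grows.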

The disk given to you contains data on X = 'Longest relationship', as reported by 209 students in Stat 200. Specifically, the question asked of them was 'How long (in weeks) did your longest relationship last?'

        Column C1 contains the 209 (population) values in the Minitab Worksheet.
        Column C2 contains the averages of 1000 samples each of size N=4 from this population.
        Column C3 contains the averages of 1000 samples each of size N=16 from this population.
        Column C4 contains the averages of 1000 samples each of size N=64 from this population.

Histograms of the data in the four columns are given on the last page of this activity.

  1. On the 3 x 5 card given to each of you, give your response to the question (try to do so anonymously) and compute the average for your group.
  2. Now use the data in the four columns for the following:

a. Draw one random sample of size N=4 for each member of your group from the population of 209 values and compute the average for each of your samples:
Click Calc, Random Data, Sample From Columns. Type in '4', 'C6', and click Sample with Replacement.

Averages for each group member: ______, ______, ______, ______, ______, ______.

This 'simulates' what you see in Column C2 (except that C2 contains 1000 averages!).

c. Describe the data in the 4 columns to obtain the characteristics of the population (C1) and of the approximate sampling distributions of the mean for N=4, N=16, and N=64. Click Stat, Basic Statistics, Display Descriptive Statistics, and then fill in the information requested below:

                    Sample size         Mean    SD      Min     Q1      Median  Q3      Max

                    (Population) N=1    _____   _____   _____   _____   _____   _____   _____
                                 N=4    _____   _____   _____   _____   _____   _____   _____
                                 N=16   _____   _____   _____   _____   _____   _____   _____
                                 N=64   _____   _____   _____   _____   _____   _____   _____

     d. Does the mean change very much? Yes ___ No ___ What should the values be?

     e.  Describe the behavior of the SD. The value for N=1 is $\sigma$ = _____; what are the theoretical
          values for N=4: _______; N=16: _______; and N=64: ______?

          In words, how is the SD changing as N increases from 1 to 4 to 16 to 64?

    __________________________________________________________________
    __________________________________________________________________

    f.  Is the median 'constant'? Look at the value of the median for N=1 (the population median) and 
        then compare it with the values for N=4, 16, and 64 (the medians of the sampling distribution of 
        the mean).

     __________________________________________________________________
     __________________________________________________________________

    g.  Suppose you wanted to estimate the population median using samples of size 16. How would you
         do it?

         __________________________________________________________________
         __________________________________________________________________

    h.   Describe the behavior of the 7 numerical statistics (mean, SD, and the 5-number summary) for 
          N=1, 4, 16, and 64. Which measures are changing?

    Measures of 'Center/Location'? Yes ___ No ___; Measures of spread? Yes ___ No ___

    i.  Looking at the population probability histogram, does the distribution of X = 'Longest Relationship' 
        appear to be normal? Are there any 'mild outliers'? Extreme outliers? (Note: mild outliers are values
        more than 1.5(IQR) below Q1 or above Q3. Extreme outliers are values more than 3(IQR) below Q1 or 
        above Q3.)    Normal?  Yes______  No_____

                        Mild Outliers? : Yes ___ No ___ Extreme Outliers? Yes ___ No ___

                        If yes, what are they? ___________ If yes, what are they? _____________

      j.  Does the distribution of Xbar = the average of a sample of size N appear to be 
          normal? N=4: Yes __ No __; N=16: Yes __ No __; N=64: Yes __ No __

      k.  Find the 2.5th and 97.5th percentiles of the distribution of Xbar for samples of size 64. 
          Note: the 1000 sample means in C4 are sorted in order from smallest to largest.

                        2.5th percentile = _____. 97.5th percentile = _____.
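
To check your hand work on the 5-number summary, the outlier fences, and the percentiles, the following is a minimal sketch in Python. It assumes Python's default quartile rule (which may differ slightly from Minitab's), treats mild outliers as values between the 1.5(IQR) and 3(IQR) fences (a common convention, assumed here), and uses made-up data in `demo` rather than the values in C1. The function names are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using Python's default quartile rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def outliers(values):
    """Return (mild, extreme): values beyond the 1.5*IQR and 3*IQR fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    mild = [v for v in values
            if q1 - 3 * iqr <= v < q1 - 1.5 * iqr
            or q3 + 1.5 * iqr < v <= q3 + 3 * iqr]
    extreme = [v for v in values if v < q1 - 3 * iqr or v > q3 + 3 * iqr]
    return mild, extreme


def middle_95(values):
    """Empirical 2.5th and 97.5th percentiles (first and last of 39 cut points)."""
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]


# Made-up illustrative data, NOT the values in C1.
demo = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 45, 90]
print(five_number_summary(demo))  # (2, 7.0, 14.0, 21.0, 90)
print(outliers(demo))             # ([45], [90])
```

For 1000 sorted sample means, `middle_95` lands close to the 25th and 975th values in the list, which is the same idea as reading the percentiles directly from the sorted column.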

  3. The Central Limit Theorem says that the distribution of Xbar should be approximately normal for N=64. Using the theoretical mean and SD of the distribution of Xbar for samples of size 64, and Minitab, find the theoretical 2.5th and 97.5th percentiles of the distribution of Xbar.
  4. Suppose we wished to test the hypothesis H0: $\mu$ = 78.29 vs HA: $\mu$ > 78.29. Suppose we observe Xbar = 92.5. Using the approximate sampling distribution of Xbar (in C4), what is the approximate P-value of the test?        Answer: P-value = _____
  5. Using the Central Limit Theorem, what is the P-value? Answer: P-value = _____
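
The normal-approximation calculations in these last questions can also be sketched in Python instead of Minitab. In the sketch below, the mean 78.29, the observed average 92.5, and N=64 come from the activity, but the population SD of 70.0 is a hypothetical placeholder; substitute the SD you found for C1. The function `empirical_p_value` is an illustrative helper for the simulation-based P-value, to be applied to the 1000 averages in C4.

```python
from statistics import NormalDist

mu = 78.29       # hypothesized mean from H0 in the activity
sigma = 70.0     # HYPOTHETICAL population SD; replace with the SD of C1
n = 64
xbar_obs = 92.5  # observed sample mean from the activity

# CLT approximation: Xbar is roughly Normal(mu, sigma / sqrt(n)).
xbar_dist = NormalDist(mu, sigma / n ** 0.5)

lower = xbar_dist.inv_cdf(0.025)     # theoretical 2.5th percentile
upper = xbar_dist.inv_cdf(0.975)     # theoretical 97.5th percentile
p_clt = 1 - xbar_dist.cdf(xbar_obs)  # upper-tail P-value for HA: mu > 78.29


def empirical_p_value(sample_means, observed):
    """Share of simulated sample means at or above the observed mean."""
    return sum(m >= observed for m in sample_means) / len(sample_means)


print(f"2.5th percentile = {lower:.2f}, 97.5th percentile = {upper:.2f}")
print(f"CLT P-value = {p_clt:.4f}")
```

Comparing `p_clt` with the proportion returned by `empirical_p_value` for the C4 averages shows how closely the normal approximation matches the simulated sampling distribution.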