Stat 250 Biostatistics

Comprehensive Project Guidelines

Your comprehensive project consists of three parts.  In Part I, you are expected to analyze the data from the experiment we conducted in class.  In Part II, you are expected to analyze the results of the on-line observational study (survey) we conducted.  In Part III, you are expected to evaluate a scientific article that summarizes the results of a research study.  Part I is worth 20 points, Part II is worth 18 points, and Part III is worth 12 points, for a total of 50 points (or 10% of your grade).

Deadline

Your project must be turned in by the end of class on Friday, December 03, 1999.  If you would like, you can turn in your project early.  Late projects, though, will not be accepted for any reason.  Students who do not turn the project in on time will receive a grade of zero.


Project Format and Administrative Details

You can submit your own project, or you can submit a group project.   If you choose to work in a group, your group must contain no more than 3 students.  If you submit a group project, you must complete a Group Project Form that outlines specifically what each group member did in the completion of the project.  While groups will initially receive the same group grade, adjustments may be made to individual grades based on the information provided on the Group Project Forms.

Your final project must be typewritten (double-spaced, all margins 1", 12 point font), and submitted on plain white, 8½ x 11 paper, stapled, in the following order:

Always start a new report on a new page.  In answering the questions within each report, number the questions as they are numbered in these guidelines (1, 2, ...).  In completing each of the reports, you are expected to write in full, grammatically correct sentences.  You will be graded based on the accuracy of your analyses, the interpretation of the results, the appropriateness of your evaluation of the article, and the clarity of your communication.  Points will be deducted if you do not follow the directions outlined in these project guidelines.


Part I

We conducted an experiment in which we were interested in learning whether or not the information that a person is provided affects his/her estimate of the population of a country.  Each student in the experiment estimated the populations of both Turkey and Canada on a data collection form.  However, the information provided to the students on the data collection form differed.  Half of the students received a green form, which contained:

1.  Do you think the population of Turkey is more than 80 million?   Yes _____ No _____
2.  To the nearest million, estimate the population of Turkey:  ______ million.
3.  The population of Australia is about 18 million.  To the nearest million, estimate the population of Canada:  _____ million.

and half of the students received a blue form, which contained:

1.  Do you think the population of Turkey is more than 10 million?   Yes _____ No _____
2.  To the nearest million, estimate the population of Turkey:  ______ million.
3.  The population of Australia is about 18 million.  To the nearest million, estimate the population of Canada:  _____ million.

So, before estimating the population of Turkey, half of the students saw the number "10 million" and half of the students saw the number "80 million."  And, before estimating the population of Canada, all of the students were told the population of "Australia is about 18 million."

The data from this experiment are in:

I:\STAT\250\fall99\lsimon\classdata\popn.mtw 

A description of the data in the worksheet follows:
 
Column name  Description
Form         1 = Blue, 2 = Green
MoreThan     Do you think the population is more than the number on the form? 1 = Yes, 0 = No
Turkey       Student's guess of population of Turkey
Canada       Student's guess of population of Canada
Class        1 = Fresh, 2 = Soph, 3 = Jun, 4 = Sen, 5 = Other
GPA          Student's GPA
Geography    Ever taken geography? 0 = No, 1 = Yes
EverBeen     Ever been in Canada? 0 = No, 1 = Yes
 

Analyze the data for Part I as described below. 

1.  Using Minitab, create a graph (or graphs) so that you can visually compare the guesses of the population of Turkey for those completing the blue form compared to those completing the green form.  Make sure the graph is a type that allows you to identify outliers.

2.  Using Minitab, create a graph (or graphs) so that you can visually compare the guesses of the population of Canada for those completing the blue form compared to those completing the green form.  Make sure the graph is a type that allows you to identify outliers.

3.  Using Minitab, determine the average GPA for those completing the blue form, and determine the average GPA for those completing the green form. Perform a hypothesis test to see if the average GPA of students completing the green form differs from the average GPA of students completing the blue form.

4.  Using Minitab, determine the percentage of students having ever taken geography for those completing the blue form, and determine the percentage of students having ever taken geography for those completing the green form.  Perform a hypothesis test to see if the percentage of students ever taking geography on the blue form differs from the percentage of students ever taking geography on the green form.

5.  Using Minitab, determine the percentage of students having ever been in Canada for those completing the blue form, and determine the percentage of students having ever been in Canada for those completing the green form.  Perform a hypothesis test to see if the percentage of students having ever been in Canada on the blue form differs from the percentage of students having ever been in Canada on the green form.

6.   Using Minitab, perform a hypothesis test to see if the average guess of the population of Turkey for those completing the blue form is less than the average guess of the population of Turkey for those completing the green form.  At the same time, calculate a 95% confidence interval to estimate the true difference in the average guesses of the population of Turkey (for those completing the blue form compared to those completing the green form).

7.  Using Minitab, perform a hypothesis test to see if the average guess of the population of Canada for those completing the blue form differs from the average guess of the population of Canada for those completing the green form.  At the same time, calculate a 95% confidence interval to estimate the true difference in the average guesses of the population of Canada (for those completing the blue form compared to those completing the green form).
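
If you would like to double-check your Minitab output by hand, the calculations behind the graphs in items 1-2 and the tests in items 3, 6, and 7 can be sketched in a few lines of Python.  This is only a rough sketch under stated assumptions, not a substitute for the Minitab analysis: the function names and the data values are made up for illustration (they are NOT the class data), outliers are flagged with the usual 1.5 x IQR boxplot rule, and the p-values and confidence interval use a normal approximation in place of the t distribution, so they will differ somewhat from Minitab's output when the samples are small.

```python
import math
from statistics import NormalDist, mean, quantiles, stdev


def boxplot_outliers(values):
    """Return the values outside the 1.5 * IQR fences of a standard boxplot."""
    # Quartile conventions vary between packages, so fences may differ slightly.
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


def compare_means(x, y, confidence=0.95):
    """Unpooled (Welch) comparison of two independent sample means."""
    n1, n2 = len(x), len(y)
    v1, v2 = stdev(x) ** 2 / n1, stdev(y) ** 2 / n2  # squared standard errors
    diff = mean(x) - mean(y)
    se = math.sqrt(v1 + v2)
    t = diff / se
    # Welch-Satterthwaite degrees of freedom (look up exact t values with this).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    z = NormalDist()  # ASSUMPTION: normal approximation to the t distribution
    crit = z.inv_cdf(0.5 + confidence / 2)
    return {
        "diff": diff,
        "t": t,
        "df": df,
        "p_less": z.cdf(t),                      # Ha: mean of x < mean of y
        "p_two_sided": 2 * (1 - z.cdf(abs(t))),  # Ha: means differ
        "ci": (diff - crit * se, diff + crit * se),
    }


# Hypothetical guesses in millions, invented for illustration only.
blue = [20, 35, 15, 40, 25, 30, 18, 50]
green = [70, 90, 60, 85, 100, 75, 65, 95]

outliers = boxplot_outliers(blue + [250])  # add one extreme guess to flag
result = compare_means(blue, green)
print("outliers:", outliers)
print("difference (blue - green):", result["diff"])
print("t:", result["t"], "df:", result["df"])
print("one-sided p (blue < green):", result["p_less"])
print("approximate 95% CI:", result["ci"])
```

The one-sided p-value corresponds to the "less than" alternative in item 6, and the two-sided p-value corresponds to the "differs" alternative in items 3 and 7.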

Now, write and submit a report for Part I:

1.  We randomly assigned which color form a student received by alternating green and blue forms within a row.  (There are better ways to randomize, but this method was the most efficient way of doing it during class time.)  The purpose of any randomization is to create groups that are, on average, the same in important characteristics, except for the primary "treatment" of interest.  If the randomization did work, we'd expect the groups we are comparing to have roughly the same GPA, the same percentage of students having had a geography course, and the same percentage of students having been in Canada.  In analysis items #3-5 above, you analyzed the data to check that indeed the randomization accomplished the goal of "balancing out" the two groups.  (There is always a small chance that it didn't balance things out.)

Cut out the output from #3-#5 and paste it to a page.  Then summarize the results in a table like the following:
 

Variable        Blue Form   Green Form   p-value
GPA             Average     Average      observed p-value
Geography       %           %            observed p-value
Been to Canada  %           %            observed p-value

Write a brief paragraph that summarizes how well our randomization balanced things out.  In your summary:

2.  Cut out the output from #1-#2 and paste it to a page.  Write a brief paragraph that summarizes what you have learned from the graphs about the guesses, and how the guesses differ for the two forms.  Remember that graphs provide us insight into: In writing your paragraph, interpret the graphs with respect to each of these points.  Also, comment on the relevance of these points.

3.  Cut out the output from #6 and paste it to a page.  Write a brief paragraph that summarizes the results of your hypothesis test for Turkey.  In your paragraph:

4.  Cut out the output from #7 and paste it to a page.  Write a brief paragraph that summarizes the results of your hypothesis test for Canada.  In your paragraph:

5.  Write a brief conclusion.  In your paragraph:


Part II

You and your classmates have participated in an observational study by completing the recent survey data collection form B.  The data are stored in I:\STAT\250\fall99\lsimon\classdata\fa99_fmB.mtw
 
A description of the data in the worksheet follows:
 
Column name  Description
Gender       1 = male, 2 = female
Age          1 = under 21, 2 = At least 21
Class        1 = lowerclassperson, 2 = upperclassperson
Residence    1 = in PA, 2 = Out of PA
Greek        1 = yes, 0 = no
Engaged      1 = yes, 0 = no
Cat          1 = yes, 0 = no
Chew         1 = yes, 0 = no
Cow          1 = yes, 0 = no
Streaked     1 = yes, 0 = no
Save         1 = spouse, 2 = mother
Twins        1 = one with car, 2 = one without car
Burn         1 = o-chem book, 2 = stat 250 notes
Procrast     1 = yes, 0 = no
Lottery      1 = yes, 0 = no
Deprived     1 = yes, 0 = no
Deerpens     1 = yes, 0 = no
Passedout    1 = yes, 0 = no
Roommate     1 = yes, 0 = no
Married      1 = yes, 0 = no
Opposite     1 = yes, 0 = no
Body         1 = yes, 0 = no
Fakeid       1 = yes, 0 = no
So           1 = yes, 0 = no
Long         1 = yes, 0 = no
Makebed      1 = yes, 0 = no
Sheets       1 = yes, 0 = no
Nude         1 = yes, 0 = no
Watch        1 = yes, 0 = no
Surgery      1 = yes, 0 = no
Bags         1 = paper, 2 = plastic
Bungee       1 = yes, 0 = no
Meat         1 = chicken, 2 = beef
Fight        1 = yes, 0 = no
Exercise     1 = yes, 0 = no
Run          1 = yes, 0 = no
Horoscope    1 = yes, 0 = no
Tire         1 = yes, 0 = no
Head         1 = yes, 0 = no
Vote         1 = yes, 0 = no
Milk         1 = yes, 0 = no
Adopted      1 = yes, 0 = no
Pornmag      1 = yes, 0 = no
Endworld     1 = yes, 0 = no
Utensil      1 = pencil, 2 = pen
Varsity      1 = yes, 0 = no
Sneakers     Number of pairs of sneakers owned
Cards        Number of credit cards in your name
Cars         Number of cars you own
Drink        Number of days a month you drink alcohol
Cry          Number of days you cried in past month
Firstalc     Age at which first consumed alcohol
Arrested     Number of times been arrested
Skipped      Number of Stat 250 classes skipped
Coffee       Number of 8-oz cups of coffee daily
Urinate      Number of times you urinate in a 24-hour period
Study        Number of hours studied for second Stat 250 midterm
Santa        Age learned no Santa Claus
Jewelry      Number of pieces of jewelry currently wearing
Countries    Number of countries visited
Games        Number of PSU football games attended this season
Books        Number of books read for pleasure this semester
Sleephours   Hours of sleep in typical day at Penn State
Home         Number of times gone home this semester
Cones        Number of Creamery ice cream cones in month
Email        Number of times check e-mail in a day
Account      Amount of money in bank account
Food         Amount of money typically spent on food weekly
Loop         Number of times ride Loop weekly
Mirror       Number of times look in mirror daily
Bike         Number of times cut off by someone on bike in a day
Parktix      Number of parking tickets ever had
Breath       How long can hold breath in seconds
Waist        Waist size in inches
Haircut      Number of times get haircut in semester
Cheated      Number of times cheated on a significant other
Virgin       1 = yes, 0 = no
Active       1 = yes, 0 = no
Partners     Number of sexual partners in lifetime
Sex          Number of times had sex this week
Firstsex     Age at first sexual encounter
 

It is your job to analyze the data and write a report for Part II as follows:

1.  Select one binary variable of interest to you.  Using Minitab, calculate a 95% confidence interval for the true percentage of students having the characteristic.  Cut out the Minitab output and paste it to your report page.  Write a brief paragraph that summarizes the confidence interval and what you have learned.  Based on what you know about Penn State students, did you expect the result you got?

2.  Select one measurement variable of interest to you.  Using Minitab, calculate a 95% confidence interval for the true average of the measurement variable.  Cut out the Minitab output and paste it to your report page.  Write a brief paragraph that summarizes the confidence interval and what you have learned.  Based on what you know about Penn State students, did you expect the result you got?

3.  Select a binary grouping variable on the data collection form.  Then, select another binary response variable of interest to you.  The goal is to see if the groups differ with respect to the binary response variable.  That is, you want to compare the percentages having the trait in each of the two groups.  Determine your null and alternative hypotheses, and perform a hypothesis test to see if the groups do indeed differ significantly.  Cut out your Minitab output and paste it to your report.  Write a brief paragraph that summarizes the results of your hypothesis test.  In so doing, specify your original hypotheses, state the significance level you used, and specify your decision whether or not to reject.   Based on what you know about Penn State students, did you expect the result you got?

4.  Select a binary grouping variable on the data collection form.  Then, select one measurement response variable of interest to you.  The goal is to see if the groups differ with respect to the measurement response variable.  That is, you want to compare the averages of the two groups.  Determine your null and alternative hypotheses, and perform a hypothesis test to see if the groups do indeed differ significantly.  Cut out your Minitab output and paste it to your report.  Write a brief paragraph that summarizes the results of your hypothesis test.  In so doing, specify your original hypotheses, state the significance level you used, and specify your decision whether or not to reject.   Based on what you know about Penn State students, did you expect the result you got?
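
For the binary variables in items 1 and 3, the arithmetic behind a confidence interval for one percentage and a test comparing two percentages can be sketched as follows.  This is a minimal sketch, assuming the large-sample normal approximation and a pooled standard error for the test; the function names and the counts are hypothetical (they are NOT taken from the survey), and Minitab may use a different standard error depending on the options you select, so treat this only as a rough check on your output.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for one proportion."""
    p_hat = successes / n
    crit = Z.inv_cdf(0.5 + confidence / 2)
    margin = crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def two_proportion_z(s1, n1, s2, n2):
    """Two-sided z-test of H0: p1 = p2, using the pooled proportion."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value


# Hypothetical counts, invented for illustration only:
# 30 of 100 students in group 1 and 45 of 100 in group 2 answered "yes".
low, high = proportion_ci(30, 100)
z_stat, p_value = two_proportion_z(30, 100, 45, 100)
print("95% CI for group 1 proportion:", (low, high))
print("z:", z_stat, "two-sided p-value:", p_value)
```

Multiply the interval endpoints by 100 to report them as percentages.  You reject the null hypothesis of equal percentages when the p-value falls below the significance level you chose in advance.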


Part III

Read the scientific article, "Lack of Effectiveness of Bed Rest for Sciatica."  Then, answer the following questions in full sentences.    Five points will be deducted if you do not write in full sentences.

1.  What type of study did the investigators conduct? 

2.  Why did the authors conduct this study? 

3.  What are the two treatments being compared in this study?   How many patients received each treatment?   And how did a patient get assigned to a particular treatment?

4.  To which population can the results of this experiment be extended?  Be as specific as possible.  (In general, study eligibility criteria are used to define the populations.)

5.  The authors compared the 2 groups with respect to several outcome measures.  Identify one of the binary outcome variables, and identify one of the measurement variables.

6.  What is the purpose of Table 1?  How did the authors get such similar results in the two groups?

7.   Based on the ages reported in Table 1, calculate a 95% confidence interval for the average age for each group.  Using the confidence intervals, draw a conclusion about the similarity or difference in the average age in the two groups.

8.  What differences are there in the average satisfaction scores at 12 weeks between the two populations?  Be as specific and as statistical as possible in your conclusions.

9.  Based on the results presented in Table 3, what conclusions can the authors make?  Is there evidence that bed rest is an effective treatment for patients with sciatica? Explain.

10.  Were the patients successfully blinded to their treatment assignment?  Explain.  
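
For question 7, a confidence interval can be computed directly from summary statistics.  The sketch below assumes that the table reports a mean, a standard deviation, and a sample size for each group, and that the groups are large enough for the normal critical value to be a reasonable stand-in for the t critical value (use a t table for small groups).  The function names and the numbers are hypothetical and are NOT taken from the article.

```python
import math
from statistics import NormalDist


def mean_ci_from_summary(mean, sd, n, confidence=0.95):
    """Large-sample confidence interval for a mean from summary statistics."""
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical summary statistics, invented for illustration only.
ci_group1 = mean_ci_from_summary(mean=40.0, sd=10.0, n=100)
ci_group2 = mean_ci_from_summary(mean=41.0, sd=10.0, n=100)
print("group 1:", ci_group1)
print("group 2:", ci_group2)
print("intervals overlap:", intervals_overlap(ci_group1, ci_group2))
```

Comparing two separate intervals is only a rough check: intervals that do not overlap point to a difference in the averages, while overlapping intervals do not by themselves prove that the averages are equal.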

Please note that you should make sure that you read and study the entire article.  You may see part of it again on the final exam.