Welcome to the web-site for
Bioinformatics II, Spring 2004.
Meetings: Tue, Thur 2.30-3.45pm, 210 Thomas Bldg.
Off. Hours: Tue, Thur 1.20-2.20pm, 505 Wartik Bldg.
Instructor: Francesca Chiaromonte. 505 Wartik Bldg, 5-7075, firstname.lastname@example.org
Also: Webb Miller. 501 Wartik Bldg, 5-4551, email@example.com
Syllabus | Questions | Groups
Some useful links and news
T. Speed's group | G. Churchill's group | Stanford Stat's group | W. Li's bibliographic reference list
Info on multiple imputation methods
Penn State Microarray Facility
Penn State Quantitative Bioscience Group
Penn State Computational Genomics Journal Club (514 Wartik, 10.00-11.30am):
Tue Jan 27. Liying Cui presents: K. Vandepoele, C. Simillion and Y. Van de Peer (2003). "Evidence that rice and other cereals are ancient aneuploids." Plant Cell 15, 2192-2202.
Tue Feb 17. Istvan Albert presents: LionDB - the Penn State microarray data repository.
Tue Mar. 16, RESCHEDULED. Naomi Altman presents: Analysis of microarray data
(spotted and Affy) with R and Bioconductor (freeware for Windows, Mac and Unix).
Word files provided by Naomi: GettingStarted, Marraydemo, Affydemo.
Talk, Statistics Department: Thur April 8, 4:00 pm, 102 Thomas Bldg: "Sharper Confidence Intervals Focusing on the Selected Populations with Application to Microarray Data Analysis - A New Approach", Gene Hwang, Cornell University.
Talk, Statistics Department: Tue April 13, 4:00 pm, 102 Thomas Bldg: "Statistical learning from distributions of DNA words", Probal Chaudhuri, Indian Statistical Institute.
Class is cancelled on Thur Jan 15. Reading assignment: N. Goodman (2003). "Microarrays: Hazardous to your science". Genome Technology 04/03, pp 42-45.
Visit to the Microarray Facility. RESCHEDULED FOR TUE FEB 10. Meet in front of 205 Wartik at 2.30pm. Craig Praul will divide us into two groups and show us the facility.
Statistical Analysis of Gene Expression Microarray Data (2003). T. Speed (ed.). Chapman & Hall.
The Analysis of Gene Expression Data: Methods and Software (2003). G. Parmigiani, E. Garrett, R.A. Irizarry, S.L. Zeger. (eds). Springer NY.
Textbooks on Regression methods and related topics:
Applied Regression Including Computing and Graphics, R.D. Cook and S. Weisberg. Wiley NY.
Applied Regression Analysis, N.R. Draper and H. Smith. Wiley NY.
Textbooks on Multivariate Analysis:
Methods for Statistical Data Analysis of Multivariate Observations (1997, 2nd ed). Gnanadesikan. Wiley NY.
Multivariate Observations (1984). Seber. Wiley NY.
Clustering Algorithms (1975). Hartigan. Wiley NY.
Self Organizing Maps (1997, 2nd ed). Kohonen. Springer-Verlag.
1. Introduction to Microarrays
Replication: Lee M.L., Kuo F.C., Whitmore G.A., Sklar J. (2000). "Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations". PNAS 97(18): 9834-9839.
Affymetrix single array analysis (hints): "Statistical Algorithms Reference Guide". Microarray Suite version 5.0 (2002).
2. Experimental Design and ANOVA for Microarrays
Guest lecture by Jim Rosenberger.
3. Data Preprocessing: Normalization, Missing Values, Preliminary Transformations and Filtering
Notes, normalization and missing values
Normalization: Yang Y.H., Dudoit S., Luu P., Speed T. (2001). "Normalization for cDNA microarray data". SPIE BiOS 2001, San Jose CA.
Notes, lowess smoothing.
Missing values: Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. B. (2001). "Missing value estimation methods for DNA microarrays". Bioinformatics 17: 520-525.
Notes, preliminary transformations and filtering
Instructions. Due date: Tue March 2nd, in class.
• Sidorov I.A., Hosack D.A., Gee D., Yang J., Cam M.C., Lempicki R.A., and Dimitrov D.S. (2002). Oligonucleotide microarray data distribution and normalization. Information Sciences 146: 67-73.
• Bolstad, B.M., Irizarry R. A., Astrand, M., and Speed, T.P. (2003). A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Bias and Variance. Bioinformatics 19(2): 185-193.
4. Differentially Expressed Genes
Dudoit, S., Yang, Y.H., Speed, T.P., and Callow, M.J. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica 12(1):111-139.
Reading assignment: Instructions. Due date: Thur March 18th, in class.
• Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS. 98:5116-5121.
• Efron B., Tibshirani, R., Storey J.D., and Tusher V. (2001). Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association. 96:1151-1160.
5. Dimension Reduction for Microarray Data
Holter N.S., Mitra M., Maritan A., Cieplak M., Banavar J.R., and Fedoroff N.V. (2000). Fundamental patterns underlying gene expression profiles: Simplicity from complexity. PNAS. 97: 8409-8414.
Alter O., Brown P.O., and Botstein D. (2000). Singular value decomposition for genome-wide expression data processing and modeling. PNAS. 97: 10101-10106.
Data assignment: Instructions. Data set. Due date: Tue April 6th, in class.
6. Clustering for Microarray data
Notes A (hierarchical clustering and k-means)
Notes B (visualization, principal components and example)
Ben-Hur A., Elisseeff A., and Guyon I. (2002). A stability based method for discovering structure in clustered data. Pac. Symp. Biocomputing 2002.
Dudoit S., Fridlyand J. (2002). A prediction-based resampling method to estimate the number of clusters in a dataset. Genome Biology 3(7): 0036.
Notes C (mixture-based clustering and examples)
Yeung K.Y., Fraley C., Murua A., Raftery A.E. and Ruzzo W. L. (2001). Model-Based Clustering and Data Transformation for Gene Expression Data, Bioinformatics 17(10): 977-987.
Data assignment: Instructions. Data set. Due date: Tue April 20th, in class.
7. Combining gene expression data with other types of information; Gene networks (hints)
Tamada Y., Kim S.Y., Bannai H., Imoto S., Tashiro K., Kuhara S., and Miyano S. (2003). Estimating gene networks from gene expression data by combining Bayesian network models with promoter element detection. Bioinformatics 19(2):227-236.
(limited) Reading List.
Each group should prepare a presentation lasting approximately 20-25 minutes. All group members should be involved in describing the work (i.e. take turns speaking) and be ready to answer questions. A hard copy of the presentation (or, if you prefer, an extended description of what you did) should be handed in to Francesca right before you talk. If you want the pdf file of your presentation to be posted on the class web-site, email it to Francesca the evening before your presentation date.
Schedule for presentations:
Tue April 27th:
Group 3 (Yi-Ju Chen, Srivatsava Ganta, Minmei Hou, Samir Wadhawan). Analysis of Leukemia data set from Golub et al. (1999).
Group 4 (Bob Harris, Jieun Jeong, Xiantling Wu). Analysis of Leukemia data set from Golub et al. (1999). Info on Genetic Algorithms provided by Bob: textbook "Genetic Algorithms in Search, Optimization, and Machine Learning", David E. Goldberg; links: Intro to GAs, Overview and references on GAs.
Group 5 (Wen-Yu Chung, Anusha Radakrishnan, Ying Zhang). Human microarray data from 25 normal tissues from Su et al. (2002).
Thur April 29th:
Group 1 (Kevin Beckman, Eren Manavoglu, James Taylor). Analysis of human cell cycle data.
Group 2 (Baomin Feng, Jian Ma, P. Kerr Wall)