Welcome to the website for
Bioinformatics II
(Spring 2003)
Meetings: Tue, Thur 2.30-3.45pm, 124 Ag Eg
Instructors:
Francesca Chiaromonte
411 Thomas Bldg, 5-7075, chiaro@stat.psu.edu
Off. Hours: Wed 12.00-1.00pm, Thur 10.00-11.00am
Webb Miller
326A Pond Lab, 5-4551, webb@cse.psu.edu
Syllabus
Questions
Groups
Special announcements

Class is cancelled on Thur Jan 23rd. A reading assignment is given below.

On Thur Feb 13th, we will visit the microarray facility (meet in the elevator
lobby of Wartik Lab. at 2.30pm)
1. Introduction to Microarrays
Notes, general introduction and spotted arrays
Affymetrix arrays:

Lipshutz R.J., Fodor S.P.A., Gingeras T.R., Lockhart D.J. (1999). High
density synthetic oligonucleotide arrays. Nature Genetics supplement, vol.
21, 20-24. (description of Affymetrix arrays)

Affymetrix, Microarray Suite version 5.0 (2002). Statistical Algorithms
Reference Guide (single array analysis)

Complete information: Statistical algorithms description document (available
at http://icg.cpmc.columbia.edu/Bioinformatics/MAS_5.pdf )
Replicating:

Lee M.L., Kuo F.C., Whitmore G.A., Sklar J. (2000). Importance of replication
in microarray gene expression studies: statistical methods and evidence
from repetitive cDNA hybridizations. PNAS vol 97, n. 18, 9834-9839.
Reading List
Reading assignment: Naef F., Magnasco M. et al. (2001).
From features to expression: high-density oligonucleotide array analysis
revisited. Write a summary and your comments in 2 pages (max), to hand
in on Thur Jan 30th (in class).
2. Preprocessing of Microarray data I: Normalization and Missing Values
Notes
An instance of "global" normalization:

David B Finkelstein, Rob Ewing, Jeremy Gollub, Fredrik Sterky, Shauna Somerville,
J Michael Cherry (2002): "Iterative linear regression by sector", in Methods
of Microarray Data Analysis, eds. SM Lin, KF Johnson (Kluwer Academic),
pp. 57-68 (Stanford preprint).
An instance of "control-based" normalization:

AJ Hartemink, DK Gifford, TS Jaakkola, RA Young, "Maximum likelihood estimation
of optimal scaling factors for expression array normalization". SPIE BiOS
2001, San Jose, California (MIT preprint).
Missing values:

Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani,
R., Botstein, D., and Altman, R. B. (2001). Missing value estimation methods
for DNA microarrays. Bioinformatics 17, 520-525 (available at http://ismb00.sdsc.edu/9_8_pdfs/SunMethods/Troyanskaya.pdf).

Guest lecture by Ofer Harel: Two-Stage Multiple Imputation
(slides).
For more information on multiple imputation see http://www.multipleimputation.com
Reading List
Reading assignment: (this concerns both modules 2 and 3) Tsodikov
A., Szabo A., Jones D. (2002). Adjustments and measures of differential
expression for microarray data. Bioinformatics 18(2), 251-260 (article).
Write a summary and your comments in 4 pages (max), to hand in on Thur
Mar 6th (in class). Work in groups.
3. Preprocessing of Microarray data II: Filtering, differentially expressed
genes, other transformations
Notes

Dudoit, S., Yang, Y. H., Speed, T. P., and Callow, M. J. (2001). Statistical
methods for identifying differentially expressed genes in replicated cDNA
microarray experiments. Statistica Sinica. UC Berkeley Statistics tech.
report (postscript)

Efron B., Tibshirani R., Storey J.D., Tusher V. (2001). Empirical Bayes
analysis of a microarray experiment. JASA 96(456) 1151-1160.
Reading List
4. Dimension reduction and microarray data
Notes A, Notes B, Notes C

Holter, N.S., Mitra, M., Maritan, A., Cieplak, M., Banavar, J.R., Fedoroff,
N.V. (2000). Fundamental patterns underlying gene expression profiles:
Simplicity from complexity. PNAS v. 97, n. 15, 8409-8414.

Yeast Cell Cycle example (Minitab file). Data from Spellman P.T., Sherlock G.,
Zhang M.Q., Iyer V.R., Anders K., Eisen M.B., Brown P.O., Botstein D. (1998),
Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast
Saccharomyces cerevisiae by Microarray Hybridization, Molecular Biology of
the Cell, 9, 3273-3297.
CDC, 15 time points. 678 out of 800 genes (no missing values). 12:
restriction to 12 time points. c: use of correlation matrix. s: row standardization.
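The "c" (correlation matrix) and "s" (row standardization) variants can be sketched in a few lines. This is a minimal Python/NumPy illustration, not the Minitab procedure used in class; it assumes the data form a genes × time-points matrix (genes as rows), that the principal components are computed with time points as the variables, and that no gene profile is constant. The names `row_standardize` and `pca_correlation` are hypothetical.

```python
import numpy as np

def row_standardize(X):
    """Center each row (gene) at 0 and scale it to unit standard deviation
    across the columns (time points). Assumes no row is constant."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / sd

def pca_correlation(X):
    """PCA with the columns of X as variables, based on their correlation matrix.
    Returns eigenvalues (descending), eigenvectors (loadings) and scores."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)          # p x p correlations among time points
    evals, evecs = np.linalg.eigh(R)          # eigh: R is symmetric
    order = np.argsort(evals)[::-1]           # sort components by explained variance
    evals, evecs = evals[order], evecs[:, order]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # column-standardized data
    scores = Z @ evecs                        # coordinates of each gene on the PCs
    return evals, evecs, scores
```

The eigenvalues of a correlation matrix always sum to the number of variables (here, time points). After row standardization every gene profile sums to zero, so the time-point columns are linearly dependent and the smallest eigenvalue is numerically zero.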

Notes on Multidimensional Scaling, from S. Holmes' website (pdf).
Reading List
5. Clustering and microarray data
Notes A, Notes B, Notes C

Yeung K.Y., Ruzzo W.L. (2001): Principal component analysis for clustering
gene expression data. Bioinformatics 17 (9) 763-774.

Yeast Cell Cycle example (Minitab file) (plots).
Data from Spellman et al. CDC, restricted to first 12 time points, row-standardized.

Some abstracts and papers on clustering (list)

Bartolucci and Chiaromonte, draft on clustering with multivariate normal
mixtures (ps).

Ben-Hur A., Elisseeff A. and Guyon I. (2002). A stability based method for
discovering structure in clustered data. PSB 2002 (ps).
Reading List
Data analysis assignment: (this concerns both modules 4 and 5).
Hand in on Tue Apr 22 (in class). Work in groups. Instructions.
Data (Excel file).
Data analysis assignment: (for the final discussion on Thur May
1). Working in groups, do one of the following:
(a) Perform a perturbation analysis to select an appropriate number
of clusters (e.g. along the lines proposed by Ben-Hur et al.) for
the Yeast Shock data from the previous assignment.
(b) Perform an analysis of your choice (choose the data, preprocess
if needed, perform selection of differentially expressed genes,
dimension reduction, clustering, depending on the question you are
trying to address).
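Option (a) can be sketched as follows, assuming the general scheme of Ben-Hur et al.: for each candidate number of clusters k, cluster pairs of random subsamples and measure how well the two clusterings agree on the points the subsamples share; a k whose similarity scores stay concentrated near 1 is considered stable. This Python/NumPy version is a hypothetical illustration, not the paper's code. It substitutes plain k-means with farthest-first initialization for whatever clustering method you choose, uses the pairwise Jaccard coefficient as the similarity measure, and assumes a subsampling fraction of 0.8; all function names are made up for this sketch.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Lloyd's k-means with farthest-first initialization; one label per row of X."""
    # Farthest-first init: a random first center, then repeatedly the point
    # farthest from all centers chosen so far.
    idx = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centers = X[idx].astype(float)
    labels = None
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def jaccard(a, b):
    """Pairwise Jaccard similarity of two labelings of the same points:
    (pairs together in both) / (pairs together in at least one)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    off = ~np.eye(len(a), dtype=bool)  # ignore each point paired with itself
    n_both = np.sum(same_a & same_b & off)
    n_either = np.sum((same_a | same_b) & off)
    return n_both / n_either if n_either else 1.0

def stability_scores(X, k, n_pairs=20, frac=0.8, seed=0):
    """Similarity scores for k clusters over n_pairs pairs of random subsamples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    m = int(frac * n)
    scores = []
    for _ in range(n_pairs):
        s1 = rng.choice(n, size=m, replace=False)
        s2 = rng.choice(n, size=m, replace=False)
        lab1 = np.full(n, -1)
        lab2 = np.full(n, -1)
        lab1[s1] = kmeans(X[s1], k, rng)
        lab2[s2] = kmeans(X[s2], k, rng)
        common = np.intersect1d(s1, s2)  # points present in both subsamples
        scores.append(jaccard(lab1[common], lab2[common]))
    return np.array(scores)
```

For example, `{k: stability_scores(X, k) for k in range(2, 8)}` collects one score distribution per k; the idea is to look at these distributions (e.g. as histograms) and pick the k at which they shift from concentrated near 1 to spread out. Because the Jaccard coefficient only compares co-membership of pairs, it does not depend on how the clusters happen to be numbered in each run.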
Relevant Links