STAT557/IST557: Data Mining
Course Material
Lecture notes
- Introduction
- Linear regression
- Linear Methods for classification (regression of indicator matrix)
- Linear discriminant analysis
- Regularized discriminant analysis, reduced rank LDA
- Logistic regression
- The perceptron learning algorithm
- K-means (prototype method)
- Clustering methods (K-center, dendrogram)
- LVQ and k-nearest-neighbor
- Classification and Regression Trees (I)
- Classification and Regression Trees (II)
- Brief introduction to bagging and boosting
- Mixture Model
- Mixture discriminant analysis
- Hidden Markov models
Survey of Special Topics
- Random forest
- Support vector machine
- Nonlinear dimension reduction, manifold learning
- Nonparametric density estimation
- Spectral graph partitioning
- Mode-based clustering
- D2-clustering
- Markov random field, 2-D (Spatial) Hidden Markov Model
Recommended reading for survey
(search papers at Google Scholar):
- L. Breiman, "Random forests," Machine Learning, 45(1):5-32, 2001.
- V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, 2000.
- C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
- N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines, 2000.
- J. B. Tenenbaum, V. de Silva, J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, 290(5500):2319-2323, 2000.
- S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, 290(5500):2323-2326, December 2000.
- H. Zha and Z. Zhang, "Continuum ISOMAP for manifold learning," Computational Statistics and Data Analysis, 52:184-200, 2007.
- L. Devroye and L. Gyorfi, Nonparametric density estimation, Wiley Series in Probability and Mathematical Statistics, 1985.
- A. J. Izenman, "Recent developments in nonparametric density estimation," Journal of the American Statistical Association, 86(413):205-224, 1991.
- J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. PAMI, 22(8):888-905, 2000.
- H. Zha, X. He, C. Ding, H. Simon, M. Gu, "Bipartite graph partitioning and data clustering," Proc. 10th Int. Conf. Information and Knowledge Management, 25-32, 2001.
- I. S. Dhillon, "Co-clustering documents and words using bipartite spectral graph partitioning," Proc. 7th ACM SIGKDD, 269-274, 2001.
- A. Y. Ng, M. I. Jordan, Y. Weiss, "On spectral clustering: analysis and an algorithm," Advances in Neural Information Processing Systems, 849-856, 2002.
- Y. Leung, J.-S. Zhang, and Z.-B. Xu, "Clustering by scale-space filtering," IEEE Trans. PAMI, 22(12):1396-1410, 2000.
- J. Li, S. Ray, B. G. Lindsay, "A nonparametric statistical approach to clustering via mode identification," Journal of Machine Learning Research, 8(8):1687-1723, 2007.
- M. C. Minnotte and D. W. Scott, "The mode tree: A tool for visualization of nonparametric density features," Journal of Computational and Graphical Statistics, 2(1):51-68, 1993.
- J. Li, J. Z. Wang, "Real-time computerized annotation of pictures," IEEE Trans. PAMI, 30(6):985-1002, 2008.
- R. Chellappa and A. Jain, Markov Random Fields: Theory and Application, 1993.
- B. S. Manjunath, R. Chellappa, "Unsupervised texture segmentation using Markov random field models," IEEE Trans. PAMI, 13(5):478-482, 1991.
- J. Li, A. Najmi, R. M. Gray, "Image classification by a two dimensional hidden Markov model," IEEE Trans. on Signal Processing, 48(2):517-533, 2000.
- J. Li, R. M. Gray, R. A. Olshen, "Multiresolution image classification by hierarchical modeling with two dimensional hidden Markov models," IEEE Transactions on Information Theory, 46(5):1826-1841, 2000.
Projects and Survey
- Project 1. Due on Oct. 7, 2010.
- Project 2. Due on Oct. 28, 2010.
- Project 3. Due on Nov. 30, 2010.
- Project 4. Due on Dec. 7, 2010.
- Survey. Due on Dec. 14, 2010.
Data
Jia Li
Last modified: Tue August 4 11:04:21 EDT 2010