STAT557/IST557: Data Mining
Course Material


Lecture notes
  1. Introduction
  2. Linear regression
  3. Linear methods for classification (regression of indicator matrix)
  4. Linear discriminant analysis
  5. Regularized discriminant analysis, reduced rank LDA
  6. Logistic regression
  7. The perceptron learning algorithm
  8. K-means (prototype method)
  9. Clustering methods (K-center, dendrogram)
  10. LVQ and k-nearest-neighbor
  11. Classification and Regression Trees (I)
  12. Classification and Regression Trees (II)
  13. Brief introduction to bagging and boosting
  14. Mixture model
  15. Mixture discriminant analysis
  16. Hidden Markov models



Survey of Special Topics
  1. Random forest
  2. Support vector machine
  3. Nonlinear dimension reduction, manifold learning
  4. Nonparametric density estimation
  5. Spectral graph partitioning
  6. Mode-based clustering
  7. D2-clustering
  8. Markov random field, 2-D (Spatial) Hidden Markov Model
Recommended reading for survey (search for papers on Google Scholar):
  1. L. Breiman, "Random forests," Machine Learning, 45(1):5-32, 2001.

  2. V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, 2000.
  3. C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
  4. N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines, 2000.

  5. J. B. Tenenbaum, V. de Silva, J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, 290(5500):2319-2323, 2000.
  6. S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, 290(5500):2323-2326, December 2000.
  7. H. Zha and Z. Zhang, "Continuum ISOMAP for manifold learning," Computational Statistics and Data Analysis, 52:184-200, 2007.

  8. L. Devroye and L. Gyorfi, Nonparametric density estimation, Wiley Series in Probability and Mathematical Statistics, 1985.
  9. A. J. Izenman, "Recent developments in nonparametric density estimation," Journal of the American Statistical Association, 86(413):205-224, 1991.

  10. J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. PAMI, 22(8):888-905, 2000.
  11. H. Zha, X. He, C. Ding, H. Simon, M. Gu, "Bipartite graph partitioning and data clustering," Proc. 10th Int. Conf. Information and Knowledge Management, 25-32, 2001.
  12. I. S. Dhillon, "Co-clustering documents and words using bipartite spectral graph partitioning," Proc. 7th ACM SIGKDD, 269-274, 2001.
  13. A. Y. Ng, M. I. Jordan, Y. Weiss, "On spectral clustering: analysis and an algorithm," Advances in Neural Information Processing Systems, 849-856, 2002.

  14. Y. Leung, J.-S. Zhang, and Z.-B. Xu, "Clustering by scale-space filtering," IEEE Trans. PAMI, 22(12):1396-1410, 2000.
  15. J. Li, S. Ray, B. G. Lindsay, "A nonparametric statistical approach to clustering via mode identification," Journal of Machine Learning Research, 8(8):1687-1723, 2007.
  16. M. C. Minnotte and D. W. Scott, "The mode tree: A tool for visualization of nonparametric density features," Journal of Computational and Graphical Statistics, 2(1):51-68, 1993.

  17. J. Li, J. Z. Wang, "Real-time computerized annotation of pictures," IEEE Trans. PAMI, 30(6):985-1002, 2008.

  18. R. Chellappa and A. Jain, Markov Random Fields: Theory and Application, 1993.
  19. B. S. Manjunath, R. Chellappa, "Unsupervised texture segmentation using Markov random field models," IEEE Trans. PAMI, 13(5):478-482, 1991.
  20. J. Li, A. Najmi, R. M. Gray, "Image classification by a two dimensional hidden Markov model," IEEE Trans. on Signal Processing, 48(2):517-533, 2000.
  21. J. Li, R. M. Gray, R. A. Olshen, "Multiresolution image classification by hierarchical modeling with two dimensional hidden Markov models," IEEE Transactions on Information Theory, 46(5):1826-1841, 2000.


Projects and Survey
  1. Project 1. Due on Oct. 7, 2010.
  2. Project 2. Due on Oct. 28, 2010.
  3. Project 3. Due on Nov. 30, 2010.
  4. Project 4. Due on Dec. 7, 2010.
  5. Survey. Due on Dec. 14, 2010.



Data



Jia Li
Last modified: Tue August 4 11:04:21 EDT 2010