Software Packages for Clustering and Classification

Licensing Information

  We use the GNU General Public License (GPL), version 3.

Environment

  The software packages were developed under Unix/Linux in C, Matlab, or R.

Suggestions/Bug Report

  Please send suggestions and bug reports to Jia Li (email: jiali at psu dot edu).

Download

  • C codes:
    1. Image segmentation algorithm: MS-A3C (Multi-Stage Agglomerative Connectivity Constrained Clustering) (C codes with usage document)
      Related publication:
      • Jia Li, "Agglomerative connectivity constrained clustering for image segmentation," Statistical Analysis and Data Mining, 2011. (download)
    2. Modal clustering and linkage clustering: HMAC (C, Matlab, R source codes with usage document)
      Related publication:
      • J. Li, S. Ray, B. G. Lindsay, "A nonparametric statistical approach to clustering via mode identification," Journal of Machine Learning Research, 8(8):1687-1723, 2007. (download)

    3. Two-way Poisson Mixture Model for classification of count data, e.g., word count data for document classification (C codes with usage document)
      Related publication:
      • Jia Li, Hongyuan Zha, "Two-way Poisson mixture models for simultaneous document classification and word clustering," Computational Statistics and Data Analysis, 50(1):163-180, 2006. (download)

  • Matlab codes:
    1. Clustering by multi-layer mixture model (download the package). Special thanks go to Francesca Martella from Leiden University Medical Center, Netherlands, for documenting the codes and improving their organization.
      Related publication:
      • Jia Li, "Clustering based on a multi-layer mixture model," Journal of Computational and Graphical Statistics, 14(3):547-568, 2005. (download)
    2. Gaussian mixture model-based clustering, estimation by classification EM (CEM)
    3. Demo for clustering using the methods listed below (items 4 and 5), and a subroutine for plotting results (needed by the demo program).
    4. K-means clustering
    5. Gaussian mixture model-based clustering, estimation by EM; EM initialization.
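
  For readers unfamiliar with the K-means entry above, the following is a minimal sketch of the standard Lloyd iteration (assign each point to its nearest center, then move each center to the mean of its points). It is written in Python for illustration only and is not the code from the Matlab package; the function name kmeans, its parameters, and the toy data are all assumptions made for this example.

```python
# Minimal K-means (Lloyd's algorithm) sketch. Illustrative only:
# this is NOT the Matlab code distributed on this page, and the
# names `kmeans`, `points`, `init_centers` are hypothetical.

def kmeans(points, init_centers, max_iter=100):
    """Cluster `points` (a list of equal-length tuples) from given starting centers.

    Returns (centers, labels), where labels[i] is the index of the
    center assigned to points[i].
    """
    centers = [tuple(c) for c in init_centers]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(len(centers)),
                key=lambda j: sum((x - m) ** 2 for x, m in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


if __name__ == "__main__":
    # Toy data: two well-separated groups of three points each.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
    centers, labels = kmeans(data, init_centers=[data[0], data[3]])
    print(centers)
    print(labels)
```

  The starting centers are passed in explicitly so that the sketch is deterministic; in practice the result of K-means depends on the initialization, so several starts are usually compared.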

  • R codes:
    1. Variable selection for clustering by Ridgeline-Based Separability (R codes with usage document)
      Related publication:
      • Hyangmin Lee, Jia Li, "Variable selection for clustering by separability based on ridgelines," Journal of Computational and Graphical Statistics, 21(2):315-337, 2012.