OOMPA Overview



The Thresher R-Forge project is one component of the broader OOMPA Project produced by the Coombes Lab over the past decade. It consists of a set of packages to help with more advanced class discovery, clustering, and visualization.


Support for all packages that are part of the OOMPA family uses a common set of discussion forums and bug trackers.

List of Packages

The PCDimension package, which was described in a paper by M Wang and colleagues, implements and automates a graphical Bayesian method, introduced by Auer and Gervini in 2008, for determining the number of principal components. Wang's paper also introduces the idea of "biological components", which should be thought of as an interpretable (if redundant) replacement for principal components.
The Thresher package uses the PCDimension package as a starting point to estimate the number of clusters present in a data set. The key idea is that the number K of clusters should lie between N and 2N+1, where N is the number of principal components. More details can be found in another paper by M Wang and colleagues, which also points out that Thresher can remove outliers at the same time as it determines clusters.
The Mercator package was introduced in a paper by Abrams and colleagues. The package has multiple goals. First, it implements a variety of binary distance metrics. Second, it provides consistent interfaces to a variety of clustering and visualization methods, with an emphasis on being able to color samples consistently across different methods.
In his PhD thesis, Zach Abrams developed an algorithm called Cytogenetics Pattern Sleuth (CytoGPS) that can parse karyotypes written in a standard text form known as the International System for Human Cytogenetic Nomenclature and convert them into binary vectors accessible to modern machine learning methods. That algorithm is available online at http://cytogps.org/. The RCytoGPS package (see the pair of papers by Abrams and colleagues in BMC Bioinformatics and Bioinformatics) provides tools to import the output of the web site into R so it can be analyzed and visualized.
The SillyPutty package introduces a novel clustering algorithm, based on optimizing silhouette widths. A manuscript describing and evaluating the algorithm is in preparation; the package will be submitted to CRAN when the manuscript is completed.
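For readers unfamiliar with silhouette widths, the quantity that SillyPutty is described as optimizing, the following is a minimal Python sketch of the standard definition only. It is not the SillyPutty algorithm and not the package's R interface; the function name `silhouette_widths`, the use of Euclidean distance, and the toy data are assumptions made for illustration.

```python
import math

def silhouette_widths(points, labels):
    """Standard silhouette width s(i) = (b - a) / max(a, b) for each point.

    a = mean distance from point i to the other members of its own cluster
    b = smallest mean distance from point i to the members of any other cluster
    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    n = len(points)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        # Mean distance from point i to each cluster, leaving point i itself out.
        mean_dist = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                total = sum(math.dist(points[i], points[j]) for j in members)
                mean_dist[c] = total / len(members)
        own = labels[i]
        others = [d for c, d in mean_dist.items() if c != own]
        # Convention: a singleton cluster (or a single-cluster data set) gets 0.
        if own not in mean_dist or not others:
            widths.append(0.0)
            continue
        a = mean_dist[own]
        b = min(others)
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return widths

# Toy data: the last point sits near cluster 0 but is labeled as cluster 1.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (0.5, 0.5)]
before = silhouette_widths(points, [0, 0, 1, 1, 1])
after = silhouette_widths(points, [0, 0, 1, 1, 0])
print(before[4] < 0)                       # the mislabeled point has a negative width
print(sum(after) / 5 > sum(before) / 5)    # relabeling it raises the mean width
```

A negative width flags a point that is closer, on average, to another cluster than to its own, which is why the mean silhouette width is a natural objective for a clustering method to improve.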
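To make the idea of a binary distance metric, as mentioned for the Mercator package above, more concrete, here is a small Python sketch of two widely used examples: the Jaccard distance and the simple matching (Sokal-Michener) distance. This is an illustration only, not Mercator's R interface; the function names are hypothetical, and whether these definitions match Mercator's exactly is an assumption.

```python
def jaccard_distance(a, b):
    """1 - (features present in both) / (features present in either).

    Shared absences are ignored. Returns 0.0 when both vectors are all zeros
    (a convention chosen here for illustration).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either

def simple_matching_distance(a, b):
    """Fraction of positions at which the two vectors disagree.

    Unlike Jaccard, shared absences count as agreement.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not a:
        raise ValueError("vectors must not be empty")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)

# Two samples described by six binary features.
s1 = [1, 1, 0, 0, 1, 0]
s2 = [1, 0, 0, 1, 1, 0]
print(jaccard_distance(s1, s2))          # 2 shared of 4 present in either: 0.5
print(simple_matching_distance(s1, s2))  # 2 mismatches out of 6 positions
```

The two metrics differ in how they treat features that are absent from both samples, which matters for sparse binary data such as the karyotype vectors produced by CytoGPS.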
