project is one
component of the broader OOMPA Project produced by the
over the past decade. It consists of a set of packages to help with
more advanced class discovery, clustering, and visualization.
Support for all packages that are part of the OOMPA family uses a
common set of discussion forums and bug trackers:
List of Packages
- The PCDimension package, which was described in a paper by
M Wang and
colleagues, implements and automates a graphical Bayesian
method to determine the number of principal components that was introduced
by Auer and Gervini in 2008. Wang's paper also introduces the idea of
"biological components", which should be thought of as an
interpretable (if redundant) replacement for principal components.
- The Thresher package uses the PCDimension package as a starting
point to estimate the number of clusters present in a data set. The
key idea is that the number K of clusters should lie between N
and 2N+1, where N is the number of principal components. More
details can be found in another paper
by M Wang and
colleagues. The paper also points out that Thresher can remove
outliers at the same time as it determines clusters.
- The Mercator package was introduced in a paper
and colleagues. The package has multiple goals. First, it
implements a variety of binary distance metrics. Second, it provides
consistent interfaces to a variety of clustering and visualization
methods, with an emphasis on being able to color samples
consistently across different methods.
- In Zach Abrams' PhD thesis, he developed an algorithm called
Cytogenetics Pattern Sleuth (CytoGPS) that can parse karyotypes
written in a standard text form known as the International
System for Human Cytogenetic Nomenclature and convert them into
binary vectors accessible to modern machine learning methods. That
algorithm is available online
at http://cytogps.org/. The
RCytoGPS package (see the pair of papers by Abrams and colleagues)
provides tools to import the output of the web site into R so it
can be analyzed and visualized.
- The SillyPutty package introduces a novel clustering
algorithm, based on optimizing silhouette widths. A manuscript
describing and evaluating the algorithm is in preparation; the
package will be submitted to CRAN when the manuscript is completed.
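
Two of the ideas in the list above, binary distance metrics and silhouette widths, can be illustrated with a small sketch. The code below is written in Python rather than R and does not use or reproduce any OOMPA package. The function names are hypothetical, the Jaccard distance is only one example of a binary metric, and the conventions for all-zero vectors and singleton clusters are assumptions made here. It shows the standard textbook definitions only, not the SillyPutty algorithm.

```python
from typing import Dict, List, Sequence


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard distance between two equal-length binary (0/1) vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumed convention: two all-zero vectors are at distance 0.
    return 0.0 if either == 0 else 1.0 - both / either


def distance_matrix(rows: Sequence[Sequence[int]]) -> List[List[float]]:
    """Full pairwise Jaccard distance matrix for a list of binary vectors."""
    return [[jaccard_distance(r, s) for s in rows] for r in rows]


def silhouette_widths(dist: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> List[float]:
    """Silhouette width of each sample, given distances and cluster labels."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            # Assumed convention: a singleton cluster gets width 0.
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return widths


def mean_silhouette(dist: Sequence[Sequence[float]],
                    labels: Sequence[int]) -> float:
    """Average silhouette width over all samples."""
    widths = silhouette_widths(dist, labels)
    return sum(widths) / len(widths)


if __name__ == "__main__":
    # Toy binary data: the first two rows overlap, as do the last two.
    samples = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
    d = distance_matrix(samples)
    print(mean_silhouette(d, [0, 0, 1, 1]))  # groups similar rows together
    print(mean_silhouette(d, [0, 1, 0, 1]))  # splits similar rows apart
```

On this toy data the first labelling scores a higher mean silhouette width than the second, which is the kind of signal a method that optimizes silhouette widths would try to improve.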