Nonparametric Bayesian Biclustering
- Ted Meeds
- Sam Roweis
UTML TR 2007–001
We present a probabilistic block-constant biclustering model that simultaneously clusters
rows and columns of a data matrix. All entries with the same row cluster and column cluster
form a bicluster. Each cluster is part of a mixture having a nonparametric Bayesian prior. The
number of biclusters is therefore treated as a nuisance parameter and is implicitly integrated
over during simulation. Missing entries are completely integrated out of the model, which lets
us bypass the common requirement of biclustering algorithms that missing
values be filled in before analysis and also makes the model robust to high rates of missing values. By
using a Gaussian model for the density of entries in biclusters, we obtain an efficient sampling algorithm
because bicluster parameters are analytically integrated out. We present
several inference procedures for sampling cluster indicators, including Gibbs and split-merge
moves. We show that our method is competitive with, if not superior to, existing imputation methods,
especially at high missing rates, despite imputing constant values for entire blocks of
data. We present imputation experiments and exploratory biclustering results.
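
To make the idea of block-constant imputation concrete, the following is a minimal sketch, not the sampler described in the report. It assumes the row and column cluster labels are already fixed (in the model they are sampled), marks missing entries as NaN, and uses the plain mean of the observed entries in each bicluster in place of the posterior quantities the Gaussian model would give. The function name `block_constant_impute` and its arguments are hypothetical.

```python
import numpy as np

def block_constant_impute(X, row_labels, col_labels):
    """Fill NaN entries with the mean of the observed entries in their bicluster.

    Assumption: cluster labels are given; the report infers them by sampling.
    """
    X = np.asarray(X, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    filled = X.copy()
    observed = ~np.isnan(X)
    # Fallback for a bicluster that has no observed entries at all.
    global_mean = X[observed].mean()
    for r in np.unique(row_labels):
        for c in np.unique(col_labels):
            # All entries sharing row cluster r and column cluster c.
            block = np.ix_(row_labels == r, col_labels == c)
            obs = observed[block]
            value = X[block][obs].mean() if obs.any() else global_mean
            sub = filled[block]
            sub[~obs] = value  # one constant per bicluster
            filled[block] = sub
    return filled

X = np.array([[1.0, np.nan, 10.0],
              [3.0, 2.0, np.nan],
              [np.nan, 5.0, 6.0]])
filled = block_constant_impute(X, row_labels=[0, 0, 1], col_labels=[0, 0, 1])
print(filled)
```

Missing entries are only skipped when computing each block's value, so no pre-filling step is needed; that is the property the abstract highlights, although the model achieves it by integrating the missing entries out rather than by averaging.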