Master's thesis: A Bayesian nonparametric model for density and cluster estimation: the epsilon-NGG mixture model.

Type: Second-level degree (Laurea specialistica)

Abstract: In this work we define a new class of random probability measures that approximates the well-known normalized generalized gamma (NGG) process. The construction starts from the representation of an NGG process as a discrete random measure whose weights are obtained by normalizing the jumps of a Poisson process and whose support points are independent and identically distributed (iid); the new process retains only the jumps larger than a threshold ε. As a consequence, the number of jumps of this new process, called the ε-NGG process, is almost surely finite. A prior distribution for ε can be elicited. We take the ε-NGG process as the mixing measure in a mixture model for density and cluster estimation, and we provide an efficient Gibbs sampler to simulate from the posterior. The model is then applied to two datasets: the well-known univariate Galaxy dataset and the multivariate Yeast cell cycle dataset, which consists of gene expression profiles measured at 9 different times. A thorough robustness analysis with respect to the prior is performed in both applications, in order to evaluate the goodness of fit of the model in a density estimation context and to investigate the role of the parameters (which can also be treated as random variables) in the posterior estimates. In the multivariate case we also provide posterior cluster estimates, obtained by minimizing a loss function.
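
The construction described in the abstract can be sketched in formulas. The notation below is an assumption for illustration and is not taken from the thesis: it uses a common parametrization of the generalized gamma Lévy intensity with hypothetical parameter names κ, σ, ω and base measure P_0, and it writes the ε-NGG process only on the event that at least one jump exceeds ε. The abstract does not say how the thesis handles the case of no jumps above the threshold.

```latex
% Assumed jump intensity of the generalized gamma process
% (parameter names \kappa, \sigma, \omega are illustrative):
\rho(s) = \frac{\kappa}{\Gamma(1-\sigma)}\, s^{-1-\sigma} e^{-\omega s},
\qquad s > 0,\quad \kappa > 0,\ 0 \le \sigma < 1,\ \omega > 0 .

% NGG process: normalized jumps J_j with iid support points \tau_j \sim P_0
P = \sum_{j \ge 1} \frac{J_j}{T}\, \delta_{\tau_j},
\qquad T = \sum_{j \ge 1} J_j .

% Keeping only the jumps larger than \varepsilon: their number is Poisson
% with finite mean, hence almost surely finite
N_\varepsilon \sim \mathrm{Poisson}(\Lambda_\varepsilon),
\qquad \Lambda_\varepsilon = \int_\varepsilon^{\infty} \rho(s)\, ds < \infty .

% \varepsilon-NGG process (written on the event N_\varepsilon \ge 1)
P_\varepsilon = \sum_{j \,:\, J_j > \varepsilon} \frac{J_j}{T_\varepsilon}\, \delta_{\tau_j},
\qquad T_\varepsilon = \sum_{j \,:\, J_j > \varepsilon} J_j .
```

Under these assumptions, letting ε tend to 0 brings back all the jumps, which is the sense in which the ε-NGG process approximates the NGG process.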

Author: Ilaria Bianchini

Advisors: A. Guglielmi, R. Argiento

University: Politecnico di Milano

Defence Date

PDF (link to the thesis, if possible)