Ordinary topic models and NMF-like factorizations produce a wrapper, i.e. a convex hull or cone, that compactly contains the input spectra. For separation, we assume that several such pre-trained wrappers are available, one per training source, and hope that they do not overlap; in practice, they often do.
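To make the setup concrete, here is a minimal sketch of this kind of supervised separation, assuming two pretrained dictionaries W1 and W2 (one wrapper per source) and a mixture magnitude spectrogram X; the KL-NMF multiplicative updates and Wiener-style masking are standard illustrative choices, not a specific system from the paper.

```python
# Minimal sketch: supervised separation with fixed per-source NMF
# dictionaries ("wrappers"). W1, W2, n_iter are illustrative names.
import numpy as np

def separate(X, W1, W2, n_iter=200, eps=1e-12):
    """Estimate two sources from a mixture magnitude spectrogram X
    (freq x time) given pretrained dictionaries W1, W2 (freq x templates)."""
    W = np.hstack([W1, W2])                 # concatenated dictionary, held fixed
    H = np.random.rand(W.shape[1], X.shape[1])
    for _ in range(n_iter):                 # KL-NMF multiplicative updates on H
        V = W @ H + eps                     # current approximation of X
        H *= (W.T @ (X / V)) / (W.T @ np.ones_like(X) + eps)
    k = W1.shape[1]
    V1, V2 = W1 @ H[:k], W2 @ H[k:]         # per-source reconstructions
    mask1 = V1 / (V1 + V2 + eps)            # Wiener-style soft mask
    return mask1 * X, (1.0 - mask1) * X     # estimated source magnitudes
```

When the two wrappers overlap, both dictionaries can explain the same spectral energy, so the soft mask becomes ambiguous; this is exactly the failure mode described above.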



A wrapper is a lossy representation of the full training spectra that works like a dictionary of templates. What it sacrifices are the fine details of the data manifold, which can be critical for recovering high-quality audio. We are working on probabilistic topic models with sparsity constraints that learn hyper topics, each of which represents only its local neighbors. These hyper topics yield a manifold-preserving quantization of the training signals instead of a wrapper, and constrain the recovered source spectra to lie on the original data manifold.
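As a toy illustration of the manifold-preserving idea (not the hierarchical model in the paper), the sketch below reconstructs each frame from only its K nearest dictionary atoms, so the result stays near the training data rather than anywhere inside the full convex hull; D, K, and local_reconstruct are hypothetical names.

```python
# Toy illustration only: restricting each frame to a nonnegative
# combination of its K nearest atoms keeps reconstructions near the
# data manifold instead of anywhere inside the wrapper.
import numpy as np

def local_reconstruct(X, D, K=5, eps=1e-12):
    """X: freq x frames input spectra; D: freq x atoms dictionary of
    training frames. Returns a manifold-constrained approximation of X."""
    Xn = X / (np.linalg.norm(X, axis=0, keepdims=True) + eps)
    Dn = D / (np.linalg.norm(D, axis=0, keepdims=True) + eps)
    R = np.empty_like(X)
    for t in range(X.shape[1]):
        sim = Dn.T @ Xn[:, t]               # cosine similarity to all atoms
        idx = np.argsort(sim)[-K:]          # keep only the K nearest atoms
        Dk = D[:, idx]
        h = np.full(K, 1.0 / K)             # nonnegative local weights
        for _ in range(50):                 # a few KL-NMF updates on h
            v = Dk @ h + eps
            h *= (Dk.T @ (X[:, t] / v)) / (Dk.T @ np.ones(X.shape[0]) + eps)
        R[:, t] = Dk @ h
    return R
```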



For more details, see our paper "Manifold Preserving Hierarchical Topic Models for Quantization and Approximation" (ICML 2013).