*In this project, I want to do come up with a deflation method for NMF, where we learn the NMF basis vectors one by one in the order of importance. See my paper for more details:*

Minje Kim and Paris Smaragdis (2014), "Efficient Model Selection for Speech Enhancement Using a Deflation Method for Nonnegative Matrix Factorization," in Proceedings of the IEEE Global Conference on Signal and Information Processing (Global SIP), Atlanta, GA, December 3-5, 2014.

Minje Kim and Paris Smaragdis (2014), "Efficient Model Selection for Speech Enhancement Using a Deflation Method for Nonnegative Matrix Factorization," in Proceedings of the IEEE Global Conference on Signal and Information Processing (Global SIP), Atlanta, GA, December 3-5, 2014.

Nonnegative Matrix Factorization (NMF) is a useful tool in source separation / speech enhancement applications, particularly if we have one and the only one recording of all sources mixed up. One of the big challenges in NMF-based source separation systems is though, that we have to start from a fixed model complexity, or the number of basis vectors. This limitation can cause some problems from time to time when it comes to the sources without a priori information.

For example, say that we analyze a music signal of three notes,

*W*and

*H*, and during the procedure, it doesn't care if the approximation exceeds the input values. In the above example with only one basis, we see that the learned basis vector is actually an average of the spectrogram along the time axis, and the reconstruction matrix generates some spurious harmonics peaks here and there, which make the reconstruction bigger than the input at some TF bins.

What I'm trying to say is that the NMF results from the rank-3 assumption (I mean three basis vectors to learn) are not the sum of three different rank-1 NMF runs, each of which is with only one basis vector. That's why we need a good guess about the number of components in the first place. Otherwise, we cannot reuse the NMF results from wrong guesses. What a waste of time.

In this project, I want to come up with a deflation method for NMF, where we learn the NMF basis vectors one by one in the order of importance. For example in the 3-notes example, we first learn a basis for the third and the longest note,

We extend this concept to the semi-supervised speech enhancement case where we don't know the characteristics about the noise. To learn the noise bases, a common technique is to learn them from the test signals, with a magically guessed number of components (that's actually the only data we have about the noise). With the proposed deflation method however, we can start from a small number of noise bases, and if we're not satisfied, we can keep adding more bases until the system models the test signal with a small enough error. This is of course a more complicated procedure than an oracle NMF model with the best chosen number of components, but is less complicated than the worst case scenario where we have to search for the best number of components through different runs of the standard NMF.

The deflation method somehow corresponds to the eigendecomposition-based techniques, such as PCA, where we naturally know the order of importance using eigenvalues. It is a greedy method that extracts the most important component first, while leaving a NONNEGATIVE residual, so that the system can pursue another component if it wishes. During the optimization, therefore, we assume an additional term to capture the nonnegative residual part along with the ordinary factor matrices we learn, something like

*D(X|WH+R)*, where

*D*is any error function we use.

The explicit use of the nonnegative residual term

*R*prevents the approximation done by

*WH*from exceeding

*X*. This is the concept called

*under-approximation*. The deflation method for NMF has an interesting connection to the

*Hierarchical Dirichlet Process*, where a topic model is learned with potentially infinite number of components, also with the order of importance. However, the deflation method for NMF has its own merit, such as the freedom of choosing any beta-divergence, on top of the

*beta=1*case that corresponds to the probabilistic topic models. Also, the deflation method is more flexible in choosing the model complexity by allowing the system to stop adding components at any point, while sacrificing the approximation performance due to its greedy nature.