The main idea is to represent the input in sparse form, i.e., as pairs of indices and values, and to reformulate the original NMF problem for this vectorized input. To speed up learning, we borrow the concept of non-parametric density estimation, so that each data point is affected only by the density from its closest observations. We apply this to difficult music transcription tasks, such as a piece in which bass guitar and drums play at the same time. This case is particularly difficult because it requires very high resolution along both the time and frequency axes, whereas the usual short-time Fourier transform can provide it along only one of them. The proposed method can produce a good decomposition of this non-matrix form of data.
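To make the sparse, vectorized formulation concrete, here is a minimal sketch in Python with NumPy. It is not the algorithm from the paper: it assumes integer grid indices rather than irregularly spaced coordinates, it uses the standard KL-divergence multiplicative updates restricted to the observed entries, and it leaves out the nearest-neighbor density estimation step entirely. The names `sparse_kl_nmf` and `kl_divergence` are hypothetical and chosen only for illustration.

```python
import numpy as np


def kl_divergence(vals, est):
    """Generalized KL divergence between observed values and their estimates."""
    return float(np.sum(vals * np.log(vals / est) - vals + est))


def sparse_kl_nmf(rows, cols, vals, n_rows, n_cols, rank,
                  n_iter=200, seed=0, eps=1e-12):
    """KL-NMF on (row index, column index, value) triples.

    Illustrative assumption: only the listed entries are observed, and the
    indices are integers on a regular grid. Returns W (n_rows x rank) and
    H (rank x n_cols) such that vals[n] ~ W[rows[n]] @ H[:, cols[n]].
    """
    rng = np.random.default_rng(seed)
    W = rng.random((n_rows, rank)) + 0.1
    H = rng.random((rank, n_cols)) + 0.1

    for _ in range(n_iter):
        # Update W: gather the factors for each observation, then scatter-add
        # the per-observation contributions back onto the rows of W.
        Wn = W[rows]                      # (N, rank)
        Hn = H[:, cols].T                 # (N, rank)
        ratio = vals / (np.sum(Wn * Hn, axis=1) + eps)
        num = np.zeros_like(W)
        den = np.zeros_like(W)
        np.add.at(num, rows, ratio[:, None] * Hn)
        np.add.at(den, rows, Hn)
        W *= num / (den + eps)

        # Update H in the same way, using the freshly updated W.
        Wn = W[rows]
        ratio = vals / (np.sum(Wn * Hn, axis=1) + eps)
        num = np.zeros((n_cols, rank))
        den = np.zeros((n_cols, rank))
        np.add.at(num, cols, ratio[:, None] * Wn)
        np.add.at(den, cols, Wn)
        H *= (num / (den + eps)).T

    return W, H


# Usage: a rank-2 non-negative matrix, with roughly 80% of its entries
# kept as (index, value) observations.
rng = np.random.default_rng(1)
V = (rng.random((6, 2)) + 0.1) @ (rng.random((2, 8)) + 0.1)
mask = rng.random(V.shape) < 0.8
rows, cols = np.nonzero(mask)
vals = V[rows, cols]

W, H = sparse_kl_nmf(rows, cols, vals, 6, 8, rank=2)
est = np.sum(W[rows] * H[:, cols].T, axis=1)
print("KL divergence on observed entries:", kl_divergence(vals, est))
```

Because the updates only ever touch the listed triples, the input never has to be stored as a full matrix. Extending the sketch to real-valued, irregularly spaced coordinates would require replacing the integer indexing with some form of local weighting between neighboring observations, which is the part this example does not attempt.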
See our paper, "Non-Negative Matrix Factorization for Irregularly-Spaced Transforms (WASPAA 2013)," for more detail.