Scalable and Efficient Speech Enhancement Using Modified Cold Diffusion

Paper

Submitted to ICASSP 2024 (under review)
Title: Scalable and Efficient Speech Enhancement Using Modified Cold Diffusion: A Residual Learning Approach
Authors: Minje Kim and Trausti Kristjansson

Sound Examples

In the following examples, the first milestone target \bm{x}_{T-1} (here T=5, so T-1=4) differs only slightly from the input noisy speech signal: the noise source is attenuated by a small amount. Starting from such a small step keeps the model on a smooth denoising trajectory.
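To make "attenuated by a small amount" concrete, here is a minimal sketch of how a milestone target could be formed. It assumes a linear interpolation between clean and noisy speech, with the noise scaled by t/T; this schedule is an illustrative assumption, not necessarily the one used in the paper, and the function names `milestone_target` and `noise_attenuation_db` are hypothetical.

```python
import math


def milestone_target(clean, noisy, t, T):
    """Hypothetical milestone x_t: clean speech plus the noise scaled by t/T.

    Assumes a linear schedule (an illustrative choice, not confirmed by the paper):
    t = T gives back the noisy input, t = 0 gives the clean signal.
    """
    if not 0 <= t <= T:
        raise ValueError("t must satisfy 0 <= t <= T")
    scale = t / T
    # noisy - clean is the additive noise; keep only a fraction of it.
    return [s + scale * (y - s) for s, y in zip(clean, noisy)]


def noise_attenuation_db(t, T):
    """Noise level at step t relative to the noisy input (t = T), in dB."""
    if not 0 < t <= T:
        raise ValueError("t must satisfy 0 < t <= T")
    return 20.0 * math.log10(t / T)


# Toy waveforms (made-up sample values, for illustration only).
clean = [0.0, 0.5, -0.5, 0.25]
noisy = [0.1, 0.3, -0.4, 0.45]

T = 5
first_milestone = milestone_target(clean, noisy, T - 1, T)
attenuation = noise_attenuation_db(T - 1, T)
print(f"noise change at the first milestone: {attenuation:.2f} dB")
```

Under this assumed schedule with T=5, the first milestone keeps 80% of the noise amplitude, a reduction of roughly 1.9 dB, which is why the milestone target sounds close to the noisy input.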


Sound Examples #1

Ground Truth
Noisy Speech
Milestone Target \bm{x}_{T-1}
ResSESE Milestone Result (\tau=1, T=5)
ResSESE Final Result (\tau=10, T=10)
ResSESE Final Result (\tau=5, T=5)
SESE Final Result (\tau=10, T=10)

Sound Examples #2

Ground Truth
Noisy Speech
Milestone Target \bm{x}_{T-1}
ResSESE Milestone Result (\tau=1, T=5)
ResSESE Final Result (\tau=10, T=10)
ResSESE Final Result (\tau=5, T=5)
SESE Final Result (\tau=10, T=10)

Sound Examples #3

Ground Truth
Noisy Speech
Milestone Target \bm{x}_{T-1}
ResSESE Milestone Result (\tau=1, T=5)
ResSESE Final Result (\tau=10, T=10)
ResSESE Final Result (\tau=5, T=5)
SESE Final Result (\tau=10, T=10)

Sound Examples #4

Ground Truth
Noisy Speech
Milestone Target \bm{x}_{T-1}
ResSESE Milestone Result (\tau=1, T=5)
ResSESE Final Result (\tau=10, T=10)
ResSESE Final Result (\tau=5, T=5)
SESE Final Result (\tau=10, T=10)