Hyperbolic Distance-Based Speech Separation

Paper

Darius Petermann and Minje Kim, “Hyperbolic Distance-Based Speech Separation,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Seoul, Korea, Apr. 14-19, 2024 [pdf].

Audio Example #1

In this example, three child-level sources are mixed. One is a near source, 0.4 meters from the microphone, while the other two are farther away, at 2.0 and 1.7 meters, respectively. The latter two are summed to form the “far parent” source group, while the former is the only child of the “near parent” group.

Near Parent Group
- Child Source #1: 0.4 meters
Far Parent Group
- Child Source #1: 2.0 meters
- Child Source #2: 1.7 meters
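
The relationship between the child and parent signals in this example can be illustrated with a minimal sketch. It assumes each source is an equal-length time-domain waveform, that a parent is the sample-wise sum of its children (as described above), and that the mixture is the sum of both parents. The helper name `mix_sources` and the toy sample values are hypothetical and are not taken from the paper.

```python
def mix_sources(sources):
    """Return the sample-wise sum of equal-length waveforms (lists of floats)."""
    if not sources:
        raise ValueError("need at least one source")
    length = len(sources[0])
    if any(len(s) != length for s in sources):
        raise ValueError("all sources must have the same length")
    return [sum(samples) for samples in zip(*sources)]


# Toy four-sample waveforms standing in for the three child sources.
# These values are made up for illustration only.
near_child_1 = [0.5, -0.25, 0.0, 0.25]   # 0.4 meters from the microphone
far_child_1 = [0.125, 0.125, -0.5, 0.0]  # 2.0 meters
far_child_2 = [0.0, 0.25, 0.25, -0.125]  # 1.7 meters

# Each parent is the sum of its children; the near parent has a single child.
near_parent = mix_sources([near_child_1])
far_parent = mix_sources([far_child_1, far_child_2])

# Assumption: the mixture is the sum of the two parent signals.
mixture = mix_sources([near_parent, far_parent])
```

Under these assumptions, the near parent is identical to its only child, which is why this example lists no separate near child source.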

Mixture Audio

Far Parent Sources (two sources)

Estimated Far Parent Sources

Near Parent Source (one source)

Estimated Near Parent Source

Far Child Source #1 (2.0 meters)

Estimated Far Child Source #1

Far Child Source #2 (1.7 meters)

Estimated Far Child Source #2

Audio Example #2

Near Parent Group
- Child Source #1: 0.4 meters
- Child Source #2: 0.2 meters
- Child Source #3: 0.6 meters
Far Parent Group
- Child Source #1: 2.1 meters

In this example, one child source in the “near parent” group (Child Source #3, 0.6 meters away) lies close to the 0.8-meter threshold that separates the near and far groups in our setup, which makes it an “uncertain” case.
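
The grouping rule can be sketched as a simple comparison against the threshold. This is only an illustration of the labels used on this page, not the separation model itself. The function names are hypothetical, and treating a source exactly at 0.8 meters as “far” is an assumption, since the text does not say how the boundary is handled.

```python
NEAR_FAR_THRESHOLD_M = 0.8  # threshold stated in the text


def assign_group(distance_m, threshold_m=NEAR_FAR_THRESHOLD_M):
    """Label a source "near" or "far" by its distance from the microphone.

    Assumption: a source exactly at the threshold counts as "far".
    """
    return "near" if distance_m < threshold_m else "far"


def threshold_margin(distance_m, threshold_m=NEAR_FAR_THRESHOLD_M):
    """Absolute gap between a source's distance and the threshold, in meters."""
    return abs(distance_m - threshold_m)


# Source distances from Audio Example #2.
example_2 = [
    ("near child #1", 0.4),
    ("near child #2", 0.2),
    ("near child #3", 0.6),
    ("far child #1", 2.1),
]

groups = {name: assign_group(d) for name, d in example_2}

# The source with the smallest margin is the most "uncertain" one.
closest_name, closest_distance = min(
    example_2, key=lambda item: threshold_margin(item[1])
)
```

Here `closest_name` picks out the 0.6-meter source, the one described above as the uncertain case.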