Paper
Darius Petermann and Minje Kim, “Hyperbolic Distance-Based Speech Separation,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Seoul, Korea, Apr. 14-19, 2024 [pdf].
Audio Example #1
In this example, three child-level sources are mixed. One is a near source, 0.4 meters from the microphone; the other two are farther away, at 2.0 and 1.7 meters, respectively. The two far sources are summed to form the “far parent” source group, while the near source is the only child of the “near parent” group.
- Near Parent Group
  - Child Source #1: 0.4 meters
- Far Parent Group
  - Child Source #1: 2.0 meters
  - Child Source #2: 1.7 meters
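The relationship between child and parent sources in this example can be sketched in a few lines of Python. This is only an illustration, not the paper's code: the helper name `mix_parent`, the toy four-sample waveforms, and the list-based signal representation are all assumptions made for the sketch. It shows just one thing, namely that a parent-level source is the sample-wise sum of its child sources.

```python
from typing import List


def mix_parent(children: List[List[float]]) -> List[float]:
    """Sample-wise sum of equally long child waveforms (hypothetical helper)."""
    if not children:
        return []
    length = len(children[0])
    if any(len(child) != length for child in children):
        raise ValueError("all child sources must have the same length")
    return [sum(samples) for samples in zip(*children)]


# Toy waveforms standing in for the three child sources (made-up values).
near_child_1 = [0.5, -0.25, 0.0, 0.25]  # 0.4 meters from the microphone
far_child_1 = [0.25, 0.25, -0.5, 0.0]   # 2.0 meters
far_child_2 = [-0.25, 0.5, 0.25, 0.0]   # 1.7 meters

# Each parent group is the sum of its children.
near_parent = mix_parent([near_child_1])
far_parent = mix_parent([far_child_1, far_child_2])

# The full mixture is the sum of the two parent groups.
mixture = mix_parent([near_parent, far_parent])
```

With a single child, as in the near group here, the parent-level signal is identical to that child.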
Audio Example #2
- Near Parent Group
  - Child Source #1: 0.4 meters
  - Child Source #2: 0.2 meters
  - Child Source #3: 0.6 meters
- Far Parent Group
  - Child Source #1: 2.1 meters
In this example, one child source in the “near parent” group (Child Source #3, 0.6 meters away) lies close to the 0.8-meter threshold that separates the near and far groups in our setup, which makes it an “uncertain” case.
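To make the role of the threshold concrete, the following minimal sketch assigns the distances from this example to the near and far groups and finds the source that sits closest to the boundary. The function names `assign_groups` and `margin_to_threshold` are hypothetical, and treating a source exactly at 0.8 meters as “far” is an assumption of this sketch rather than something stated above.

```python
from typing import Dict, List

NEAR_FAR_THRESHOLD_M = 0.8  # near/far boundary used in the setup described above


def assign_groups(
    distances_m: List[float], threshold_m: float = NEAR_FAR_THRESHOLD_M
) -> Dict[str, List[float]]:
    """Split source distances into near and far groups (hypothetical helper).

    Assumption: a source exactly at the threshold is counted as far.
    """
    groups: Dict[str, List[float]] = {"near": [], "far": []}
    for distance in distances_m:
        key = "near" if distance < threshold_m else "far"
        groups[key].append(distance)
    return groups


def margin_to_threshold(
    distance_m: float, threshold_m: float = NEAR_FAR_THRESHOLD_M
) -> float:
    """Absolute gap between a source distance and the near/far boundary."""
    return abs(distance_m - threshold_m)


# Source distances from Audio Example #2, in meters.
example_2 = [0.4, 0.2, 0.6, 2.1]

groups = assign_groups(example_2)

# The source with the smallest margin is the most ambiguous one.
most_uncertain = min(example_2, key=margin_to_threshold)
```

Here the 0.6-meter source has the smallest margin to the boundary (about 0.2 meters), which matches the “uncertain” case described above.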