Hyperbolic Distance-Based Speech Separation

Paper

Darius Petermann and Minje Kim, “Hyperbolic Distance-Based Speech Separation,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Seoul, Korea, Apr. 14-19, 2024 [pdf].

Audio Example #1

In this example, three child-level sources are mixed. One is a near source, 0.4 meters from the microphone, while the other two are farther away, at 2.0 and 1.7 meters, respectively. The latter two sum to form the “far parent” source group, while the former is the only child of the “near parent” group.

  • Near Parent Group
    • Child Source #1: 0.4 meters
  • Far Parent Group
    • Child Source #1: 2.0 meters
    • Child Source #2: 1.7 meters
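
The parent/child relationship above can be illustrated with a minimal sketch. It assumes that each parent is the sample-wise sum of its children and that the mixture is the sum of the two parents; the second assumption is ours and is not stated on this page. The `mix` helper and the four-sample toy waveforms are hypothetical placeholders, not the actual audio.

```python
def mix(signals):
    """Sum equal-length signals sample by sample."""
    return [sum(samples) for samples in zip(*signals)]

# Hypothetical toy waveforms standing in for the real recordings.
near_child_1 = [0.5, -0.25, 0.0, 0.25]    # 0.4 m from the microphone
far_child_1 = [0.125, 0.25, -0.125, 0.0]  # 2.0 m from the microphone
far_child_2 = [-0.25, 0.0, 0.25, 0.125]   # 1.7 m from the microphone

# Each parent is the sum of its children.
near_parent = mix([near_child_1])             # single child, so unchanged
far_parent = mix([far_child_1, far_child_2])  # sum of the two far children

# Assumption: the mixture is the sum of the two parents.
mixture = mix([near_parent, far_parent])
```

Under these assumptions, summing the parents gives the same signal as summing all three children directly, which is why the same mixture can be separated at either level.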
Mixture Audio
Far Parent Sources (two sources)
Estimated Far Parent Sources
Near Parent Source (one source)
Estimated Near Parent Source
Far Child Source #1 (2.0 meters)
Estimated Far Child Source #1
Far Child Source #2 (1.7 meters)
Estimated Far Child Source #2

Audio Example #2

  • Near Parent Group
    • Child Source #1: 0.4 meters
    • Child Source #2: 0.2 meters
    • Child Source #3: 0.6 meters
  • Far Parent Group
    • Child Source #1: 2.1 meters

In this example, one child source in the “near parent” group (Child Source #3, 0.6 meters away) lies close to the 0.8-meter threshold that separates the near and far groups in our setup, making it an “uncertain” case.
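
As a rough illustration of the grouping rule, the sketch below assigns each source in this example to the near or far group using the 0.8-meter threshold, then finds the source closest to that threshold. The function names are hypothetical. Treating a source exactly at the threshold as far, and using the absolute gap to the threshold as a stand-in for uncertainty, are our assumptions for illustration only; this is not the hyperbolic model from the paper.

```python
NEAR_FAR_THRESHOLD_M = 0.8  # near/far threshold stated in the text

def group_by_distance(distances_m, threshold_m=NEAR_FAR_THRESHOLD_M):
    """Split source distances into near and far groups.

    Assumption: a source exactly at the threshold counts as far.
    """
    groups = {"near": [], "far": []}
    for d in distances_m:
        groups["near" if d < threshold_m else "far"].append(d)
    return groups

def margin_to_threshold(distance_m, threshold_m=NEAR_FAR_THRESHOLD_M):
    """Absolute gap to the threshold (hypothetical proxy for uncertainty)."""
    return abs(distance_m - threshold_m)

example_2 = [0.4, 0.2, 0.6, 2.1]  # source distances in meters
groups = group_by_distance(example_2)
most_uncertain = min(example_2, key=margin_to_threshold)

print(groups)
print(most_uncertain)
```

With the distances listed for this example, the three sources below 0.8 meters fall into the near group, and the 0.6-meter source has the smallest gap to the threshold.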

Mixture Audio
Far Parent Source (one source)
Estimated Far Parent Source
Near Parent Sources (three sources)
Estimated Near Parent Sources
Near Child Source #1 (0.4 meters)
Estimated Near Child Source #1
Near Child Source #2 (0.2 meters)
Estimated Near Child Source #2
Near Child Source #3 (0.6 meters)
Estimated Near Child Source #3

More audio examples can be downloaded here.