We live in a world full of complex problems. For an AI system to solve them, it typically needs a complex model that can approximate the problem well. In this era of AI, we finally seem able to afford such complex models, thanks to technological advances in computing power, theoretical advances in machine learning algorithms, and the availability of big data. As a result, AI is starting to compete with human intelligence on some problems.
However, we believe that AI is still missing many powerful features that natural intelligence surely possesses. For example, does your brain need ten times more sugar intake when it solves a problem that is ten times more difficult? How much energy does a mosquito need to sense a target and fly to its destination accordingly? Can you imagine a mosquito-sized drone that could do the same job (with a tiny battery)? We know that the intelligent systems found in nature are not only effective but also efficient. One of my main research agendas is to build machine learning models that run more efficiently at test time and in hardware. These systems range from deep neural networks and probabilistic topic models defined in a bitwise fashion to simple models made more powerful by exploiting psychoacoustics.
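As a toy illustration of what "bitwise" means here (a minimal sketch under assumed layer sizes, not the exact formulation of my models), the snippet below constrains a dense layer's inputs and weights to ±1, so the usual floating-point multiply-accumulate reduces to integer accumulation, and in hardware to XNOR and popcount operations.

```python
import numpy as np

def binarize(x):
    # Map real values to {-1, +1}; zero maps to +1.
    return np.where(x >= 0, 1, -1).astype(np.int8)

def bitwise_dense(x_real, w_real):
    """One dense layer evaluated with binarized inputs and weights.

    With +/-1 operands, each product is an XNOR and the summation a
    popcount in hardware, so no floating-point multiplies are needed.
    """
    xb = binarize(x_real)                       # binarized activations
    wb = binarize(w_real)                       # binarized weights
    return xb.astype(np.int32) @ wb             # integer accumulation only

# Toy usage: a 16-dimensional input through a 16x8 layer.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
W = rng.standard_normal((16, 8))
print(bitwise_dense(x, W))
```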
Another important intelligent behavior is collaboration among individuals. We seek collaboration between devices and sensors, just as we do in team projects. For example, we have been interested in consolidating many different audio recordings made by various devices to recover the commonly dominant source of the audio scene. Since each recording contains both the dominant source of interest and its own artifacts (e.g., additive noise, reverberation, band-pass filtering, etc.), a naïve average of the recordings is not a good solution to this problem, as the toy example below illustrates. We call this kind of problem collaborative audio enhancement. Another kind of collaboration in nature happens between different sensory modalities, as when we recognize someone else's emotion by looking at her facial expression and listening to her voice. Therefore, building a machine learning model that fuses the different decisions made from multiple modalities is of interest to us, too.
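To make the averaging argument concrete, here is a small simulation (purely illustrative; the artifact models and the SNR metric are my assumptions, not the actual collaborative audio enhancement formulation): three devices record the same source, each adding its own artifact, and the plain average, rather than cancelling the artifacts, simply blends them, leaving the estimate far from the clean source.

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 8000
t = np.arange(sr) / sr
source = np.sin(2 * np.pi * 440 * t)                 # the common dominant source

# Each device records the same source with its own artifact.
noisy   = source + 0.3 * rng.standard_normal(len(t))          # additive noise
muffled = np.convolve(source, np.ones(32) / 32, 'same')       # crude band limiting
echoed  = source + 0.6 * np.r_[np.zeros(400), source[:-400]]  # crude reverberation

def snr_db(clean, estimate):
    # Signal-to-noise ratio of an estimate against the clean reference.
    return 10 * np.log10(np.sum(clean**2) / np.sum((clean - estimate)**2))

average = (noisy + muffled + echoed) / 3
for name, sig in [('noisy', noisy), ('muffled', muffled),
                  ('echoed', echoed), ('naive average', average)]:
    print(f'{name:14s} SNR = {snr_db(source, sig):6.2f} dB')
```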
Finally, we believe that one key to success in machine learning applications is improving each user's personal experience with personalized models. A personalized model can also be a more resource-efficient solution than a general-purpose model, because it focuses on a particular sub-problem for which a smaller model architecture can be good enough. However, training a personalized model requires data from the particular test-time user, which are not always available due to their private nature. Furthermore, such data tend to be unlabeled, as they can be collected only at test time, after the system has been deployed to the user's device. One could instead rely on the generalization power of a generic model, but such a model can be too computationally and spatially complex for real-time processing on a resource-constrained device. Our machine learning models require few or no data samples from the test-time users while still achieving the personalization goal. Likewise, our personalized models are designed to minimize the use of personal data, which improves privacy preservation in machine learning and reduces the performance imbalance between different social groups.
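One plausible recipe for this kind of label-free, on-device personalization (a hypothetical sketch, not necessarily the method we use) is to let a large generic model act as a teacher and distill its behavior into a small student model using only the user's unlabeled recordings; the architectures and feature dimensions below are placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical models: a large pre-trained generic "teacher" and a small
# personalized "student". Their architectures here are placeholders.
teacher = nn.Sequential(nn.Linear(257, 1024), nn.ReLU(), nn.Linear(1024, 257))
student = nn.Sequential(nn.Linear(257, 64), nn.ReLU(), nn.Linear(64, 257))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def personalize(unlabeled_user_frames, epochs=5):
    """Fit the small student to mimic the teacher on the user's own
    unlabeled data, so no ground-truth labels are ever required."""
    teacher.eval()
    for _ in range(epochs):
        for x in unlabeled_user_frames:           # x: (batch, 257) feature frames
            with torch.no_grad():
                pseudo_target = teacher(x)        # teacher's output as the label
            loss = loss_fn(student(x), pseudo_target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Toy usage with random "user recordings" standing in for real data.
fake_frames = [torch.randn(16, 257) for _ in range(10)]
personalize(fake_frames, epochs=1)
```

Because only the teacher's outputs serve as targets, the user's raw data never need to be labeled or sent off the device, which is in line with the privacy goal described above.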