Dissertation in the field of Speech and Language Technology, Heikki Kallasjoki
"Feature Enhancement and Uncertainty Estimation for Recognition of Noisy and Reverberant Speech."
While automatic speech recognition systems are already widely used in practical applications, they still fall far short of human-level performance in real, noisy environments. Yet as the use of mobile computing devices continues to grow, these systems must work in a wide variety of public settings, such as streets filled with traffic and busy restaurants. In living room entertainment systems, an additional challenge is that the microphone is positioned far from the speaker, so the recorded signal contains a significant amount of reflected sound (reverberation).
This thesis focuses on ways to manipulate the recorded speech signals to minimize the effect of the environment. At the lowest level, it improves the methods used to estimate the signal spectrum. Speech and noise in the signal can also be separated by methods based on the missing data principle, non-negative matrix factorization, or a combination of the two. To remove reflected sounds, the thesis proposes an extension to the non-negative matrix factorization scheme, in which the process of reverberation is modeled as part of the factorization.
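As a rough illustration of how non-negative matrix factorization can separate speech from noise, the sketch below explains a magnitude spectrogram as a non-negative combination of fixed speech and noise spectral patterns, then keeps only the speech part. It is a generic textbook-style example and not the method developed in the thesis: the dictionaries `w_speech` and `w_noise`, the toy spectrogram, and all function names are hypothetical, and the reverberation extension is not modeled.

```python
# Illustrative sketch only: supervised NMF with fixed, hypothetical dictionaries.
EPS = 1e-12  # guards against division by zero


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def estimate_activations(v, w, n_iter=200):
    """Find non-negative H with V ~= W H, keeping W fixed.

    Uses multiplicative updates for the Euclidean cost.
    """
    wt = transpose(w)
    wtv = matmul(wt, v)
    wtw = matmul(wt, w)
    h = [[1.0] * len(v[0]) for _ in range(len(w[0]))]
    for _ in range(n_iter):
        wtwh = matmul(wtw, h)
        h = [[h[k][t] * wtv[k][t] / (wtwh[k][t] + EPS)
              for t in range(len(h[0]))]
             for k in range(len(h))]
    return h


def separate_speech(v, w_speech, w_noise, n_iter=200):
    """Return the part of spectrogram V attributed to the speech dictionary."""
    # Concatenate the two dictionaries column-wise: W = [W_speech | W_noise].
    w = [rs + rn for rs, rn in zip(w_speech, w_noise)]
    h = estimate_activations(v, w, n_iter)
    n_speech = len(w_speech[0])
    speech_model = matmul(w_speech, h[:n_speech])
    full_model = matmul(w, h)
    # Wiener-like soft mask: share of each bin explained by the speech atoms.
    return [[v[f][t] * speech_model[f][t] / (full_model[f][t] + EPS)
             for t in range(len(v[0]))]
            for f in range(len(v))]


# Toy data: 3 frequency bins, 2 time frames, one atom per source (all made up).
w_speech = [[1.0], [0.5], [0.0]]
w_noise = [[0.0], [0.5], [1.0]]
noisy = [[2.0, 1.0], [1.5, 2.0], [1.0, 3.0]]
speech_estimate = separate_speech(noisy, w_speech, w_noise)
```

In practice the dictionaries would be learned from speech and noise recordings rather than written by hand, and the spectrogram would come from a short-time Fourier transform of the recorded signal.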
Additionally, the thesis investigates tools for estimating the reliability of the enhanced signals. Such methods can produce an estimate of which time and frequency regions of the resulting signal are likely to be corrupted by noise. This improves the accuracy of speech recognition, as the system can focus more on the regions likely to be reliable.
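The idea of marking time and frequency regions as reliable or unreliable can be sketched with a simple binary mask based on an estimated local signal-to-noise ratio. This is a minimal, generic missing-data example and not the estimator studied in the thesis: it assumes that a noise power estimate is already available for every time-frequency bin, and the function name, threshold, and toy values are hypothetical.

```python
import math


def reliability_mask(noisy_power, noise_power, snr_threshold_db=0.0):
    """Mark each time-frequency bin as reliable (1) or unreliable (0).

    noisy_power and noise_power are lists of rows with the same shape.
    A bin is reliable when its estimated local SNR reaches the threshold.
    """
    mask = []
    for noisy_row, noise_row in zip(noisy_power, noise_power):
        row = []
        for y, n in zip(noisy_row, noise_row):
            # Crude speech power estimate: subtract the noise estimate.
            speech = max(y - n, 0.0)
            if n <= 0.0:
                reliable = y > 0.0
            elif speech <= 0.0:
                reliable = False
            else:
                reliable = 10.0 * math.log10(speech / n) >= snr_threshold_db
            row.append(1 if reliable else 0)
        mask.append(row)
    return mask


# Toy example with made-up power values (2 frequency bins, 2 frames).
noisy_power = [[10.0, 1.0], [2.0, 8.0]]
noise_power = [[1.0, 1.0], [1.5, 1.0]]
mask = reliability_mask(noisy_power, noise_power)
```

A recognizer that supports missing-data techniques could then give less weight to the bins marked 0. Soft reliability values between 0 and 1 are a common alternative to a hard threshold.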
Opponent: Professor Dorothea Kolossa, Ruhr-Universität Bochum, Germany
Supervisor: Professor Mikko Kurimo, Aalto University School of Electrical Engineering, Department of Signal Processing and Acoustics
Contact information:
Heikki Kallasjoki
heikki.kallasjoki@iki.fi