Dissertation in the field of Speech and Language Technology, Heikki Kallasjoki


"Feature Enhancement and Uncertainty Estimation for Recognition of Noisy and Reverberant Speech."

15.04.2016 / 12:00 - 16:00
lecture hall S1, Otakaari 5A, 02150, Espoo, FI

While automatic speech recognition systems are already widely used in practical applications, they still fall far short of human-level performance in real, noisy environments. Yet as the use of mobile computing devices continues to grow, these systems need to work in a wide variety of public settings, such as streets filled with traffic and busy restaurants. In living room entertainment systems, an additional challenge is that the microphone is positioned far from the speaker, so the recorded signal contains a significant amount of reflected sound.

This thesis focuses on ways to process the recorded speech signals to minimize the effect of the environment. At the lowest level, it improves the methods used to estimate the signal spectrum. Speech and noise in the signal can also be separated by methods based on the missing data principle, non-negative matrix factorization, or a combination of the two. To remove reflected sounds, the thesis proposes an extension to the non-negative matrix factorization scheme in which the process of reverberation is modeled as part of the factorization.

Additionally, the thesis investigates tools for estimating the reliability of the enhanced signals. Such methods estimate which time and frequency regions of the enhanced signal are likely to be corrupted by noise. This improves the accuracy of speech recognition, as the recognizer can rely more on the regions likely to be reliable.

Opponent: Professor Dorothea Kolossa, Ruhr-Universität Bochum, Germany

Supervisor: Professor Mikko Kurimo, Aalto University School of Electrical Engineering, Department of Signal Processing and Acoustics


Contact information:
Heikki Kallasjoki
heikki.kallasjoki@iki.fi