Dissertation in the field of speech and language technology, André Mansikkaniemi

The title of the thesis is: Continuous Unsupervised Topic Adaptation for Morph-based Speech Recognition.

17.02.2017 / 12:00
Lecture hall S1, Otakaari 5, 02150, Espoo, FI

Automatic speech recognition (ASR) systems convert speech to text. The basic building blocks of a modern ASR system are the statistical acoustic and language models and the pronunciation dictionary (lexicon). The statistical models are trained on large amounts of speech and text data from pre-existing collections. Depending on the language, the lexicon is either generated automatically or compiled manually by experts. A challenge that ASR systems face over time is recognizing new words and phrases.

In this thesis, methods have been developed for automatically adapting an ASR system for Finnish. For language model adaptation, new text data is collected from the Web. The language model is adapted to a specific recording by automatically selecting the Web articles that are topically closest to the recognized text, and a new language model is obtained by adapting the baseline model with the selected articles. Results show that the adapted language model improves recognition accuracy for Finnish broadcast news.

Methods for adapting the lexicon have also been developed, with a special focus on foreign names and acronyms. These methods automatically identify foreign names and acronyms in the Web texts, then generate new pronunciation rules and add them to the lexicon. Used together with the adapted language model, lexicon adaptation also improves recognition accuracy. These methods can be used to adapt ASR systems to new speech data and to enable continuous update cycles whenever new text data is acquired.
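The article selection step described in the abstract can be illustrated with a small sketch. The announcement does not say which similarity measure the thesis uses, so TF-IDF weighting with cosine similarity is an assumption here, as are the function names (tokenize, tfidf_vectors, cosine, select_articles) and the toy English sentences. A morph-based Finnish system would presumably segment words into morphs rather than split on whitespace.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        term_freq = Counter(tokens)
        vectors.append({
            # Smoothed IDF keeps every weight strictly positive.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_articles(recognized_text, articles, top_k=2):
    """Return the indices of the top_k articles closest to the recognized text."""
    token_lists = [tokenize(recognized_text)] + [tokenize(a) for a in articles]
    vectors = tfidf_vectors(token_lists)
    query, candidates = vectors[0], vectors[1:]
    ranked = sorted(range(len(candidates)),
                    key=lambda i: cosine(query, candidates[i]),
                    reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy data standing in for a first-pass ASR transcript and Web articles.
    transcript = "government presented new budget to parliament"
    web_articles = [
        "parliament debates the budget proposal",
        "ice hockey world championship begins",
        "government presented the new budget to parliament today",
    ]
    print(select_articles(transcript, web_articles, top_k=2))  # [2, 0]
```

In a full pipeline the selected articles would then be used to adapt the baseline language model, for example by interpolating it with a model trained on those articles. That step is not shown, since the announcement does not specify how the adaptation is carried out.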

Opponent: Professor Torbjørn Svendsen, Norges Teknisk Naturvitenskapelige Universitet (NTNU)

Supervisor: Professor Mikko Kurimo, Aalto University School of Electrical Engineering, Department of Signal Processing and Acoustics

Contact information:

André Mansikkaniemi,
tel. +358407327562
andre.mansikkaniemi@aalto.fi
SPA, Aalto University spa.aalto.fi