Defence of dissertation in the field of Language Technology, M.Sc. (Tech.) Manu Airaksinen

Venue: Maarintie 8, 02150, Espoo

The title of thesis is “Methods for the application of glottal inverse filtering to statistical parametric speech synthesis”

08.06.2018 / 12:00

Knowledge of the physiological operation of human speech production can be applied to synthesize high-quality speech with a computer. A powerful model of human speech production is the source-filter model, which states that speech is generated by passing an excitation signal (airflow coming from the lungs) through the vocal tract (mouth, tongue, lips, etc.), whose configuration acts as a filter that spectrally shapes the excitation. During voiced speech, the airflow coming from the lungs makes the vocal folds oscillate in a quasi-periodic manner, which adds the element of pitch to the produced voice. The gap between the vocal folds, through which this excitation airflow passes, is called the glottis, which gives rise to the term “glottal excitation”. The glottal excitation is thus an air volume velocity waveform that flows through the glottis. In voiced phonation, the glottal excitation is a quasi-periodic signal that carries rich information about intonation, i.e., about how things are said as opposed to what is said.
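The source-filter idea described above can be illustrated with a minimal sketch. It is not taken from the dissertation: the impulse train is only a crude stand-in for a real glottal excitation, and the single two-pole resonator, the 8 kHz sampling rate, the 100 Hz pitch, and the 500 Hz formant are all assumed values chosen for illustration.

```python
import math

def impulse_train(n_samples, period):
    """Crude stand-in for the glottal excitation: one pulse per pitch period."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Denominator [1, a1, a2] of a two-pole resonator modelling one formant."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius from bandwidth
    theta = 2.0 * math.pi * freq_hz / fs         # pole angle from centre frequency
    return [1.0, -2.0 * r * math.cos(theta), r * r]

def all_pole_filter(a, x):
    """Apply 1 / A(z): y[n] = x[n] - a[1]*y[n-1] - a[2]*y[n-2] - ..."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

fs = 8000                                        # assumed sampling rate (Hz)
f0 = 100                                         # assumed pitch (Hz)
excitation = impulse_train(400, fs // f0)        # "source"
vocal_tract = resonator_coeffs(500.0, 80.0, fs)  # "filter" (hypothetical formant)
speech = all_pole_filter(vocal_tract, excitation)
```

Changing `f0` alters the pitch of the output without touching the filter, while changing the resonator alters the spectral colouring without touching the pitch; this separation is what the source-filter model expresses.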

Glottal inverse filtering (GIF) is a non-invasive computational framework that aims to estimate the underlying glottal excitation from a recorded speech signal. One of the main application areas of GIF has been statistical parametric speech synthesis, where speech signals are ultimately generated with the same source-filter model, by feeding GIF-derived excitation waveforms into a modeled vocal tract filter. The goal of this dissertation has been 1) to develop new, powerful, and scalable GIF methods that are suitable for use in today’s increasingly data-driven machine learning frameworks, and 2) to apply the new GIF methods within the context of statistical parametric speech synthesis for increased speech quality and naturalness.
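The principle of inverse filtering can be sketched as follows. This is not one of the GIF methods developed in the dissertation: plain linear prediction (autocorrelation method with the Levinson-Durbin recursion) is swapped in as a simple stand-in for the vocal tract estimation step, and the toy signal, its impulse-train excitation, and the deliberately wide 300 Hz formant bandwidth are assumptions made so that successive pitch periods barely overlap.

```python
import math

def resonator(freq_hz, bandwidth_hz, fs):
    """Denominator [1, a1, a2] of a two-pole resonator (one hypothetical formant)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    return [1.0, -2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs), r * r]

def synthesize(a, excitation):
    """Forward model 1 / A(z): pass the excitation through the all-pole filter."""
    y = []
    for n, xn in enumerate(excitation):
        y.append(xn - sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0))
    return y

def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the linear prediction normal equations; returns [1, a1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error energy
    return a

def inverse_filter(a, x):
    """Apply A(z) as an FIR filter: e[n] = x[n] + a[1]*x[n-1] + ..."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

fs, period = 8000, 80                       # assumed sampling rate and pitch period
excitation = [1.0 if n % period == 0 else 0.0 for n in range(400)]
true_filter = resonator(500.0, 300.0, fs)
speech = synthesize(true_filter, excitation)

# Estimate the filter from the "recorded" signal only, then cancel it.
estimated = levinson_durbin(autocorrelation(speech, 2), 2)
residual = inverse_filter(estimated, speech)
```

In this idealized setting `residual` closely resembles the original pulse train. Real speech is harder: the true excitation is a smooth glottal flow pulse rather than an impulse, and the vocal tract and glottal contributions overlap spectrally, which is why dedicated GIF methods are needed.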

Opponent: Professor Jon Gudnason, Reykjavik University, Iceland

Supervisor: Professor Paavo Alku, Aalto University School of Electrical Engineering, Department of Signal Processing and Acoustics

Contact information: Manu Airaksinen, +358 50 5311126, manu.airaksinen@aalto.fi