About Me
I’m a senior research scientist and the lead of the Voice Synthesis team at Naver Cloud, Korea (since Jan 2023; previously at Naver Corporation from Mar 2017 to Dec 2022). I’m also an adjunct professor at the Artificial Intelligence Institute, Seoul National University, Seoul, Korea (since Aug 2022).
I received my Ph.D. degree from the Department of Electrical and Electronic Engineering at Yonsei University, Seoul, Korea. During my Ph.D., I completed internships at Microsoft Research Asia, Beijing, China, and Qualcomm Technologies Inc., San Diego, CA.
My research interests include speech synthesis and its real-world applications. Specifically, I have developed high-quality TTS APIs for cloud services (Clova Voice Pro, Clova Dubbing), automatic TTS model building from smartphone recordings (Voice Maker), and hybrid TTS systems that combine deep learning with unit-selection models (Clova AI speaker, Naver Maps navigation, Naver News anchor).
If you are interested in my work, feel free to contact me.
Download my CV
Recent Publications
Enhancing multilingual TTS with voice conversion based data augmentation and posterior embedding [paper][demo]
Hyun-Wook Yoon, Jin-Seob Kim, Ryuichi Yamamoto, Ryo Terashima, Chan-Ho Song, Jae-Min Kim, Eunwoo Song
Proc. ICASSP, 2024, pp. 12186-12190.

Unified Speech-Text Pretraining for Spoken Dialog Modeling [paper][demo]
Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, Kang Min Yoo
arXiv preprint arXiv:2402.05706, 2024.

Pruning self-attention for zero-shot multi-speaker text-to-speech [paper][demo]
Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang
Proc. INTERSPEECH, 2023, pp. 4299-4303.

Period VITS: Variational inference with explicit pitch modeling for end-to-end emotional speech synthesis [paper][demo]
Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana
Proc. ICASSP, 2023, pp. 4299-4303.

HierSpeech: Bridging the gap between text and speech by hierarchical variational inference using self-supervised representations for speech synthesis [paper][demo]
Sang-Hoon Lee, Seung-Bin Kim, Ji-Hyun Lee, Eunwoo Song, Min-Jae Hwang, Seong-Whan Lee
Proc. NeurIPS, 2022, pp. 16624-16636.

TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder [paper][demo]
Eunwoo Song, Ryuichi Yamamoto, Ohsung Kwon, Chan-Ho Song, Min-Jae Hwang, Suhyeon Oh, Hyun-Wook Yoon, Jin-Seob Kim, Jae-Min Kim
Proc. INTERSPEECH, 2022, pp. 1941-1945.