About Me

I’m an Assistant Professor in the School of Electrical and Electronics Engineering at Yonsei University, Seoul, Korea (starting Mar 2026).

Prior to joining Yonsei, I was a Senior Research Scientist leading the Voice Synthesis team at Naver Cloud (Jan 2023 – Feb 2026) and, before that, at Naver Corporation (Mar 2017 – Dec 2022). I have also served as an adjunct professor at the Artificial Intelligence Institute, Seoul National University, since Aug 2022.

I received my Ph.D. degree from the Department of Electrical and Electronic Engineering at Yonsei University. During my doctoral studies, I completed internships at Microsoft Research Asia (Beijing) and Qualcomm Technologies Inc. (San Diego).

My research interests include speech signal processing and artificial intelligence. Currently, my research focuses on developing generative speech AI models leveraging large language models.

If you are interested in my research, feel free to contact me.

Download my CV


Recent Publications

  • RapFlow-TTS: Rapid and high-fidelity text-to-speech with improved consistency flow matching [demo]
    Hyun Joon Park, Jeongmin Liu, Jin Sob Kim, Jeong Yeol Yang, Sung Won Han, Eunwoo Song
    Proc. Interspeech, 2025, pp. 2440-2444.

  • Paralinguistics-aware speech-empowered large language models for natural conversation [paper][demo]
    Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Soyoon Kim, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Jung-Woo Ha, Sungroh Yoon, Kang Min Yoo
    Proc. NeurIPS, 2024, pp. 131072-131103.

  • Enhancing multilingual TTS with voice conversion based data augmentation and posterior embedding [paper][demo]
    Hyun-Wook Yoon, Jin-Seob Kim, Ryuichi Yamamoto, Ryo Terashima, Chan-Ho Song, Jae-Min Kim, Eunwoo Song
    Proc. ICASSP, 2024, pp. 12186-12190.

  • Pruning self-attention for zero-shot multi-speaker text-to-speech [paper][demo]
    Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang
Proc. Interspeech, 2023, pp. 4299-4303.

  • Period VITS: Variational inference with explicit pitch modeling for end-to-end emotional speech synthesis [paper][demo]
    Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana
Proc. ICASSP, 2023.

  • HierSpeech: Bridging the gap between text and speech by hierarchical variational inference using self-supervised representations for speech synthesis [paper][demo]
    Sang-Hoon Lee, Seung-Bin Kim, Ji-Hyun Lee, Eunwoo Song, Min-Jae Hwang, Seong-Whan Lee
    Proc. NeurIPS, 2022, pp. 16624-16636.

  • TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder [paper][demo]
    Eunwoo Song, Ryuichi Yamamoto, Ohsung Kwon, Chan-Ho Song, Min-Jae Hwang, Suhyeon Oh, Hyun-Wook Yoon, Jin-Seob Kim, Jae-Min Kim
Proc. Interspeech, 2022, pp. 1941-1945.

    [See more]


Recent Talks

  • Zero-shot voice cloning [Slides]
    SNU, Dec 2025

  • AI Human: Large-scale text-to-speech applications [Slides]
    SNU, Jan 2025

  • Speech synthesis and applications [Slides]
    SNU, Dec 2023

  • Parallel waveform synthesis [Slides]
    Samsung Research, Sep 2022

    [See more]