About Me

I’m a senior research scientist and the lead of the Voice Synthesis team at Naver Cloud, Korea (since Jan 2023; Naver Corporation from Mar 2017 to Dec 2022). I’m also an adjunct professor at the Artificial Intelligence Institute of Seoul National University, Seoul, Korea (since Aug 2022).

I received my Ph.D. degree from the Department of Electrical and Electronic Engineering at Yonsei University, Seoul, Korea. During my Ph.D. studies, I interned at Microsoft Research Asia, Beijing, China, and Qualcomm Technologies Inc., San Diego, CA.

My research interests include speech synthesis and its real-world applications. Specifically, I have developed high-quality TTS APIs for cloud services (Clova Voice Pro, Clova Dubbing), an automatic TTS modeling pipeline built from smartphone recordings (Voice Maker), and hybrid TTS systems that combine deep learning and unit-selection TTS models (Clova AI speaker, Naver Maps navigation, Naver News anchor).

If you are interested in my work, feel free to contact me.

Download my CV


Recent Publications

  • Enhancing multilingual TTS with voice conversion based data augmentation and posterior embedding [paper][demo]
    Hyun-Wook Yoon, Jin-Seob Kim, Ryuichi Yamamoto, Ryo Terashima, Chan-Ho Song, Jae-Min Kim, Eunwoo Song
    Proc. ICASSP, 2024, pp. 12186-12190.

  • Unified Speech-Text Pretraining for Spoken Dialog Modeling [paper][demo]
    Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, Kang Min Yoo
arXiv preprint arXiv:2402.05706, 2024.

  • Pruning self-attention for zero-shot multi-speaker text-to-speech [paper][demo]
    Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang
    Proc. INTERSPEECH, 2023, pp. 4299-4303.

  • Period VITS: Variational inference with explicit pitch modeling for end-to-end emotional speech synthesis [paper][demo]
    Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana
Proc. ICASSP, 2023.

  • HierSpeech: Bridging the gap between text and speech by hierarchical variational inference using self-supervised representations for speech synthesis [paper][demo]
    Sang-Hoon Lee, Seung-Bin Kim, Ji-Hyun Lee, Eunwoo Song, Min-Jae Hwang, Seong-Whan Lee
    Proc. NeurIPS, 2022, pp. 16624-16636.

  • TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder [paper][demo]
    Eunwoo Song, Ryuichi Yamamoto, Ohsung Kwon, Chan-Ho Song, Min-Jae Hwang, Suhyeon Oh, Hyun-Wook Yoon, Jin-Seob Kim, Jae-Min Kim
    Proc. INTERSPEECH, 2022, pp. 1941-1945.

    [See more]


Recent Talks

  • Speech synthesis and applications [Slides]
    SNU, Dec 2023

  • Parallel waveform synthesis [Slides]
    Samsung Research, Sep 2022

  • Data-selective TTS augmentation [Slides]
    Naver Engineering Day, Jul 2022

    [See more]