About Me
I’m a senior research scientist and the lead of the Voice Synthesis team at Naver Cloud, Korea (since Jan 2023; previously at Naver Corporation from Mar 2017 to Dec 2022). I’m also an adjunct professor at the Artificial Intelligence Institute, Seoul National University, Seoul, Korea (since Aug 2022).
I received my Ph.D. degree from the Department of Electrical and Electronic Engineering at Yonsei University, Seoul, Korea. During my Ph.D., I completed internships at Microsoft Research Asia, Beijing, China, and Qualcomm Technologies Inc., San Diego, CA.
My research interests include speech synthesis and its real-world applications. Specifically, I have developed high-quality TTS APIs for cloud services (Clova Voice Pro, Clova Dubbing), automatic TTS model building from smartphone recordings (Voice Maker), and hybrid TTS systems that combine deep learning with unit-selection models (Clova AI speaker, Naver Maps navigation, Naver News anchor).
If you are interested in my work, feel free to contact me.
Download my CV
Recent Publications
Enhancing multilingual TTS with voice conversion based data augmentation and posterior embedding [paper][demo]
Hyun-Wook Yoon, Jin-Seob Kim, Ryuichi Yamamoto, Ryo Terashima, Chan-Ho Song, Jae-Min Kim, Eunwoo Song
Proc. ICASSP, 2024, pp. 12186-12190.

Unified Speech-Text Pretraining for Spoken Dialog Modeling [paper][demo]
Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, Kang Min Yoo
arXiv preprint arXiv:2402.05706, 2024.

Pruning self-attention for zero-shot multi-speaker text-to-speech [paper][demo]
Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang
Proc. INTERSPEECH, 2023, pp. 4299-4303.

Period VITS: Variational inference with explicit pitch modeling for end-to-end emotional speech synthesis [paper][demo]
Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana
Proc. ICASSP, 2023, pp. 4299-4303.

HierSpeech: Bridging the gap between text and speech by hierarchical variational inference using self-supervised representations for speech synthesis [paper][demo]
Sang-Hoon Lee, Seung-Bin Kim, Ji-Hyun Lee, Eunwoo Song, Min-Jae Hwang, Seong-Whan Lee
Proc. NeurIPS, 2022, pp. 16624-16636.

TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder [paper][demo]
Eunwoo Song, Ryuichi Yamamoto, Ohsung Kwon, Chan-Ho Song, Min-Jae Hwang, Suhyeon Oh, Hyun-Wook Yoon, Jin-Seob Kim, Jae-Min Kim
Proc. INTERSPEECH, 2022, pp. 1941-1945.