Publications
You can also find my articles on my Google Scholar profile.
2024
Enhancing multilingual TTS with voice conversion based data augmentation and posterior embedding [paper] [demo]
Hyun-Wook Yoon, Jin-Seob Kim, Ryuichi Yamamoto, Ryo Terashima, Chan-Ho Song, Jae-Min Kim, Eunwoo Song
Proc. ICASSP, 2024, pp. 12186-12190.
Unified Speech-Text Pretraining for Spoken Dialog Modeling [paper] [demo]
Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, Kang Min Yoo
arXiv preprint arXiv:2402.05706, 2024.
2023
Pruning self-attention for zero-shot multi-speaker text-to-speech [paper] [demo]
Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang
Proc. INTERSPEECH, 2023, pp. 4299-4303.
Period VITS: Variational inference with explicit pitch modeling for end-to-end emotional speech synthesis [paper] [demo]
Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana
Proc. ICASSP, 2023, pp. 4299-4303.
2022
HierSpeech: Bridging the gap between text and speech by hierarchical variational inference using self-supervised representations for speech synthesis [paper] [demo]
Sang-Hoon Lee, Seung-Bin Kim, Ji-Hyun Lee, Eunwoo Song, Min-Jae Hwang, Seong-Whan Lee
Proc. NeurIPS, 2022, pp. 16624-16636.
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder [paper] [demo]
Eunwoo Song, Ryuichi Yamamoto, Ohsung Kwon, Chan-Ho Song, Min-Jae Hwang, Suhyeon Oh, Hyun-Wook Yoon, Jin-Seob Kim, Jae-Min Kim
Proc. INTERSPEECH, 2022, pp. 1941-1945.
Language model-based emotion prediction methods for emotional speech synthesis systems [paper] [demo]
Hyun-Wook Yoon, Ohsung Kwon, Hoyeon Lee, Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim, Min-Jae Hwang
Proc. INTERSPEECH, 2022, pp. 4596-4600.
Cross-speaker emotion transfer for low-resource text-to-speech using non-parallel voice conversion with pitch-shift data augmentation [paper] [demo]
Ryo Terashima, Ryuichi Yamamoto, Eunwoo Song, Yuma Shirahata, Hyun-Wook Yoon, Jae-Min Kim, Kentaro Tachibana
Proc. INTERSPEECH, 2022, pp. 3018-3022.
Linear prediction-based Parallel WaveGAN speech synthesis [paper]
Min-Jae Hwang, Hyun-Wook Yoon, Chan-Ho Song, Jin-Seob Kim, Jae-Min Kim, Eunwoo Song
Proc. ICEIC, 2022, pp. 1-4.
Effective data augmentation methods for neural text-to-speech systems [paper]
Suhyeon Oh, Ohsung Kwon, Min-Jae Hwang, Jae-Min Kim, Eunwoo Song
Proc. ICEIC, 2022, pp. 1-4.
2021
High-fidelity Parallel WaveGAN with multi-band harmonic-plus-noise model [paper] [demo]
Min-Jae Hwang, Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
Proc. INTERSPEECH, 2021, pp. 2227-2231.
LiteTTS: A decoder-free lightweight text-to-wave synthesis based on generative adversarial networks [paper] [demo]
Huu-Kim Nguyen, Kihyuk Jeong, Seyun Um, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang
Proc. INTERSPEECH, 2021, pp. 3595-3599.
Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators [paper] [demo]
Ryuichi Yamamoto, Eunwoo Song, Min-Jae Hwang, Jae-Min Kim
Proc. ICASSP, 2021, pp. 6039-6043.
TTS-by-TTS: TTS-driven data augmentation for fast and high-quality speech synthesis [paper] [demo]
Min-Jae Hwang, Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
Proc. ICASSP, 2021, pp. 6598-6602.
Improved Parallel WaveGAN with perceptually weighted spectrogram loss [paper] [demo]
Eunwoo Song, Ryuichi Yamamoto, Min-Jae Hwang, Jin-Seob Kim, Ohsung Kwon, Jae-Min Kim
Proc. SLT, 2021, pp. 470-476.
2020
LP-WaveNet: Linear prediction-based WaveNet speech synthesis [paper] [demo]
Min-Jae Hwang, Frank Soong, Eunwoo Song, Xi Wang, Hyeonjoo Kang, Hong-Goo Kang
Proc. APSIPA, 2020, pp. 810-814.
ExcitGlow: Improving a WaveGlow-based neural vocoder with linear prediction analysis [paper]
Suhyeon Oh, Hyungseob Lim, Kyungguen Byun, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang
Proc. APSIPA, 2020, pp. 831-836.
Neural text-to-speech with a modeling-by-generation excitation vocoder [paper] [demo]
Eunwoo Song, Min-Jae Hwang, Ryuichi Yamamoto, Jin-Seob Kim, Ohsung Kwon, Jae-Min Kim
Proc. INTERSPEECH, 2020, pp. 3570-3574.
Speaker-adaptive neural vocoders for parametric speech synthesis systems [paper] [demo]
Eunwoo Song, Jinseob Kim, Kyungguen Byun, Hong-Goo Kang
Proc. MMSP, 2020, pp. 1-5.
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram [paper] [demo]
Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
Proc. ICASSP, 2020, pp. 6194-6198.
Improving LPCNet-based text-to-speech with linear predictions-structured mixture density network [paper] [demo]
Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank K. Soong, Hong-Goo Kang
Proc. ICASSP, 2020, pp. 7214-7218.
~2019
Probability density distillation with generative adversarial networks for high-quality parallel waveform generation [paper] [demo]
Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
Proc. INTERSPEECH, 2019, pp. 699-703.
ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems [paper] [demo]
Eunwoo Song, Kyungguen Byun, Hong-Goo Kang
Proc. EUSIPCO, 2019, pp. 1179-1183.
Excitation-by-SampleRNN model for text-to-speech [paper]
Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang
Proc. ITC-CSCC, 2019, pp. 356-359.
Acoustic modeling using adversarially trained variational recurrent neural network for speech synthesis [paper]
Joun Yeop Lee, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim, Eunwoo Song
Proc. INTERSPEECH, 2018, pp. 917-921.
A unified framework for the generation of glottal signals in deep learning-based parametric speech synthesis systems [paper]
Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim, Hong-Goo Kang
Proc. INTERSPEECH, 2018, pp. 912-916.
Modeling-by-generation-structured noise compensation algorithm for glottal vocoding speech synthesis system [paper]
Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang
Proc. ICASSP, 2018, pp. 5669-5673.
Perceptual quality and modeling accuracy of excitation parameters in DLSTM-based speech synthesis systems [paper]
Eunwoo Song, Frank K. Soong, Hong-Goo Kang
Proc. ASRU, 2017, pp. 671-676.
Effective spectral and excitation modeling techniques for LSTM-RNN-based speech synthesis systems [paper]
Eunwoo Song, Frank K. Soong, Hong-Goo Kang
IEEE/ACM Trans. Audio, Speech, and Lang. Process., vol. 25, no. 11, pp. 2152-2161, 2017.
Improved time-frequency trajectory excitation vocoder for DNN-based speech synthesis [paper]
Eunwoo Song, Frank K. Soong, Hong-Goo Kang
Proc. INTERSPEECH, 2016, pp. 874-878.
Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis [paper]
Eunwoo Song, Hong-Goo Kang
Proc. EUSIPCO, 2016, pp. 1951-1955.
Deep neural network-based statistical parametric speech synthesis system using improved time-frequency trajectory excitation model [paper]
Eunwoo Song, Hong-Goo Kang
Proc. INTERSPEECH, 2015, pp. 874-878.
A constrained two-layer compression technique for ECG waves [paper]
Kyungguen Byun, Eunwoo Song, H. Sim, H. Lim, Hong-Goo Kang
Proc. EMBC, 2015, pp. 6130-6133.
Improved time-frequency trajectory excitation modeling for a statistical parametric speech synthesis system [paper]
Eunwoo Song, Young-Sun Joo, Hong-Goo Kang
Proc. ICASSP, 2015, pp. 4949-4953.
Fixed-point implementation of MPEG-D unified speech and audio coding decoder [paper]
Eunwoo Song, Hong-Goo Kang, Joonil Lee
Proc. DSP, 2014, pp. 110-113.
Speech enhancement for pathological voice using time-frequency trajectory excitation modeling [paper]
Eunwoo Song, Jongyoub Ryu, Hong-Goo Kang
Proc. APSIPA, 2013, pp. 1–4.