I am a PhD candidate at the College of Computer Science and Technology, Zhejiang University (浙江大学计算机学院).
I work on the Audio Research Team at Zhejiang University, under the supervision of Prof. Zhou Zhao (赵洲). Previously, I graduated from Chu Kochen Honors College, Zhejiang University (浙江大学竺可桢学院), with dual bachelor's degrees in Computer Science and Automation. I have also been a visiting scholar at the University of Rochester with Prof. Zhiyao Duan and at the University of Massachusetts Amherst with Prof. Przemyslaw Grabowicz.
My research focuses on multimodal generative AI, specifically spatial audio, music, singing, and speech. I have published first-author papers at top international AI conferences, including NeurIPS, ACL, AAAI, and EMNLP. Currently, I am working on spatial audio generation with multimodal prompts and on streaming voice conversion.
I am actively seeking research collaborations. Please feel free to contact me via email at [email protected].
- Personal Page: https://aaronz345.github.io (updated recently🔥)
- LinkedIn: www.linkedin.com/in/yuzhang34
- Google Scholar: https://scholar.google.com/citations?user=kA9A6LsAAAAJ
- DBLP: https://dblp.org/pid/50/671-126.html
* denotes co-first authors
Preprint
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting, Yu Zhang, Wenxiang Guo, Changhao Pan, et al.
Preprint
Versatile Framework for Song Generation with Prompt-based Control, Yu Zhang, Wenxiang Guo, Changhao Pan, et al.
ACL 2025
TCSinger 2: Customizable Multilingual Singing Voice Synthesis, Yu Zhang, Wenxiang Guo, Changhao Pan, et al.
EMNLP 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control, Yu Zhang, Ziyue Jiang, Ruiqi Li, et al.
NeurIPS 2024 Spotlight
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks, Yu Zhang, Changhao Pan, Wenxiang Guo, et al.
AAAI 2024
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis, Yu Zhang, Rongjie Huang, Ruiqi Li, et al.
ACL 2025
STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation, Wenxiang Guo*, Yu Zhang*, Changhao Pan*, et al.
AAAI 2025
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching, Wenxiang Guo, Yu Zhang, Changhao Pan, et al.
ACL 2024
Robust Singing Voice Transcription Serves Synthesis, Ruiqi Li, Yu Zhang, Yongqi Wang, et al.
Preprint
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis, Ziyue Jiang, Yi Ren, Ruiqi Li, Shengpeng Ji, Zhenhui Ye, Chen Zhang, Bai Jionghao, Xiaoda Yang, Jialong Zuo, Yu Zhang, et al.