About Me
I am a master's student in the School of Software at Beihang University, supervised by Prof. Lu Sheng.
My current research interests include deep generative models and their applications, with a particular focus on 3D generation. I am very excited about the recent developments in world generation and world models, and can’t wait to dive into them.
I am grateful to all my collaborators and mentors along the way. I first started doing research under the guidance of Prof. Miao Wang, and then began working on deep learning projects under the supervision of Prof. Lu Sheng. I have also interned successively at MiniMax, Shanghai AI Lab, and VAST, and I am fortunate to have worked closely with Junting Dong, Yuan-Chen Guo, and Yan-Pei Cao.
🔥 News
- 2024.12: 🎉🎉 New paper MV-Adapter on multi-view synthesis and texture generation is now open-sourced.
- 2024.07: 🎉🎉 New paper TELA on clothes-disentangled 3D human generation is accepted by ECCV 2024.
- 2024.02: 🎉🎉 New paper EpiDiff on 3D object generation is accepted by CVPR 2024.
📝 Publications
🧑‍🎨 3D Generation

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
Zehuan Huang, Yuan-Chen Guo, Xingqiao An, Yunhan Yang, Yangguang Li, Zi-Xin Zou, Ding Liang, Xihui Liu, Yan-Pei Cao✉, Lu Sheng✉
- TL;DR: MIDI is a novel paradigm for compositional 3D scene generation from a single image; it extends pre-trained image-to-3D object generation models to multi-instance diffusion models that generate multiple 3D instances composing a scene.

MV-Adapter: Multi-view Consistent Image Generation Made Easy
Zehuan Huang, Yuan-Chen Guo, Haoran Wang, Ran Yi, Lizhuang Ma, Yan-Pei Cao✉, Lu Sheng✉
- TL;DR: An efficient and versatile adapter that adapts any text-to-image model to generate high-fidelity multi-view images under view/geometry guidance for downstream tasks like 3D generation and texture generation.

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
Hao Wen*, Zehuan Huang*, Yaohui Wang, Xinyuan Chen, Yu Qiao, Lu Sheng
Project Page | ArXiv 2024
- TL;DR: Transfers the two-stage image-to-3D pipeline into a unified recursive diffusion process, thereby reducing the data bias of each stage and improving the quality of the generated 3D content.

TELA: Text to Layer-wise 3D Clothed Human Generation
Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai✉
Project Page | ECCV 2024
- TL;DR: A layer-wise clothed human representation combined with a progressive optimization strategy, which produces clothes-disentangled 3D human models while providing control over the generation process.

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
Zehuan Huang*, Hao Wen*, Junting Dong*, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai✉, Lu Sheng✉
Project Page | CVPR 2024
- TL;DR: A localized interactive multi-view diffusion model that includes epipolar attention blocks to model multi-view consistency.
- Write-An-Animation: High-level Text-based Animation Editing with Character-Scene Interaction, Jia-Qi Zhang, Xiang Xu, Zhi-Meng Shen, Ze-Huan Huang, Yang Zhao, Yan-Pei Cao, Pengfei Wan, Miao Wang✉, PG 2021
🎨 Concept Customization

From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation
Zehuan Huang*, Hongxing Fan*, Lipeng Wang, Lu Sheng✉
Project Page | ArXiv 2024
- TL;DR: Customize each part of human images for controllable portrait generation.
🎖 Honors and Awards
- 2024.10 China National Scholarship (Top 1%)
- 2024.10 BYD Alumni Scholarship (Top 1%)
- 2024.10 Postgraduate First-Class Scholarship (Top 10%)
- 2023.06 Beijing Outstanding Graduates (Top 1%)
📖 Education
- 2023.09 - 2026.01 (expected), Master, School of Software, Beihang University, Beijing.
- 2019.09 - 2023.06, Undergraduate, School of Software, Beihang University, Beijing.
💻 Internships
- 2023.12 - Present, VAST, Beijing. Working on 3D generation and texture generation.
- 2023.08 - 2023.12, Shanghai Artificial Intelligence Laboratory, Beijing. Working on 3D generation.
- 2022.05 - 2023.06, MiniMax, Beijing. Working on 3D avatar reconstruction and controllable image generation.
💁 Services
Reviewer
- Conference: CVPR 2025; ICLR 2025
- Journal: TCSVT