About Me

I am a master's student in the School of Software at Beihang University, supervised by Prof. Lu Sheng.

My current research interests include deep generative models and their applications, with a particular focus on 3D generation. I am very excited about the recent developments in world generation and world models, and can’t wait to dive into them.

I am grateful to all my collaborators and mentors along the way. I first started doing research under the guidance of Prof. Miao Wang, and later began working on deep learning projects under the supervision of Prof. Lu Sheng. I have also interned successively at MiniMax, Shanghai AI Lab, and VAST, and I am fortunate to have worked closely with Junting Dong, Yuan-Chen Guo, and Yan-Pei Cao.

🔥 News

  • 2024.12:  🎉🎉 New paper MV-Adapter on multi-view synthesis and texture generation is open-sourced.
  • 2024.07:  🎉🎉 New paper TELA on clothing-disentangled 3D human generation is accepted by ECCV 2024.
  • 2024.02:  🎉🎉 New paper EpiDiff on 3D object generation is accepted by CVPR 2024.

📝 Publications

🧑‍🎨 3D Generation

CoRR 2024

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

Zehuan Huang, Yuan-Chen Guo, Xingqiao An, Yunhan Yang, Yangguang Li, Zi-Xin Zou, Ding Liang, Xihui Liu, Yan-Pei Cao✉, Lu Sheng✉

Project Page | Code

  • TL;DR: MIDI extends pre-trained image-to-3D object generation models into multi-instance diffusion models, enabling compositional 3D scene generation from a single image.

CoRR 2024

MV-Adapter: Multi-view Consistent Image Generation Made Easy

Zehuan Huang, Yuan-Chen Guo, Haoran Wang, Ran Yi, Lizhuang Ma, Yan-Pei Cao✉, Lu Sheng✉

Project Page | Code

  • TL;DR: An efficient and versatile adapter that adapts any text-to-image model to generate high-fidelity multi-view images under view/geometry guidance for downstream tasks like 3D generation and texture generation.

CoRR 2024

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

Hao Wen*, Zehuan Huang*, Yaohui Wang, Xinyuan Chen, Yu Qiao, Lu Sheng

Project Page | arXiv 2024

  • TL;DR: Unifies the two-stage image-to-3D pipeline into a recursive diffusion process, thereby reducing the data bias of each stage and improving the quality of the generated 3D assets.

ECCV 2024

TELA: Text to Layer-wise 3D Clothed Human Generation

Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai✉

Project Page | ECCV 2024

  • TL;DR: A layer-wise clothed human representation combined with a progressive optimization strategy, which produces clothing-disentangled 3D human models while providing control over the generation process.

CVPR 2024

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

Zehuan Huang*, Hao Wen*, Junting Dong*, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai✉, Lu Sheng✉

Project Page | CVPR 2024

  • TL;DR: A localized interactive multi-view diffusion model that uses epipolar attention blocks to model multi-view consistency.

🎨 Concept Customization

CoRR 2024

From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation

Zehuan Huang*, Hongxing Fan*, Lipeng Wang, Lu Sheng✉

Project Page | arXiv 2024

  • TL;DR: Customizes each part of human images for controllable portrait generation.

🎖 Honors and Awards

  • 2024.10 China National Scholarship (Top 1%)
  • 2024.10 BYD Alumni Scholarship (Top 1%)
  • 2024.10 Postgraduate First-Class Scholarship (Top 10%)
  • 2023.06 Beijing Outstanding Graduates (Top 1%)

📖 Education

  • 2023.09 - 2026.01 (expected), Master's, School of Software, Beihang University, Beijing.
  • 2019.09 - 2023.06, Undergraduate, School of Software, Beihang University, Beijing.

💻 Internships

  • 2023.12 - Present, VAST, Beijing. Working on 3D generation and texture generation.
  • 2023.08 - 2023.12, Shanghai Artificial Intelligence Laboratory, Beijing. Working on 3D generation.
  • 2022.05 - 2023.06, MiniMax, Beijing. Working on 3D avatar reconstruction and controllable image generation.

💁 Services

Reviewer

  • Conference: CVPR 2025; ICLR 2025
  • Journal: TCSVT