Personal Homepage

Jiaye Li

I am a master's student at Fudan University, advised by Prof. Siyu Zhu. My interests are in computer vision and generative AI.

My research focuses on visual generative models, with an emphasis on diffusion and flow-based image and video generation, human-centric video generation, and model post-training techniques, including distillation and reinforcement learning.

Research

Interests

01

Visual Generative Models

Diffusion and flow-based models for high-quality image and video generation.

02

Human-Centric Video Generation

Human-focused video models and data pipelines for realistic, temporally consistent generation.

03

Model Post-Training

Distillation and reinforcement learning for improving generative model efficiency and alignment.

Selected Work

Projects & Publications

First Author

Image Generation / CVPR 2026

Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation

Jiaye Li*, Baoyou Chen*, Hui Li, Zilong Dong, Jingdong Wang, Siyu Zhu

HARoPE introduces a head-wise adaptive extension of RoPE for transformer-based image generation, improving fine-grained spatial relations, color fidelity, and object counting in class-conditional and text-to-image settings.

Co-First Author

Avatar Generation / ACMMM 2026

Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation

Chunyu Li*, Jiaye Li*, Ruiqiao Mei, Haoyuan Xia, Hao Zhu, Jingdong Wang, Siyu Zhu

Hallo-Live is a real-time text-driven audio-video avatar generation framework that combines asynchronous dual-stream diffusion with human-centric preference-guided distillation for low-latency, synchronized portrait video and speech synthesis.

Co-Author

Human Video / CVPR 2025

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Hui Li*, Mingwang Xu*, Yun Zhan, Shan Mu, Jiaye Li, Kaihui Cheng, Yuxuan Chen, Tan Chen, Mao Ye, Jingdong Wang, Siyu Zhu

OpenHumanVid provides a large-scale, high-quality human-centric video dataset with detailed captions, skeleton sequences, and speech audio to improve human video generation and motion alignment.

Co-Author

Visual Generation / ICLR 2026

Pyramidal Patchification Flow for Visual Generation

Hui Li, Baoyou Chen, Liwei Zhang, Jiaye Li, Jingdong Wang, Siyu Zhu

PPFlow accelerates diffusion and flow-based visual generation by reducing token counts at high-noise timesteps through pyramidal patchification, while preserving generation quality.

Co-Author

Vision-Language / CVPR 2025

Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation

Yuheng Feng*, Changsong Wen*, Zelin Peng, Jiaye Li, Siyu Zhu

LongD-CLIP uses dual-teacher distillation to improve CLIP's long-text representation ability while retaining foundational short-text and zero-shot classification knowledge.

Background

Education & Experience

  1. Research Intern at the Shanghai Academy of AI for Science.

  2. Master's student at Fudan University.

  3. Bachelor's degree in Artificial Intelligence, Nanjing University of Aeronautics and Astronautics.

Contact

Get in touch

For research collaboration, project questions, or academic discussions, feel free to reach out by email or on GitHub.