| Main Authors: | Liang, Xiao; Zhang, Yunzhu; Zhu, Linchao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.01814 |
Similar Items
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
by: Zhang, Yunzhu, et al.
Published: (2025)
MVP: Multiple View Prediction Improves GUI Grounding
by: Zhang, Yunzhu, et al.
Published: (2025)
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention
by: Lu, Yu, et al.
Published: (2024)
AudioScenic: Audio-Driven Video Scene Editing
by: Shen, Kaixin, et al.
Published: (2024)
Combating Label Noise With A General Surrogate Model For Sample Selection
by: Liang, Chao, et al.
Published: (2023)
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
by: Yang, Xiangpeng, et al.
Published: (2025)
EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing
by: Yang, Xiangpeng, et al.
Published: (2024)
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
by: Yang, Xiangpeng, et al.
Published: (2024)
GPD-1: Generative Pre-training for Driving
by: Xie, Zixun, et al.
Published: (2024)
MTC-VAE: Multi-Level Temporal Compression with Content Awareness
by: Dong, Yubo, et al.
Published: (2026)
High-Order Progressive Trajectory Matching for Medical Image Dataset Distillation
by: Dong, Le, et al.
Published: (2025)
Stable Score Distillation for High-Quality 3D Generation
by: Tang, Boshi, et al.
Published: (2023)
Artifact-Aware Evaluation for High-Quality Video Generation
by: Zhu, Chen, et al.
Published: (2026)
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
by: Xu, Yunqiu, et al.
Published: (2024)
H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction
by: Jia, Heng, et al.
Published: (2025)
Collaborative Group: Composed Image Retrieval via Consensus Learning from Noisy Annotations
by: Zhang, Xu, et al.
Published: (2023)
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
by: Zhao, Shuai, et al.
Published: (2023)
Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
by: Zi, Bojia, et al.
Published: (2025)
SeedEdit 3.0: Fast and High-Quality Generative Image Editing
by: Wang, Peng, et al.
Published: (2025)
DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh
by: Zhuang, Jingyu, et al.
Published: (2024)
Transition Matching Distillation for Fast Video Generation
by: Nie, Weili, et al.
Published: (2026)
Spectral Progressive Diffusion for Efficient Image and Video Generation
by: Xiao, Howard, et al.
Published: (2026)
Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image
by: Gao, Yujie, et al.
Published: (2026)
VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning
by: Zhang, Xuanyu, et al.
Published: (2025)
Slimmable Networks for Contrastive Self-supervised Learning
by: Zhao, Shuai, et al.
Published: (2022)
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
by: Zhao, Shuai, et al.
Published: (2023)
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
by: Suo, Yucheng, et al.
Published: (2024)
Progressive Class-level Distillation
by: Li, Jiayan, et al.
Published: (2025)
3DID: Direct 3D Inverse Design for Aerodynamics with Physics-Aware Optimization
by: Hao, Yuze, et al.
Published: (2025)
Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos
by: Zhang, Kaifeng, et al.
Published: (2025)
Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models
by: Zhu, Beier, et al.
Published: (2025)
Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection
by: Yao, Liang, et al.
Published: (2024)
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
by: Lu, Yunhong, et al.
Published: (2025)
Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation
by: Zhao, Min, et al.
Published: (2026)
VSD-MOT: End-to-End Multi-Object Tracking in Low-Quality Video Scenes Guided by Visual Semantic Distillation
by: Du, Jun
Published: (2026)
Noise-Tolerant Hybrid Prototypical Learning with Noisy Web Data
by: Liang, Chao, et al.
Published: (2025)
CapHuman: Capture Your Moments in Parallel Universes
by: Liang, Chao, et al.
Published: (2024)
Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
by: Patel, Shivansh, et al.
Published: (2025)
OSV: One Step is Enough for High-Quality Image to Video Generation
by: Mao, Xiaofeng, et al.
Published: (2024)
TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution
by: Ran, Fengli, et al.
Published: (2025)