:: Library Catalog

Bibliographic Details
Main Authors:	Liang, Xiao; Zhang, Yunzhu; Zhu, Linchao
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.01814

Similar Items

FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
by: Zhang, Yunzhu, et al.
Published: (2025)

MVP: Multiple View Prediction Improves GUI Grounding
by: Zhang, Yunzhu, et al.
Published: (2025)

FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention
by: Lu, Yu, et al.
Published: (2024)

AudioScenic: Audio-Driven Video Scene Editing
by: Shen, Kaixin, et al.
Published: (2024)

Combating Label Noise With A General Surrogate Model For Sample Selection
by: Liang, Chao, et al.
Published: (2023)

VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
by: Yang, Xiangpeng, et al.
Published: (2025)

EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing
by: Yang, Xiangpeng, et al.
Published: (2024)

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
by: Yang, Xiangpeng, et al.
Published: (2024)

GPD-1: Generative Pre-training for Driving
by: Xie, Zixun, et al.
Published: (2024)

MTC-VAE: Multi-Level Temporal Compression with Content Awareness
by: Dong, Yubo, et al.
Published: (2026)

High-Order Progressive Trajectory Matching for Medical Image Dataset Distillation
by: Dong, Le, et al.
Published: (2025)

Stable Score Distillation for High-Quality 3D Generation
by: Tang, Boshi, et al.
Published: (2023)

Artifact-Aware Evaluation for High-Quality Video Generation
by: Zhu, Chen, et al.
Published: (2026)

MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
by: Xu, Yunqiu, et al.
Published: (2024)

H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction
by: Jia, Heng, et al.
Published: (2025)

Collaborative Group: Composed Image Retrieval via Consensus Learning from Noisy Annotations
by: Zhang, Xu, et al.
Published: (2023)

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
by: Zhao, Shuai, et al.
Published: (2023)

Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
by: Zi, Bojia, et al.
Published: (2025)

SeedEdit 3.0: Fast and High-Quality Generative Image Editing
by: Wang, Peng, et al.
Published: (2025)

DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh
by: Zhuang, Jingyu, et al.
Published: (2024)

Transition Matching Distillation for Fast Video Generation
by: Nie, Weili, et al.
Published: (2026)

Spectral Progressive Diffusion for Efficient Image and Video Generation
by: Xiao, Howard, et al.
Published: (2026)

Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image
by: Gao, Yujie, et al.
Published: (2026)

VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning
by: Zhang, Xuanyu, et al.
Published: (2025)

Slimmable Networks for Contrastive Self-supervised Learning
by: Zhao, Shuai, et al.
Published: (2022)

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
by: Zhao, Shuai, et al.
Published: (2023)

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
by: Suo, Yucheng, et al.
Published: (2024)

Progressive Class-level Distillation
by: Li, Jiayan, et al.
Published: (2025)

3DID: Direct 3D Inverse Design for Aerodynamics with Physics-Aware Optimization
by: Hao, Yuze, et al.
Published: (2025)

Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos
by: Zhang, Kaifeng, et al.
Published: (2025)

Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models
by: Zhu, Beier, et al.
Published: (2025)

Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection
by: Yao, Liang, et al.
Published: (2024)

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
by: Lu, Yunhong, et al.
Published: (2025)

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation
by: Zhao, Min, et al.
Published: (2026)

VSD-MOT: End-to-End Multi-Object Tracking in Low-Quality Video Scenes Guided by Visual Semantic Distillation
by: Du, Jun
Published: (2026)

Noise-Tolerant Hybrid Prototypical Learning with Noisy Web Data
by: Liang, Chao, et al.
Published: (2025)

CapHuman: Capture Your Moments in Parallel Universes
by: Liang, Chao, et al.
Published: (2024)

Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
by: Patel, Shivansh, et al.
Published: (2025)

OSV: One Step is Enough for High-Quality Image to Video Generation
by: Mao, Xiaofeng, et al.
Published: (2024)

TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution
by: Ran, Fengli, et al.
Published: (2025)