:: Library Catalog

Bibliographic Details
Main Authors:	Guo, Yuchen; Gong, Junli; Dong, Wenjun; Cheung, Yiuming; Su, Weifeng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.06010

Similar Items

Bringing Multimodal Large Language Models to Infrared-Visible Image Fusion Quality Assessment
by: Guo, Yuchen, et al.
Published: (2026)

Can Segmentation Models Understand the World? Towards Proactive Affordance Reasoning via Visual Chain-of-Thought
by: Guo, Yuchen, et al.
Published: (2026)

LumiVideo: An Intelligent Agentic System for Video Color Grading
by: Guo, Yuchen, et al.
Published: (2026)

Fuse4Seg: Image Fusion for Multi-Modal Medical Segmentation via Bi-level Optimization
by: Guo, Yuchen, et al.
Published: (2024)

DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion
by: Guo, Yuchen, et al.
Published: (2024)

Inference-Time Diffusion Model Distillation
by: Park, Geon Yeong, et al.
Published: (2024)

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
by: Ran, Lingmin, et al.
Published: (2023)

Segment Any RGB-Thermal Model with Language-aided Distillation
by: Xing, Dong, et al.
Published: (2025)

ChangeDiff: A Multi-Temporal Change Detection Data Generator with Flexible Text Prompts via Diffusion Model
by: Zang, Qi, et al.
Published: (2024)

DiffusionAgent: Navigating Expert Models for Agentic Image Generation
by: Qin, Jie, et al.
Published: (2024)

Real-Time Visual Attribution Streaming in Thinking Model
by: Kang, Seil, et al.
Published: (2026)

Collaborative Few-Step Distillation and Low-Bit Quantization for Wan2.2 Dual-Expert Video Diffusion Models
by: Du, Jinyang, et al.
Published: (2026)

Generative Dataset Distillation Based on Diffusion Model
by: Su, Duo, et al.
Published: (2024)

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
by: Chern, Ethan, et al.
Published: (2025)

HyperAlign: Hypernetwork for Efficient Test-Time Alignment of Diffusion Models
by: Xie, Xin, et al.
Published: (2026)

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
by: Liu, Tao, et al.
Published: (2026)

Time-Aware One Step Diffusion Network for Real-World Image Super-Resolution
by: Zhang, Tianyi, et al.
Published: (2025)

DynaSplat: Dynamic-Static Gaussian Splatting with Hierarchical Motion Decomposition for Scene Reconstruction
by: Deng, Junli, et al.
Published: (2025)

Robust MLLM Unlearning via Visual Knowledge Distillation
by: Wang, Yuhang, et al.
Published: (2025)

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
by: Ji, Yatai, et al.
Published: (2024)

Enabling Real-Time Colonoscopic Polyp Segmentation on Commodity CPUs via Ultra-Lightweight Architecture
by: Gao, Weihao, et al.
Published: (2026)

DiffAttn: Diffusion-Based Drivers' Visual Attention Prediction with LLM-Enhanced Semantic Reasoning
by: Liu, Weimin, et al.
Published: (2026)

Diffusion Models Are Real-Time Game Engines
by: Valevski, Dani, et al.
Published: (2024)

GreenEye: Development of Real-Time Traffic Signal Recognition System for Visual Impairments
by: Kim, Danu
Published: (2024)

Foreground-Aware Dataset Distillation via Dynamic Patch Selection
by: Li, Longzhen, et al.
Published: (2026)

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
by: Liu, Junli, et al.
Published: (2025)

PRISM: Precision-Recall Informed Data-Free Knowledge Distillation via Generative Diffusion
by: He, Xuewan, et al.
Published: (2025)

Accelerating Diffusion Models with One-to-Many Knowledge Distillation
by: Zhang, Linfeng, et al.
Published: (2024)

AnimateDiff-Lightning: Cross-Model Diffusion Distillation
by: Lin, Shanchuan, et al.
Published: (2024)

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
by: Yan, Siming, et al.
Published: (2024)

Illusion-Aware Visual Preprocessing and Anti-Illusion Prompting for Classic Illusion Understanding in Vision-Language Models
by: Zha, Junli, et al.
Published: (2026)

Dynamic Eraser for Guided Concept Erasure in Diffusion Models
by: Gong, Qinghui
Published: (2026)

Adapting VACE for Real-Time Autoregressive Video Diffusion
by: Fosdick, Ryan
Published: (2026)

FDBPL: Faster Distillation-Based Prompt Learning for Region-Aware Vision-Language Models Adaptation
by: Zhang, Zherui, et al.
Published: (2025)

SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation
by: Shen, Le, et al.
Published: (2025)

Robotic System with AI for Real Time Weed Detection, Canopy Aware Spraying, and Droplet Pattern Evaluation
by: Rasool, Inayat, et al.
Published: (2025)

LP-LLM: End-to-End Real-World Degraded License Plate Text Recognition via Large Multimodal Models
by: Gong, Haoyan, et al.
Published: (2026)

Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
by: Wang, Haibo, et al.
Published: (2024)

DIFFUMA: High-Fidelity Spatio-Temporal Video Prediction via Dual-Path Mamba and Diffusion Enhancement
by: Xie, Xinyu, et al.
Published: (2025)

One Step Diffusion-based Super-Resolution with Time-Aware Distillation
by: He, Xiao, et al.
Published: (2024)