| Main Authors: | Guo, Yuchen; Gong, Junli; Dong, Wenjun; Cheung, Yiuming; Su, Weifeng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.06010 |
Similar Items
Bringing Multimodal Large Language Models to Infrared-Visible Image Fusion Quality Assessment
by: Guo, Yuchen, et al.
Published: (2026)
Can Segmentation Models Understand the World? Towards Proactive Affordance Reasoning via Visual Chain-of-Thought
by: Guo, Yuchen, et al.
Published: (2026)
LumiVideo: An Intelligent Agentic System for Video Color Grading
by: Guo, Yuchen, et al.
Published: (2026)
Fuse4Seg: Image Fusion for Multi-Modal Medical Segmentation via Bi-level Optimization
by: Guo, Yuchen, et al.
Published: (2024)
DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion
by: Guo, Yuchen, et al.
Published: (2024)
Inference-Time Diffusion Model Distillation
by: Park, Geon Yeong, et al.
Published: (2024)
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
by: Ran, Lingmin, et al.
Published: (2023)
Segment Any RGB-Thermal Model with Language-aided Distillation
by: Xing, Dong, et al.
Published: (2025)
ChangeDiff: A Multi-Temporal Change Detection Data Generator with Flexible Text Prompts via Diffusion Model
by: Zang, Qi, et al.
Published: (2024)
DiffusionAgent: Navigating Expert Models for Agentic Image Generation
by: Qin, Jie, et al.
Published: (2024)
Real-Time Visual Attribution Streaming in Thinking Model
by: Kang, Seil, et al.
Published: (2026)
Collaborative Few-Step Distillation and Low-Bit Quantization for Wan2.2 Dual-Expert Video Diffusion Models
by: Du, Jinyang, et al.
Published: (2026)
Generative Dataset Distillation Based on Diffusion Model
by: Su, Duo, et al.
Published: (2024)
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
by: Chern, Ethan, et al.
Published: (2025)
HyperAlign: Hypernetwork for Efficient Test-Time Alignment of Diffusion Models
by: Xie, Xin, et al.
Published: (2026)
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
by: Liu, Tao, et al.
Published: (2026)
Time-Aware One Step Diffusion Network for Real-World Image Super-Resolution
by: Zhang, Tianyi, et al.
Published: (2025)
DynaSplat: Dynamic-Static Gaussian Splatting with Hierarchical Motion Decomposition for Scene Reconstruction
by: Deng, Junli, et al.
Published: (2025)
Robust MLLM Unlearning via Visual Knowledge Distillation
by: Wang, Yuhang, et al.
Published: (2025)
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
by: Ji, Yatai, et al.
Published: (2024)
Enabling Real-Time Colonoscopic Polyp Segmentation on Commodity CPUs via Ultra-Lightweight Architecture
by: Gao, Weihao, et al.
Published: (2026)
DiffAttn: Diffusion-Based Drivers' Visual Attention Prediction with LLM-Enhanced Semantic Reasoning
by: Liu, Weimin, et al.
Published: (2026)
Diffusion Models Are Real-Time Game Engines
by: Valevski, Dani, et al.
Published: (2024)
GreenEye: Development of Real-Time Traffic Signal Recognition System for Visual Impairments
by: Kim, Danu
Published: (2024)
Foreground-Aware Dataset Distillation via Dynamic Patch Selection
by: Li, Longzhen, et al.
Published: (2026)
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
by: Liu, Junli, et al.
Published: (2025)
PRISM: Precision-Recall Informed Data-Free Knowledge Distillation via Generative Diffusion
by: He, Xuewan, et al.
Published: (2025)
Accelerating Diffusion Models with One-to-Many Knowledge Distillation
by: Zhang, Linfeng, et al.
Published: (2024)
AnimateDiff-Lightning: Cross-Model Diffusion Distillation
by: Lin, Shanchuan, et al.
Published: (2024)
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
by: Yan, Siming, et al.
Published: (2024)
Illusion-Aware Visual Preprocessing and Anti-Illusion Prompting for Classic Illusion Understanding in Vision-Language Models
by: Zha, Junli, et al.
Published: (2026)
Dynamic Eraser for Guided Concept Erasure in Diffusion Models
by: Gong, Qinghui
Published: (2026)
Adapting VACE for Real-Time Autoregressive Video Diffusion
by: Fosdick, Ryan
Published: (2026)
FDBPL: Faster Distillation-Based Prompt Learning for Region-Aware Vision-Language Models Adaptation
by: Zhang, Zherui, et al.
Published: (2025)
SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation
by: Shen, Le, et al.
Published: (2025)
Robotic System with AI for Real Time Weed Detection, Canopy Aware Spraying, and Droplet Pattern Evaluation
by: Rasool, Inayat, et al.
Published: (2025)
LP-LLM: End-to-End Real-World Degraded License Plate Text Recognition via Large Multimodal Models
by: Gong, Haoyan, et al.
Published: (2026)
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
by: Wang, Haibo, et al.
Published: (2024)
DIFFUMA: High-Fidelity Spatio-Temporal Video Prediction via Dual-Path Mamba and Diffusion Enhancement
by: Xie, Xinyu, et al.
Published: (2025)
One Step Diffusion-based Super-Resolution with Time-Aware Distillation
by: He, Xiao, et al.
Published: (2024)