Library Catalog

Bibliographic Details
Main Authors:	Yan, Weicai; Ma, Xinhua; Lin, Wang; Jin, Tao
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2605.08181

Similar Items

Efficient Prompting for Continual Adaptation to Missing Modalities
by: Guo, Zirun, et al.
Published: (2025)

Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning
by: Liu, Yuti, et al.
Published: (2024)

Representation Surgery for Multi-Task Model Merging
by: Yang, Enneng, et al.
Published: (2024)

Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution
by: Wang, Ying, et al.
Published: (2023)

Revisiting Multimodal KV Cache Compression: A Frequency-Domain-Guided Outlier-KV-Aware Approach
by: Yang, Yaoxin, et al.
Published: (2025)

M4V: Multi-Modal Mamba for Text-to-Video Generation
by: Huang, Jiancheng, et al.
Published: (2025)

Joint Memory Frequency and Computing Frequency Scaling for Energy-efficient DNN Inference
by: Han, Yunchu, et al.
Published: (2025)

CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
by: Xu, Sihan, et al.
Published: (2023)

Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation
by: Xie, Jingjing, et al.
Published: (2024)

Uncertainty-Guided Selective Adaptation Enables Cross-Platform Predictive Fluorescence Microscopy
by: Yang, Kai-Wen K., et al.
Published: (2025)

Text-to-Image GAN with Pretrained Representations
by: You, Xiaozhou, et al.
Published: (2024)

Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
by: Zhu, Haoyi, et al.
Published: (2024)

Integrating Frequency Guidance into Multi-source Domain Generalization for Bearing Fault Diagnosis
by: Tu, Xiaotong, et al.
Published: (2025)

Scaling 4D Representations
by: Carreira, João, et al.
Published: (2024)

DohaScript: A Large-Scale Multi-Writer Dataset for Continuous Handwritten Hindi Text
by: Singh, Kunwar Arpit, et al.
Published: (2026)

Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking
by: Wang, Shiao, et al.
Published: (2026)

Semantically Guided Representation Learning For Action Anticipation
by: Diko, Anxhelo, et al.
Published: (2024)

SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery
by: Yang, Enneng, et al.
Published: (2024)

MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
by: Mao, Jiawei, et al.
Published: (2025)

Invariant Representation Guided Multimodal Sentiment Decoding with Sequential Variation Regularization
by: Xu, Guoyang, et al.
Published: (2024)

Compositional Text-to-Image Generation with Dense Blob Representations
by: Nie, Weili, et al.
Published: (2024)

Source-Free Domain Adaptation with Diffusion-Guided Source Data Generation
by: Chopra, Shivang, et al.
Published: (2024)

MFAF: An EVA02-Based Multi-scale Frequency Attention Fusion Method for Cross-View Geo-Localization
by: Liu, YiTong, et al.
Published: (2025)

Implicit Contrastive Representation Learning with Guided Stop-gradient
by: Lee, Byeongchan, et al.
Published: (2025)

Superclass-Guided Representation Disentanglement for Spurious Correlation Mitigation
by: Liu, Chenruo, et al.
Published: (2025)

ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts
by: Petrov, Dmitry, et al.
Published: (2024)

Uncertainty Quantification via Hölder Divergence for Multi-View Representation Learning
by: Zhang, Yan, et al.
Published: (2024)

Consistent Flow Distillation for Text-to-3D Generation
by: Yan, Runjie, et al.
Published: (2025)

SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules
by: Chen, Xiangyu, et al.
Published: (2024)

Contextualized Diffusion Models for Text-Guided Image and Video Generation
by: Yang, Ling, et al.
Published: (2024)

DenseTRF: Texture-Aware Unsupervised Representation Adaptation for Surgical Scene Dense Prediction
by: Liao, Guiqiu, et al.
Published: (2026)

A Survey on Cache Methods in Diffusion Models: Toward Efficient Multi-Modal Generation
by: Liu, Jiacheng, et al.
Published: (2025)

Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation
by: Yan, HongWei, et al.
Published: (2024)

Exploring Text-to-Motion Generation with Human Preference
by: Sheng, Jenny, et al.
Published: (2024)

Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
by: Ma, Huan, et al.
Published: (2024)

Alignment-Guided Score Matching for Text-to-Image Alignment in Diffusion Models
by: Lee, Jaa-Yeon, et al.
Published: (2026)

FACL-Attack: Frequency-Aware Contrastive Learning for Transferable Adversarial Attacks
by: Yang, Hunmin, et al.
Published: (2024)

EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits
by: Yosef, Ron, et al.
Published: (2025)

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
by: Oertell, Owen, et al.
Published: (2024)

MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings
by: Li, Zijie, et al.
Published: (2026)