:: Library Catalog

Bibliographic Details
Main Authors:	Cai, Yufei; Han, Hu; Wei, Yuxiang; Shan, Shiguang; Chen, Xilin
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.19369

Similar Items

Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models
by: Wang, Zhongqi, et al.
Published: (2025)

T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models
by: Wang, Zhongqi, et al.
Published: (2024)

T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models
by: Li, Changzhen, et al.
Published: (2025)

Trigger without Trace: Towards Stealthy Backdoor Attack on Text-to-Image Diffusion Models
by: Zhang, Jie, et al.
Published: (2025)

CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation
by: Xu, Yifeng, et al.
Published: (2024)

FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers
by: Yuan, Zheng, et al.
Published: (2024)

EgoMotion: Hierarchical Reasoning and Diffusion for Egocentric Vision-Language Motion Generation
by: Hou, Ruibing, et al.
Published: (2026)

DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks
by: Li, Yinqi, et al.
Published: (2025)

Task-adaptive Q-Face
by: Sun, Haomiao, et al.
Published: (2024)

Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
by: Wang, Zhongqi, et al.
Published: (2025)

VOPE: Revisiting Hallucination of Vision-Language Models in Voluntary Imagination Task
by: Long, Xingming, et al.
Published: (2025)

Towards Transferable Defense Against Malicious Image Edits
by: Zhang, Jie, et al.
Published: (2025)

Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
by: Yuan, Xin, et al.
Published: (2024)

GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition
by: Wang, Tianyue, et al.
Published: (2025)

Semantic or Covariate? A Study on the Intractable Case of Out-of-Distribution Detection
by: Long, Xingming, et al.
Published: (2024)

Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
by: Long, Xingming, et al.
Published: (2024)

Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning
by: Ma, Yue, et al.
Published: (2025)

UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models
by: Liang, Jiachen, et al.
Published: (2024)

INFACT: A Diagnostic Benchmark for Induced Faithfulness and Factuality Hallucinations in Video-LLMs
by: Yang, Junqi, et al.
Published: (2026)

EntropyScan: Towards Model-level Backdoor Detection in LVLMs via Visual Attention Entropy
by: Ge, Xuanyu, et al.
Published: (2026)

What Makes VLMs Robust? Towards Reconciling Robustness and Accuracy in Vision-Language Models
by: Nie, Sen, et al.
Published: (2026)

Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation
by: Liang, Jiachen, et al.
Published: (2024)

MM-MoralBench: A MultiModal Moral Evaluation Benchmark for Large Vision-Language Models
by: Yan, Bei, et al.
Published: (2024)

Revisiting Logit Distributions for Reliable Out-of-Distribution Detection
by: Liang, Jiachen, et al.
Published: (2025)

Contrastive Spectral Rectification: Test-Time Defense towards Zero-shot Adversarial Robustness of CLIP
by: Nie, Sen, et al.
Published: (2026)

Component-Based Out-of-Distribution Detection
by: Liu, Wenrui, et al.
Published: (2026)

Towards Robust Semantic Segmentation against Patch-based Attack via Attention Refinement
by: Yuan, Zheng, et al.
Published: (2024)

V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs
by: Nie, Sen, et al.
Published: (2025)

ACT Now: Preempting LVLM Hallucinations via Adaptive Context Integration
by: Yan, Bei, et al.
Published: (2026)

Neural Gate: Mitigating Privacy Risks in LVLMs via Neuron-Level Gradient Gating
by: Cao, Xiangkui, et al.
Published: (2026)

Measuring the Measurers: Quality Evaluation of Hallucination Benchmarks for Large Vision-Language Models
by: Yan, Bei, et al.
Published: (2024)

Dual Attention Guided Defense Against Malicious Edits
by: Zhang, Jie, et al.
Published: (2025)

Jodi: Unification of Visual Generation and Understanding via Joint Modeling
by: Xu, Yifeng, et al.
Published: (2025)

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
by: Li, Yiheng, et al.
Published: (2024)

MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models
by: Li, Xiaomin, et al.
Published: (2024)

un$^2$CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP
by: Li, Yinqi, et al.
Published: (2025)

HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention
by: Tang, Xiaolong, et al.
Published: (2024)

InstaVSR: Taming Diffusion for Efficient and Temporally Consistent Video Super-Resolution
by: Hu, Jintong, et al.
Published: (2026)

$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
by: Liu, Ye, et al.
Published: (2024)

Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
by: Luo, Songtao, et al.
Published: (2023)