| Main Authors: | Cai, Yufei, Han, Hu, Wei, Yuxiang, Shan, Shiguang, Chen, Xilin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.19369 |
Similar Items
Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models
by: Wang, Zhongqi, et al.
Published: (2025)
T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models
by: Wang, Zhongqi, et al.
Published: (2024)
T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models
by: Li, Changzhen, et al.
Published: (2025)
Trigger without Trace: Towards Stealthy Backdoor Attack on Text-to-Image Diffusion Models
by: Zhang, Jie, et al.
Published: (2025)
CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation
by: Xu, Yifeng, et al.
Published: (2024)
FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers
by: Yuan, Zheng, et al.
Published: (2024)
EgoMotion: Hierarchical Reasoning and Diffusion for Egocentric Vision-Language Motion Generation
by: Hou, Ruibing, et al.
Published: (2026)
DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks
by: Li, Yinqi, et al.
Published: (2025)
Task-adaptive Q-Face
by: Sun, Haomiao, et al.
Published: (2024)
Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
by: Wang, Zhongqi, et al.
Published: (2025)
VOPE: Revisiting Hallucination of Vision-Language Models in Voluntary Imagination Task
by: Long, Xingming, et al.
Published: (2025)
Towards Transferable Defense Against Malicious Image Edits
by: Zhang, Jie, et al.
Published: (2025)
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
by: Yuan, Xin, et al.
Published: (2024)
GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition
by: Wang, Tianyue, et al.
Published: (2025)
Semantic or Covariate? A Study on the Intractable Case of Out-of-Distribution Detection
by: Long, Xingming, et al.
Published: (2024)
Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
by: Long, Xingming, et al.
Published: (2024)
Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning
by: Ma, Yue, et al.
Published: (2025)
UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models
by: Liang, Jiachen, et al.
Published: (2024)
INFACT: A Diagnostic Benchmark for Induced Faithfulness and Factuality Hallucinations in Video-LLMs
by: Yang, Junqi, et al.
Published: (2026)
EntropyScan: Towards Model-level Backdoor Detection in LVLMs via Visual Attention Entropy
by: Ge, Xuanyu, et al.
Published: (2026)
What Makes VLMs Robust? Towards Reconciling Robustness and Accuracy in Vision-Language Models
by: Nie, Sen, et al.
Published: (2026)
Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation
by: Liang, Jiachen, et al.
Published: (2024)
MM-MoralBench: A MultiModal Moral Evaluation Benchmark for Large Vision-Language Models
by: Yan, Bei, et al.
Published: (2024)
Revisiting Logit Distributions for Reliable Out-of-Distribution Detection
by: Liang, Jiachen, et al.
Published: (2025)
Contrastive Spectral Rectification: Test-Time Defense towards Zero-shot Adversarial Robustness of CLIP
by: Nie, Sen, et al.
Published: (2026)
Component-Based Out-of-Distribution Detection
by: Liu, Wenrui, et al.
Published: (2026)
Towards Robust Semantic Segmentation against Patch-based Attack via Attention Refinement
by: Yuan, Zheng, et al.
Published: (2024)
V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs
by: Nie, Sen, et al.
Published: (2025)
ACT Now: Preempting LVLM Hallucinations via Adaptive Context Integration
by: Yan, Bei, et al.
Published: (2026)
Neural Gate: Mitigating Privacy Risks in LVLMs via Neuron-Level Gradient Gating
by: Cao, Xiangkui, et al.
Published: (2026)
Measuring the Measurers: Quality Evaluation of Hallucination Benchmarks for Large Vision-Language Models
by: Yan, Bei, et al.
Published: (2024)
Dual Attention Guided Defense Against Malicious Edits
by: Zhang, Jie, et al.
Published: (2025)
Jodi: Unification of Visual Generation and Understanding via Joint Modeling
by: Xu, Yifeng, et al.
Published: (2025)
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
by: Li, Yiheng, et al.
Published: (2024)
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models
by: Li, Xiaomin, et al.
Published: (2024)
un$^2$CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP
by: Li, Yinqi, et al.
Published: (2025)
HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention
by: Tang, Xiaolong, et al.
Published: (2024)
InstaVSR: Taming Diffusion for Efficient and Temporally Consistent Video Super-Resolution
by: Hu, Jintong, et al.
Published: (2026)
$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
by: Liu, Ye, et al.
Published: (2024)
Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
by: Luo, Songtao, et al.
Published: (2023)