| Main Authors: | Xiu, Yanming, Jiang, Zhengyuan, Gong, Neil Zhenqiang, Gorlatova, Maria |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.05510 |
Similar Items
ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality
by: Xiu, Yanming, et al.
Published: (2025)
Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach
by: Xiu, Yanming, et al.
Published: (2025)
Advancing the Understanding and Evaluation of AR-Generated Scenes: When Vision-Language Models Shine and Stumble
by: Duan, Lin, et al.
Published: (2025)
User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments
by: Lin, Junfeng, et al.
Published: (2026)
Robustness of Vision Foundation Models to Common Perturbations
by: Liu, Hongbin, et al.
Published: (2026)
A Neurosymbolic Framework for Interpretable Cognitive Attack Detection in Augmented Reality
by: Chen, Rongqian, et al.
Published: (2025)
SafeText: Safe Text-to-image Models via Aligning the Text Encoder
by: Hu, Yuepeng, et al.
Published: (2025)
Toward Safe, Trustworthy and Realistic Augmented Reality User Experience
by: Xiu, Yanming
Published: (2025)
Jailbreaking Safeguarded Text-to-Image Models via Large Language Models
by: Jiang, Zhengyuan, et al.
Published: (2025)
Watermark-based Attribution of AI-Generated Content
by: Jiang, Zhengyuan, et al.
Published: (2024)
Demonstrating Visual Information Manipulation Attacks in Augmented Reality: A Hands-On Miniature City-Based Setup
by: Xiu, Yanming, et al.
Published: (2025)
EditTrack: Detecting and Attributing AI-assisted Image Editing
by: Jiang, Zhengyuan, et al.
Published: (2025)
Certifiably Robust Image Watermark
by: Jiang, Zhengyuan, et al.
Published: (2024)
VideoMarkBench: Benchmarking Robustness of Video Watermarking
by: Jiang, Zhengyuan, et al.
Published: (2025)
Stable Signature is Unstable: Removing Image Watermark from Diffusion Models
by: Hu, Yuepeng, et al.
Published: (2024)
Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning
by: Jia, Yuqi, et al.
Published: (2024)
CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning
by: Zhang, Jinghuai, et al.
Published: (2022)
Visual Hallucinations of Multi-modal Large Language Models
by: Huang, Wen, et al.
Published: (2024)
Say It, See It: A Systematic Evaluation on Speech-Based 3D Content Generation Methods in Augmented Reality
by: Xiu, Yanming, et al.
Published: (2025)
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
by: Li, Juncheng, et al.
Published: (2025)
Refusing Safe Prompts for Multi-modal Large Language Models
by: Shao, Zedian, et al.
Published: (2024)
Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models
by: Liu, Hongbin, et al.
Published: (2024)
Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models
by: Liu, Zhongye, et al.
Published: (2024)
WebInject: Prompt Injection Attack to Web Agents
by: Wang, Xilong, et al.
Published: (2025)
CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models
by: Xiu, Kedong, et al.
Published: (2025)
Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection
by: Shao, Zedian, et al.
Published: (2026)
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
by: Yu, Hong-Tao, et al.
Published: (2025)
HarassGuard: Detecting Harassment Behaviors in Social Virtual Reality with Vision-Language Models
by: Lee, Junhee, et al.
Published: (2026)
AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models
by: Li, Jiayu, et al.
Published: (2025)
BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language Models
by: Tan, Bryan Chen Zhengyu, et al.
Published: (2025)
One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models
by: Zhao, Jiale, et al.
Published: (2025)
Beyond Augmentation: Empowering Model Robustness under Extreme Capture Environments
by: Gong, Yunpeng, et al.
Published: (2024)
ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models
by: Li, Zhaoyang, et al.
Published: (2025)
Practical Region-level Attack against Segment Anything Models
by: Shen, Yifan, et al.
Published: (2024)
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
by: Hua, Hang, et al.
Published: (2024)
GO-NeRF: Generating Objects in Neural Radiance Fields for Virtual Reality Content Creation
by: Dai, Peng, et al.
Published: (2024)
Deep Learning for Virtual Reality User Identification: A Benchmark
by: Frizzo, Davide, et al.
Published: (2026)
Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
by: Zhang, Jihai, et al.
Published: (2025)
Read or Ignore? A Unified Benchmark for Typographic-Attack Robustness and Text Recognition in Vision-Language Models
by: Waseda, Futa, et al.
Published: (2025)
When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?
by: Liang, Tuo, et al.
Published: (2025)