:: Library Catalog

Bibliographic Details
Main Authors:	Xiu, Yanming; Jiang, Zhengyuan; Gong, Neil Zhenqiang; Gorlatova, Maria
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.05510

Similar Items

ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality
by: Xiu, Yanming, et al.
Published: (2025)

Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach
by: Xiu, Yanming, et al.
Published: (2025)

Advancing the Understanding and Evaluation of AR-Generated Scenes: When Vision-Language Models Shine and Stumble
by: Duan, Lin, et al.
Published: (2025)

User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments
by: Lin, Junfeng, et al.
Published: (2026)

Robustness of Vision Foundation Models to Common Perturbations
by: Liu, Hongbin, et al.
Published: (2026)

A Neurosymbolic Framework for Interpretable Cognitive Attack Detection in Augmented Reality
by: Chen, Rongqian, et al.
Published: (2025)

SafeText: Safe Text-to-image Models via Aligning the Text Encoder
by: Hu, Yuepeng, et al.
Published: (2025)

Toward Safe, Trustworthy and Realistic Augmented Reality User Experience
by: Xiu, Yanming
Published: (2025)

Jailbreaking Safeguarded Text-to-Image Models via Large Language Models
by: Jiang, Zhengyuan, et al.
Published: (2025)

Watermark-based Attribution of AI-Generated Content
by: Jiang, Zhengyuan, et al.
Published: (2024)

Demonstrating Visual Information Manipulation Attacks in Augmented Reality: A Hands-On Miniature City-Based Setup
by: Xiu, Yanming, et al.
Published: (2025)

EditTrack: Detecting and Attributing AI-assisted Image Editing
by: Jiang, Zhengyuan, et al.
Published: (2025)

Certifiably Robust Image Watermark
by: Jiang, Zhengyuan, et al.
Published: (2024)

VideoMarkBench: Benchmarking Robustness of Video Watermarking
by: Jiang, Zhengyuan, et al.
Published: (2025)

Stable Signature is Unstable: Removing Image Watermark from Diffusion Models
by: Hu, Yuepeng, et al.
Published: (2024)

Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning
by: Jia, Yuqi, et al.
Published: (2024)

CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning
by: Zhang, Jinghuai, et al.
Published: (2022)

Visual Hallucinations of Multi-modal Large Language Models
by: Huang, Wen, et al.
Published: (2024)

Say It, See It: A Systematic Evaluation on Speech-Based 3D Content Generation Methods in Augmented Reality
by: Xiu, Yanming, et al.
Published: (2025)

BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
by: Li, Juncheng, et al.
Published: (2025)

Refusing Safe Prompts for Multi-modal Large Language Models
by: Shao, Zedian, et al.
Published: (2024)

Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models
by: Liu, Hongbin, et al.
Published: (2024)

Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models
by: Liu, Zhongye, et al.
Published: (2024)

WebInject: Prompt Injection Attack to Web Agents
by: Wang, Xilong, et al.
Published: (2025)

CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models
by: Xiu, Kedong, et al.
Published: (2025)

Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection
by: Shao, Zedian, et al.
Published: (2026)

Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
by: Yu, Hong-Tao, et al.
Published: (2025)

HarassGuard: Detecting Harassment Behaviors in Social Virtual Reality with Vision-Language Models
by: Lee, Junhee, et al.
Published: (2026)

AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models
by: Li, Jiayu, et al.
Published: (2025)

BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language Models
by: Tan, Bryan Chen Zhengyu, et al.
Published: (2025)

One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models
by: Zhao, Jiale, et al.
Published: (2025)

Beyond Augmentation: Empowering Model Robustness under Extreme Capture Environments
by: Gong, Yunpeng, et al.
Published: (2024)

ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models
by: Li, Zhaoyang, et al.
Published: (2025)

Practical Region-level Attack against Segment Anything Models
by: Shen, Yifan, et al.
Published: (2024)

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
by: Hua, Hang, et al.
Published: (2024)

GO-NeRF: Generating Objects in Neural Radiance Fields for Virtual Reality Content Creation
by: Dai, Peng, et al.
Published: (2024)

Deep Learning for Virtual Reality User Identification: A Benchmark
by: Frizzo, Davide, et al.
Published: (2026)

Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation
by: Zhang, Jihai, et al.
Published: (2025)

Read or Ignore? A Unified Benchmark for Typographic-Attack Robustness and Text Recognition in Vision-Language Models
by: Waseda, Futa, et al.
Published: (2025)

When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?
by: Liang, Tuo, et al.
Published: (2025)