Bibliographic Details
Main Authors:	Xiu, Yanming, Scargill, Tim, Gorlatova, Maria
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2501.12553

Similar Items

Benchmarking Vision-Language Models under Contradictory Virtual Content Attacks in Augmented Reality
by: Xiu, Yanming, et al.
Published: (2026)

Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach
by: Xiu, Yanming, et al.
Published: (2025)

Advancing the Understanding and Evaluation of AR-Generated Scenes: When Vision-Language Models Shine and Stumble
by: Duan, Lin, et al.
Published: (2025)

User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments
by: Lin, Junfeng, et al.
Published: (2026)

Toward Safe, Trustworthy and Realistic Augmented Reality User Experience
by: Xiu, Yanming
Published: (2025)

A Neurosymbolic Framework for Interpretable Cognitive Attack Detection in Augmented Reality
by: Chen, Rongqian, et al.
Published: (2025)

Demonstrating Visual Information Manipulation Attacks in Augmented Reality: A Hands-On Miniature City-Based Setup
by: Xiu, Yanming, et al.
Published: (2025)

Say It, See It: A Systematic Evaluation on Speech-Based 3D Content Generation Methods in Augmented Reality
by: Xiu, Yanming, et al.
Published: (2025)

Retrievals Can Be Detrimental: Unveiling the Backdoor Vulnerability of Retrieval-Augmented Diffusion Models
by: Fang, Hao, et al.
Published: (2025)

Understanding the Detrimental Class-level Effects of Data Augmentation
by: Kirichenko, Polina, et al.
Published: (2023)

ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection
by: Ma, Yunsheng, et al.
Published: (2022)

ViTamin: Designing Scalable Vision Models in the Vision-Language Era
by: Chen, Jieneng, et al.
Published: (2024)

Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
by: Yu, Hong-Tao, et al.
Published: (2025)

ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
by: Kang, Jialiang, et al.
Published: (2025)

Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer
by: Dey, Sainath, et al.
Published: (2025)

ViTmiX: Vision Transformer Explainability Augmented by Mixed Visualization Methods
by: Hogea, Eduard, et al.
Published: (2024)

TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection
by: Chen, Hanning, et al.
Published: (2024)

VersaViT: Enhancing MLLM Vision Backbones via Task-Guided Optimization
by: Liu, Yikun, et al.
Published: (2026)

ViLBench: A Suite for Vision-Language Process Reward Modeling
by: Tu, Haoqin, et al.
Published: (2025)

ChangeViT: Unleashing Plain Vision Transformers for Change Detection
by: Zhu, Duowang, et al.
Published: (2024)

ViLU: Learning Vision-Language Uncertainties for Failure Prediction
by: Lafon, Marc, et al.
Published: (2025)

ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
by: Cao, Hanwen, et al.
Published: (2025)

ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models
by: Liu, Yuqi, et al.
Published: (2025)

LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation
by: Yue, Tongtian, et al.
Published: (2025)

ViLReF: An Expert Knowledge Enabled Vision-Language Retinal Foundation Model
by: Yang, Shengzhu, et al.
Published: (2024)

Vision-Language Models for Vision Tasks: A Survey
by: Zhang, Jingyi, et al.
Published: (2023)

CanViT: Toward Active-Vision Foundation Models
by: Berreby, Yohaï-Eliel, et al.
Published: (2026)

A Hybrid CNN-ViT-GNN Framework with GAN-Based Augmentation for Intelligent Weed Detection in Precision Agriculture
by: V, Pandiyaraju, et al.
Published: (2025)

HarassGuard: Detecting Harassment Behaviors in Social Virtual Reality with Vision-Language Models
by: Lee, Junhee, et al.
Published: (2026)

Vision-Language Models for Automated Chest X-ray Interpretation: Leveraging ViT and GPT-2
by: Islam, Md. Rakibul, et al.
Published: (2025)

Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
by: Ming, Yifei, et al.
Published: (2024)

ReViP: Mitigating False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance
by: Li, Zhuohao, et al.
Published: (2026)

NexViTAD: Few-shot Unsupervised Cross-Domain Defect Detection via Vision Foundation Models and Multi-Task Learning
by: Mu, Tianwei, et al.
Published: (2025)

Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
by: Van, Minh-Hao, et al.
Published: (2025)

Language-Unlocked ViT (LUViT): Empowering Self-Supervised Vision Transformers with LLMs
by: Kuzucu, Selim, et al.
Published: (2025)

ViThinker: Active Vision-Language Reasoning via Dynamic Perceptual Querying
by: You, Weihang, et al.
Published: (2026)

ViLAaD: Enhancing "Attracting and Dispersing" Source-Free Domain Adaptation with Vision-and-Language Model
by: Tarashima, Shuhei, et al.
Published: (2025)

Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images
by: Song, Jinsol, et al.
Published: (2025)

MangoLeafViT: Leveraging Lightweight Vision Transformer with Runtime Augmentation for Efficient Mango Leaf Disease Classification
by: Chowdhury, Rafi Hassan, et al.
Published: (2025)

GenConViT: Deepfake Video Detection Using Generative Convolutional Vision Transformer
by: Deressa, Deressa Wodajo, et al.
Published: (2023)