| Main Authors: | Xiu, Yanming, Scargill, Tim, Gorlatova, Maria |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.12553 |
Similar Items
Benchmarking Vision-Language Models under Contradictory Virtual Content Attacks in Augmented Reality
by: Xiu, Yanming, et al.
Published: (2026)
Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach
by: Xiu, Yanming, et al.
Published: (2025)
Advancing the Understanding and Evaluation of AR-Generated Scenes: When Vision-Language Models Shine and Stumble
by: Duan, Lin, et al.
Published: (2025)
User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments
by: Lin, Junfeng, et al.
Published: (2026)
Toward Safe, Trustworthy and Realistic Augmented Reality User Experience
by: Xiu, Yanming
Published: (2025)
A Neurosymbolic Framework for Interpretable Cognitive Attack Detection in Augmented Reality
by: Chen, Rongqian, et al.
Published: (2025)
Demonstrating Visual Information Manipulation Attacks in Augmented Reality: A Hands-On Miniature City-Based Setup
by: Xiu, Yanming, et al.
Published: (2025)
Say It, See It: A Systematic Evaluation on Speech-Based 3D Content Generation Methods in Augmented Reality
by: Xiu, Yanming, et al.
Published: (2025)
Retrievals Can Be Detrimental: Unveiling the Backdoor Vulnerability of Retrieval-Augmented Diffusion Models
by: Fang, Hao, et al.
Published: (2025)
Understanding the Detrimental Class-level Effects of Data Augmentation
by: Kirichenko, Polina, et al.
Published: (2023)
ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection
by: Ma, Yunsheng, et al.
Published: (2022)
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
by: Chen, Jieneng, et al.
Published: (2024)
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
by: Yu, Hong-Tao, et al.
Published: (2025)
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
by: Kang, Jialiang, et al.
Published: (2025)
Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer
by: Dey, Sainath, et al.
Published: (2025)
ViTmiX: Vision Transformer Explainability Augmented by Mixed Visualization Methods
by: Hogea, Eduard, et al.
Published: (2024)
TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection
by: Chen, Hanning, et al.
Published: (2024)
VersaViT: Enhancing MLLM Vision Backbones via Task-Guided Optimization
by: Liu, Yikun, et al.
Published: (2026)
ViLBench: A Suite for Vision-Language Process Reward Modeling
by: Tu, Haoqin, et al.
Published: (2025)
ChangeViT: Unleashing Plain Vision Transformers for Change Detection
by: Zhu, Duowang, et al.
Published: (2024)
ViLU: Learning Vision-Language Uncertainties for Failure Prediction
by: Lafon, Marc, et al.
Published: (2025)
ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
by: Cao, Hanwen, et al.
Published: (2025)
ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models
by: Liu, Yuqi, et al.
Published: (2025)
LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation
by: Yue, Tongtian, et al.
Published: (2025)
ViLReF: An Expert Knowledge Enabled Vision-Language Retinal Foundation Model
by: Yang, Shengzhu, et al.
Published: (2024)
Vision-Language Models for Vision Tasks: A Survey
by: Zhang, Jingyi, et al.
Published: (2023)
CanViT: Toward Active-Vision Foundation Models
by: Berreby, Yohaï-Eliel, et al.
Published: (2026)
A Hybrid CNN-ViT-GNN Framework with GAN-Based Augmentation for Intelligent Weed Detection in Precision Agriculture
by: V, Pandiyaraju, et al.
Published: (2025)
HarassGuard: Detecting Harassment Behaviors in Social Virtual Reality with Vision-Language Models
by: Lee, Junhee, et al.
Published: (2026)
Vision-Language Models for Automated Chest X-ray Interpretation: Leveraging ViT and GPT-2
by: Islam, Md. Rakibul, et al.
Published: (2025)
Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
by: Ming, Yifei, et al.
Published: (2024)
ReViP: Mitigating False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance
by: Li, Zhuohao, et al.
Published: (2026)
NexViTAD: Few-shot Unsupervised Cross-Domain Defect Detection via Vision Foundation Models and Multi-Task Learning
by: Mu, Tianwei, et al.
Published: (2025)
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
by: Van, Minh-Hao, et al.
Published: (2025)
Language-Unlocked ViT (LUViT): Empowering Self-Supervised Vision Transformers with LLMs
by: Kuzucu, Selim, et al.
Published: (2025)
ViThinker: Active Vision-Language Reasoning via Dynamic Perceptual Querying
by: You, Weihang, et al.
Published: (2026)
ViLAaD: Enhancing "Attracting and Dispersing" Source-Free Domain Adaptation with Vision-and-Language Model
by: Tarashima, Shuhei, et al.
Published: (2025)
Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images
by: Song, Jinsol, et al.
Published: (2025)
MangoLeafViT: Leveraging Lightweight Vision Transformer with Runtime Augmentation for Efficient Mango Leaf Disease Classification
by: Chowdhury, Rafi Hassan, et al.
Published: (2025)
GenConViT: Deepfake Video Detection Using Generative Convolutional Vision Transformer
by: Deressa, Deressa Wodajo, et al.
Published: (2023)