| Main Authors: | Jiang, Zhuohang; Yuan, Xu; Qu, Haohao; Lin, Shanru; Liu, Kanglong; Fan, Wenqi; Li, Qing |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.22683 |
Similar Items
QA-Dragon: Query-Aware Dynamic RAG System for Knowledge-Intensive Visual Question Answering
by: Jiang, Zhuohang, et al.
Published: (2025)
Enhancing Interpretability for Vision Models via Shapley Value Optimization
by: Fan, Kanglong, et al.
Published: (2025)
Scanpath Prediction in Panoramic Videos via Expected Code Length Minimization
by: Li, Mu, et al.
Published: (2023)
Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking
by: Wen, Wen, et al.
Published: (2025)
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
by: Ma, Xiaochen, et al.
Published: (2023)
DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark
by: Li, Haodong, et al.
Published: (2024)
A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models
by: Ning, Liangbo, et al.
Published: (2025)
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
by: Huang, Yifei, et al.
Published: (2024)
FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models
by: Zheng, Weiying, et al.
Published: (2025)
SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation
by: Shi, Xiangyu, et al.
Published: (2025)
From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models
by: Li, Zongzhao, et al.
Published: (2025)
mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA
by: Yuan, Xu, et al.
Published: (2025)
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
by: Wang, Jin, et al.
Published: (2025)
MoAI: Mixture of All Intelligence for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)
RoiMAM: Region-of-Interest Medical Attention Model for Efficient Vision-Language Understanding
by: Yang, Jiayan, et al.
Published: (2026)
Learned Scanpaths Aid Blind Panoramic Video Quality Assessment
by: Fan, Kanglong, et al.
Published: (2024)
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
by: Ruan, Jiacheng, et al.
Published: (2025)
VisionClaw: Always-On AI Agents through Smart Glasses
by: Liu, Xiaoan, et al.
Published: (2026)
An Egocentric Vision-Language Model based Portable Real-time Smart Assistant
by: Huang, Yifei, et al.
Published: (2025)
ArchSIBench: Benchmarking the Architectural Spatial Intelligence of Vision-Language Models
by: Shen, Qirui, et al.
Published: (2026)
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
by: Chen, Xiuyuan, et al.
Published: (2023)
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
by: Ying, Kaining, et al.
Published: (2024)
Benchmarking and Mitigating Sycophancy in Medical Vision Language Models
by: Xu, Juangui, et al.
Published: (2025)
A Benchmark and Evaluation for Real-World Out-of-Distribution Detection Using Vision-Language Models
by: Noda, Shiho, et al.
Published: (2025)
Imaging for All-Day Wearable Smart Glasses
by: Goesele, Michael, et al.
Published: (2025)
SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models
by: Zhao, Xinyi, et al.
Published: (2025)
Attention Debiasing for Token Pruning in Vision Language Models
by: Zhao, Kai, et al.
Published: (2025)
SurgXBench: Explainable Vision-Language Model Benchmark for Surgery
by: Cheng, Jiajun, et al.
Published: (2025)
The Illusion of Clinical Reasoning: A Benchmark Reveals the Pervasive Gap in Vision-Language Models for Clinical Competency
by: Wang, Dingyu, et al.
Published: (2025)
LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild
by: Wang, Zhiqiang, et al.
Published: (2024)
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology
by: Jiang, Xin, et al.
Published: (2024)
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions
by: He, Xingwei, et al.
Published: (2024)
Embodied3DBench: Benchmarking Low-Level Embodied Spatial Intelligence of Vision Language Models
by: Zhang, Jiyao, et al.
Published: (2026)
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
by: Li, Juncheng, et al.
Published: (2025)
Towards Statistical Factuality Guarantee for Large Vision-Language Models
by: Li, Zhuohang, et al.
Published: (2025)
From Diagnosis to Improvement: Probing Spatio-Physical Reasoning in Vision Language Models
by: Han, Tiancheng, et al.
Published: (2025)
Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models
by: Ling, Yiran, et al.
Published: (2026)
FinChart-Bench: Benchmarking Financial Chart Comprehension in Vision-Language Models
by: Shu, Dong, et al.
Published: (2025)
EPIC-Bench: A Perception-Centric Benchmark for Fine-Grained Embodied Visual Grounding in Vision-Language Models
by: Shan, Haozhe, et al.
Published: (2026)
Dynamic Rank Adaptation for Vision-Language Models
by: Wang, Jiahui, et al.
Published: (2025)