:: Library Catalog

Bibliographic Details
Main Authors:	Sharshar, Ahmed; Khan, Latif U.; Ullah, Waseem; Guizani, Mohsen
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2502.07855

Similar Items

PulmoFusion: Advancing Pulmonary Health with Efficient Multi-Modal Fusion
by: Sharshar, Ahmed, et al.
Published: (2025)

ChatENV: An Interactive Vision-Language Model for Sensor-Guided Environmental Monitoring and Scenario Simulation
by: Elgendy, Hosam, et al.
Published: (2025)

UAV-Assisted Real-Time Disaster Detection Using Optimized Transformer Model
by: Jankovic, Branislava, et al.
Published: (2025)

Real-Time Aerial Fire Detection on Resource-Constrained Devices Using Knowledge Distillation
by: Jangirova, Sabina, et al.
Published: (2025)

GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing
by: Elgendy, Hosam, et al.
Published: (2024)

Not Only Grey Matter: OmniBrain for Robust Multimodal Classification of Alzheimer's Disease
by: Sharshar, Ahmed, et al.
Published: (2025)

Survey on Vision-Language-Action Models
by: Adilkhanov, Adilzhan, et al.
Published: (2025)

Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey
by: Xu, Yiwen, et al.
Published: (2026)

Vision Mamba: A Comprehensive Survey and Taxonomy
by: Liu, Xiao, et al.
Published: (2024)

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
by: Zhang, Yongting, et al.
Published: (2024)

OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
by: Jia, Mengdi, et al.
Published: (2025)

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
by: Zhou, Chenyu, et al.
Published: (2024)

Robust Pre-Training of Medical Vision-and-Language Models with Domain-Invariant Multi-Modal Masked Reconstruction
by: Filvantorkaman, Melika, et al.
Published: (2026)

Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
by: Ghosh, Akash, et al.
Published: (2024)

Mamba in Vision: A Comprehensive Survey of Techniques and Applications
by: Rahman, Md Maklachur, et al.
Published: (2024)

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
by: Fu, Chaoyou, et al.
Published: (2024)

Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing
by: Ashqar, Huthaifa I., et al.
Published: (2024)

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models
by: Nandy, Abhilash, et al.
Published: (2024)

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
by: Yang, Rui, et al.
Published: (2025)

SPACT18: Spiking Human Action Recognition Benchmark Dataset with Complementary RGB and Thermal Modalities
by: Ashraf, Yasser, et al.
Published: (2025)

Vision Token Reduction via Attention-Driven Self-Compression for Efficient Multimodal Large Language Models
by: Deniz, Omer Faruk, et al.
Published: (2026)

Small Vision-Language Models: A Survey on Compact Architectures and Techniques
by: Patnaik, Nitesh, et al.
Published: (2025)

The Role of Language Models in Modern Healthcare: A Comprehensive Review
by: Khalid, Amna, et al.
Published: (2024)

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
by: Hu, Wenbo, et al.
Published: (2024)

A Survey on Agentic Multimodal Large Language Models
by: Yao, Huanjin, et al.
Published: (2025)

A Survey on Benchmarks of Multimodal Large Language Models
by: Li, Jian, et al.
Published: (2024)

A Survey on Evaluation of Multimodal Large Language Models
by: Huang, Jiaxing, et al.
Published: (2024)

Revisiting the Role of Language Priors in Vision-Language Models
by: Lin, Zhiqiu, et al.
Published: (2023)

Chitrarth: Bridging Vision and Language for a Billion People
by: Khan, Shaharukh, et al.
Published: (2025)

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
by: Lu, Yujie, et al.
Published: (2024)

EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training
by: Du, Yiyang, et al.
Published: (2026)

Multi-Object Hallucination in Vision-Language Models
by: Chen, Xuweiyi, et al.
Published: (2024)

Benchmarking Vision Language Models for Cultural Understanding
by: Nayak, Shravan, et al.
Published: (2024)

VLind-Bench: Measuring Language Priors in Large Vision-Language Models
by: Lee, Kang-il, et al.
Published: (2024)

VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark
by: Huang, Han, et al.
Published: (2024)

VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?
by: Zhao, Hongbo, et al.
Published: (2025)

A Multimodal Recaptioning Framework to Account for Perceptual Diversity Across Languages in Vision-Language Modeling
by: Buettner, Kyle, et al.
Published: (2025)

Can Vision Language Models Understand Mimed Actions?
by: Cho, Hyundong, et al.
Published: (2025)

Are Large Vision Language Models Good Game Players?
by: Wang, Xinyu, et al.
Published: (2025)

Probing and Inducing Combinational Creativity in Vision-Language Models
by: Peng, Yongqian, et al.
Published: (2025)