| Main Authors: | Ranjan, Rahul, Gurve, Mahendra Kumar, Anuj, Nitin, Prasad, Yamuna |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.22160 |
Similar Items
MaiBERT: A Pre-training Corpus and Language Model for Low-Resourced Maithili Language
by: Yadav, Sumit, et al.
Published: (2025)
ONG: One-Shot NMF-based Gradient Masking for Efficient Model Sparsification
by: Behera, Sankar, et al.
Published: (2025)
Improved YOLOv12 with LLM-Generated Synthetic Data for Enhanced Apple Detection and Benchmarking Against YOLOv11 and YOLOv10
by: Sapkota, Ranjan, et al.
Published: (2025)
SentiFormer: Metadata Enhanced Transformer for Image Sentiment Analysis
by: Feng, Bin, et al.
Published: (2025)
Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages
by: Salmè, Marco, et al.
Published: (2025)
ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models
by: Darabi, Nastaran, et al.
Published: (2026)
VLURes: Benchmarking VLM Visual and Linguistic Understanding in Low-Resource Languages
by: Atuhurra, Jesse, et al.
Published: (2025)
On the Cultural Anachronism and Temporal Reasoning in Vision Language Models
by: Ranjan, Mukul, et al.
Published: (2026)
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
by: Fang, Rongyao, et al.
Published: (2025)
Cooperative Sentiment Agents for Multimodal Sentiment Analysis
by: Wang, Shanmin, et al.
Published: (2024)
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models
by: Stogiannidis, Ilias, et al.
Published: (2025)
Bangla Sign Language Translation: Dataset Creation Challenges, Benchmarking and Prospects
by: Rubaiyeat, Husne Ara, et al.
Published: (2025)
Enhancing Sentiment Analysis through Multimodal Fusion: A BERT-DINOv2 Approach
by: Zhao, Taoxu, et al.
Published: (2025)
KazakhOCR: A Synthetic Benchmark for Evaluating Multimodal Models in Low-Resource Kazakh Script OCR
by: Gagnier, Henry, et al.
Published: (2026)
LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task
by: Asgarov, Ali, et al.
Published: (2024)
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
by: Wang, Yuxuan, et al.
Published: (2024)
VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation
by: Lim, Hyeonseok, et al.
Published: (2024)
RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages
by: Kashid, Harshvivek, et al.
Published: (2024)
ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments
by: Ray, Sourjyadip, et al.
Published: (2024)
GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling
by: Li, Siqi, et al.
Published: (2025)
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation
by: Chen, Kaijie, et al.
Published: (2025)
HyperGVL: Benchmarking and Improving Large Vision-Language Models in Hypergraph Understanding and Reasoning
by: Wei, Yanbin, et al.
Published: (2026)
Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning
by: Zhu, Yingjie, et al.
Published: (2024)
Open World Scene Graph Generation using Vision Language Models
by: Dutta, Amartya, et al.
Published: (2025)
ESsEN: Training Compact Discriminative Vision-Language Transformers in a Low-Resource Setting
by: Fields, Clayton, et al.
Published: (2026)
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning
by: Luo, Yuxuan, et al.
Published: (2025)
The Typological Characteristics of Maithili
by: Amit Kumar Chandrana
Published: (2017)
Enhanced Sentiment Analysis of Iranian Restaurant Reviews Utilizing Sentiment Intensity Analyzer & Fuzzy Logic
by: Rokhva, Shayan, et al.
Published: (2025)
READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling
by: Nguyen, Thong, et al.
Published: (2023)
CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on Cartographic Map Understanding
by: Ung, Huy Quang, et al.
Published: (2025)
RiskCueBench: Benchmarking Anticipatory Reasoning from Early Risk Cues in Video-Language Models
by: Luo, Sha, et al.
Published: (2026)
Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding
by: De, Anik, et al.
Published: (2025)
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
by: Li, Ang, et al.
Published: (2025)
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark
by: Maeda, Koki, et al.
Published: (2024)
ReMI: A Dataset for Reasoning with Multiple Images
by: Kazemi, Mehran, et al.
Published: (2024)
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
by: Wu, Xiyang, et al.
Published: (2024)
Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline
by: Jia, Qi, et al.
Published: (2024)
DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning
by: Yilmaz, Abdurrahim, et al.
Published: (2026)
Object Detection with Multimodal Large Vision-Language Models: An In-depth Review
by: Sapkota, Ranjan, et al.
Published: (2025)
How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning
by: Yang, Luyu, et al.
Published: (2026)