:: Library Catalog

Bibliographic Details
Main Authors:	Alishzade, Nigar; Abdullayeva, Gulchin
Format:	Preprint
Published:	2025
Subjects:	Computation and Language I.2.10
Online Access:	https://arxiv.org/abs/2511.13126

Similar Items

Sign Language Recognition and Translation for Low-Resource Languages: Challenges and Pathways Forward
by: Alishzade, Nigar, et al.
Published: (2026)

The Influence of Iconicity in Transfer Learning for Sign Language Recognition
by: Artiaga, Keren, et al.
Published: (2026)

AzSLD: Azerbaijani Sign Language Dataset for Fingerspelling, Word, and Sentence Translation with Baseline Software
by: Alishzade, Nigar, et al.
Published: (2024)

ADAT: Time-Series-Aware Adaptive Transformer Architecture for Sign Language Translation
by: Shahin, Nada, et al.
Published: (2025)

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

GLoT: A Novel Gated-Logarithmic Transformer for Efficient Sign Language Translation
by: Shahin, Nada, et al.
Published: (2025)

Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation
by: Gopinathan, Muraleekrishna, et al.
Published: (2024)

PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
by: Dai, Song, et al.
Published: (2025)

Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
by: Tong, Jingqi, et al.
Published: (2025)

ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment
by: Bian, Zhipeng, et al.
Published: (2026)

A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models
by: Balasubramanian, Sriram, et al.
Published: (2025)

Learning the meanings of function words from grounded language using a visual question answering model
by: Portelance, Eva, et al.
Published: (2023)

Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning
by: Yang, Shan
Published: (2026)

Relative Drawing Identification Complexity is Invariant to Modality in Vision-Language Models
by: Freitas, Diogo, et al.
Published: (2025)

Towards Explainable Fake Image Detection with Multi-Modal Large Language Models
by: Ji, Yikun, et al.
Published: (2025)

From Benchmarking to Reasoning: A Dual-Aspect, Large-Scale Evaluation of LLMs on Vietnamese Legal Text
by: Le, Van-Truong
Published: (2026)

What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
by: Yang, Dingyi, et al.
Published: (2024)

More Than Meets the Eye: Measuring the Semiotic Gap in Vision-Language Models via Semantic Anchorage
by: He, Wei
Published: (2026)

Defending against Backdoor Attacks via Module Switching
by: Li, Weijun, et al.
Published: (2025)

Context-Aware Network Based on Multi-scale Spatio-temporal Attention for Action Recognition in Videos
by: Li, Xiaoyang, et al.
Published: (2025)

HATL: Hierarchical Adaptive-Transfer Learning Framework for Sign Language Machine Translation
by: Shahin, Nada, et al.
Published: (2026)

CourseTimeQA: A Lecture-Video Benchmark and a Latency-Constrained Cross-Modal Fusion Method for Timestamped QA
by: Kovalev, Vsevolod, et al.
Published: (2025)

GroundCap: A Visually Grounded Image Captioning Dataset
by: Oliveira, Daniel A. P., et al.
Published: (2025)

Large Language Model for Qualitative Research -- A Systematic Mapping Study
by: Barros, Cauã Ferreira, et al.
Published: (2024)

U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)

STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics
by: Chen, Jiawen, et al.
Published: (2024)

VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
by: Parcalabescu, Letitia, et al.
Published: (2021)

NAAQA: A Neural Architecture for Acoustic Question Answering
by: Abdelnour, Jerome, et al.
Published: (2021)

MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models
by: Ji, Yiyan, et al.
Published: (2025)

Bridge Diffusion Model: Bridge Chinese Text-to-Image Diffusion Model with English Communities
by: Liu, Shanyuan, et al.
Published: (2023)

MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks
by: Parcalabescu, Letitia, et al.
Published: (2022)

StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
by: Oliveira, Daniel A. P., et al.
Published: (2025)

Pointing-Based Object Recognition
by: Hajdúch, Lukáš, et al.
Published: (2026)

Evaluation of Attention Mechanisms in U-Net Architectures for Semantic Segmentation of Brazilian Rock Art Petroglyphs
by: Melo, Leonardi, et al.
Published: (2025)

CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?
by: Ramakrishnan, Aashish Anantha, et al.
Published: (2025)

On the Limitations of Vision-Language Models in Understanding Image Transforms
by: Anis, Ahmad Mustafa, et al.
Published: (2025)

Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition
by: Nakamura, Ikuo
Published: (2024)

K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality in Meteorology
by: Kim, Soyeon, et al.
Published: (2026)

Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models
by: Portelance, Eva, et al.
Published: (2024)

On Measuring Faithfulness or Self-consistency of Natural Language Explanations
by: Parcalabescu, Letitia, et al.
Published: (2023)