| Main Authors: | Alishzade, Nigar; Abdullayeva, Gulchin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.13126 |
Similar Items
Sign Language Recognition and Translation for Low-Resource Languages: Challenges and Pathways Forward
by: Alishzade, Nigar, et al.
Published: (2026)
The Influence of Iconicity in Transfer Learning for Sign Language Recognition
by: Artiaga, Keren, et al.
Published: (2026)
AzSLD: Azerbaijani Sign Language Dataset for Fingerspelling, Word, and Sentence Translation with Baseline Software
by: Alishzade, Nigar, et al.
Published: (2024)
ADAT: Time-Series-Aware Adaptive Transformer Architecture for Sign Language Translation
by: Shahin, Nada, et al.
Published: (2025)
CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)
GLoT: A Novel Gated-Logarithmic Transformer for Efficient Sign Language Translation
by: Shahin, Nada, et al.
Published: (2025)
Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation
by: Gopinathan, Muraleekrishna, et al.
Published: (2024)
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
by: Dai, Song, et al.
Published: (2025)
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
by: Tong, Jingqi, et al.
Published: (2025)
ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment
by: Bian, Zhipeng, et al.
Published: (2026)
A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models
by: Balasubramanian, Sriram, et al.
Published: (2025)
Learning the meanings of function words from grounded language using a visual question answering model
by: Portelance, Eva, et al.
Published: (2023)
Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning
by: Yang, Shan
Published: (2026)
Relative Drawing Identification Complexity is Invariant to Modality in Vision-Language Models
by: Freitas, Diogo, et al.
Published: (2025)
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models
by: Ji, Yikun, et al.
Published: (2025)
From Benchmarking to Reasoning: A Dual-Aspect, Large-Scale Evaluation of LLMs on Vietnamese Legal Text
by: Le, Van-Truong
Published: (2026)
What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
by: Yang, Dingyi, et al.
Published: (2024)
More Than Meets the Eye: Measuring the Semiotic Gap in Vision-Language Models via Semantic Anchorage
by: He, Wei
Published: (2026)
Defending against Backdoor Attacks via Module Switching
by: Li, Weijun, et al.
Published: (2025)
Context-Aware Network Based on Multi-scale Spatio-temporal Attention for Action Recognition in Videos
by: Li, Xiaoyang, et al.
Published: (2025)
HATL: Hierarchical Adaptive-Transfer Learning Framework for Sign Language Machine Translation
by: Shahin, Nada, et al.
Published: (2026)
CourseTimeQA: A Lecture-Video Benchmark and a Latency-Constrained Cross-Modal Fusion Method for Timestamped QA
by: Kovalev, Vsevolod, et al.
Published: (2025)
GroundCap: A Visually Grounded Image Captioning Dataset
by: Oliveira, Daniel A. P., et al.
Published: (2025)
Large Language Model for Qualitative Research -- A Systematic Mapping Study
by: Barros, Cauã Ferreira, et al.
Published: (2024)
U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)
STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics
by: Chen, Jiawen, et al.
Published: (2024)
VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
by: Parcalabescu, Letitia, et al.
Published: (2021)
NAAQA: A Neural Architecture for Acoustic Question Answering
by: Abdelnour, Jerome, et al.
Published: (2021)
MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models
by: Ji, Yiyan, et al.
Published: (2025)
Bridge Diffusion Model: Bridge Chinese Text-to-Image Diffusion Model with English Communities
by: Liu, Shanyuan, et al.
Published: (2023)
MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks
by: Parcalabescu, Letitia, et al.
Published: (2022)
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
by: Oliveira, Daniel A. P., et al.
Published: (2025)
Pointing-Based Object Recognition
by: Hajdúch, Lukáš, et al.
Published: (2026)
Evaluation of Attention Mechanisms in U-Net Architectures for Semantic Segmentation of Brazilian Rock Art Petroglyphs
by: Melo, Leonardi, et al.
Published: (2025)
CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?
by: Ramakrishnan, Aashish Anantha, et al.
Published: (2025)
On the Limitations of Vision-Language Models in Understanding Image Transforms
by: Anis, Ahmad Mustafa, et al.
Published: (2025)
Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition
by: Nakamura, Ikuo
Published: (2024)
K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality in Meteorology
by: Kim, Soyeon, et al.
Published: (2026)
Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models
by: Portelance, Eva, et al.
Published: (2024)
On Measuring Faithfulness or Self-consistency of Natural Language Explanations
by: Parcalabescu, Letitia, et al.
Published: (2023)