| Main Author: | Shihata, Yusuf |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.02985 |
Similar Items
On the Limitations of Vision-Language Models in Understanding Image Transforms
by: Anis, Ahmad Mustafa, et al.
Published: (2025)
Transfer-learning for video classification: Video Swin Transformer on multiple domains
by: Oliveira, Daniel A. P., et al.
Published: (2022)
Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations
by: Hütten, Nils, et al.
Published: (2025)
From Rule-Based Models to Deep Learning Transformers Architectures for Natural Language Processing and Sign Language Translation Systems: Survey, Taxonomy and Performance Evaluation
by: Shahin, Nada, et al.
Published: (2024)
ADAT: Time-Series-Aware Adaptive Transformer Architecture for Sign Language Translation
by: Shahin, Nada, et al.
Published: (2025)
Multimodal Approaches for Visually-Rich Document Type Classification: A Comparative Analysis
by: Heyne, Catyana, et al.
Published: (2026)
GLoT: A Novel Gated-Logarithmic Transformer for Efficient Sign Language Translation
by: Shahin, Nada, et al.
Published: (2025)
When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't
by: Nemitz, Jonathan, et al.
Published: (2026)
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
by: Niu, Yuwei, et al.
Published: (2025)
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
by: Deng, Juncan, et al.
Published: (2024)
Multimodal Ensemble with Conditional Feature Fusion for Dysgraphia Diagnosis in Children from Handwriting Samples
by: Kunhoth, Jayakanth, et al.
Published: (2024)
Using Deep Learning to Generate Semantically Correct Hindi Captions
by: Khan, Wasim Akram, et al.
Published: (2026)
LLM-Guided Exemplar Selection for Few-Shot Wearable-Sensor Human Activity Recognition
by: Ronando, Elsen, et al.
Published: (2025)
myMNIST: Benchmark of PETNN, KAN, and Classical Deep Learning Models for Burmese Handwritten Digit Recognition
by: Thu, Ye Kyaw, et al.
Published: (2026)
The Influence of Iconicity in Transfer Learning for Sign Language Recognition
by: Artiaga, Keren, et al.
Published: (2026)
RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
by: Agarwal, Amit, et al.
Published: (2025)
Intrinsic Image Fusion for Multi-View 3D Material Reconstruction
by: Kocsis, Peter, et al.
Published: (2025)
Mechanisms of Prompt-Induced Hallucination in Vision-Language Models
by: Rudman, William, et al.
Published: (2026)
PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
by: Patel, Hitesh Laxmichand, et al.
Published: (2025)
Generative AI for Video Translation: A Scalable Architecture for Multilingual Video Conferencing
by: Oskooei, Amirkia Rafiei, et al.
Published: (2025)
MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models
by: Ji, Yiyan, et al.
Published: (2025)
InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information
by: Iyengar, Anirudh Iyengar Kaniyar Narayana, et al.
Published: (2025)
Enhancing Sports Strategy with Video Analytics and Data Mining: Assessing the effectiveness of Multimodal LLMs in tennis video analysis
by: Teo, Charlton
Published: (2025)
Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning
by: Ji, Binbin, et al.
Published: (2025)
Adapting Multimodal Foundation Models for Few-Shot Learning: A Comprehensive Study on Contrastive Captioners
by: Narasinghe, N. K. B. M. P. K. B., et al.
Published: (2025)
RPCASSM: Robust PCA State Space Model For Infrared Small Target Detection
by: Liu, Pingping, et al.
Published: (2026)
THIRDEYE: Cue-Aware Monocular Depth Estimation via Brain-Inspired Multi-Stage Fusion
by: Ioan, Calin Teodor
Published: (2025)
A Two-stage Transformer Framework for Temporal Localization of Distracted Driver Behaviors
by: Doan, Gia-Bao, et al.
Published: (2026)
Training-Free Diffusion Priors for Text-to-Image Generation via Optimization-based Visual Inversion
by: Dell'Erba, Samuele, et al.
Published: (2025)
Obtaining Favorable Layouts for Multiple Object Generation
by: Battash, Barak, et al.
Published: (2024)
Fruit Classification System with Deep Learning and Neural Architecture Search
by: Dewi, Christine, et al.
Published: (2024)
SynCo: Synthetic Hard Negatives for Contrastive Visual Representation Learning
by: Giakoumoglou, Nikolaos, et al.
Published: (2024)
Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning
by: Elberg, Rafael, et al.
Published: (2024)
From Latent to Engine Manifolds: Analyzing ImageBind's Multimodal Embedding Space
by: Hamara, Andrew, et al.
Published: (2024)
Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis
by: Umeike, Robinson, et al.
Published: (2025)
TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning
by: Sanders, Kate, et al.
Published: (2024)
HATL: Hierarchical Adaptive-Transfer Learning Framework for Sign Language Machine Translation
by: Shahin, Nada, et al.
Published: (2026)
NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation
by: Gandyra, Max, et al.
Published: (2025)
SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding
by: Gutiérrez-Pérez, Marc, et al.
Published: (2025)
Methods and strategies for improving the novel view synthesis quality of neural radiation field
by: Fang, Shun, et al.
Published: (2024)