| Main Authors: | Wang, Shuai; Nalisnick, Eric |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2309.12443 |
Similar Items
Fingerspelling within Sign Language Translation
by: Tanzer, Garrett
Published: (2024)
One-Stage-TFS: Thai One-Stage Fingerspelling Dataset for Fingerspelling Recognition Frameworks
by: Lata, Siriwiwat, et al.
Published: (2024)
Signs of Language: Embodied Sign Language Fingerspelling Acquisition from Demonstrations for Human-Robot Interaction
by: Tavella, Federico, et al.
Published: (2022)
Are vision language models robust to uncertain inputs?
by: Wang, Xi, et al.
Published: (2025)
Recognising BSL Fingerspelling in Continuous Signing Sequences
by: Chan, Alyssa, et al.
Published: (2026)
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
by: Hu, Jinyi, et al.
Published: (2023)
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
by: Atuhurra, Jesse, et al.
Published: (2024)
Rethinking Multilingual Vision-Language Translation: Dataset, Evaluation, and Adaptation
by: Wang, Xintong, et al.
Published: (2025)
Improving Handshape Representations for Sign Language Processing: A Graph Neural Network Approach
by: Carbo, Alessa, et al.
Published: (2025)
HandReader: Advanced Techniques for Efficient Fingerspelling Recognition
by: Korotaev, Pavel, et al.
Published: (2025)
M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
by: Wang, Hongyu, et al.
Published: (2024)
Maya: An Instruction Finetuned Multilingual Multimodal Model
by: Alam, Nahid, et al.
Published: (2024)
MUNIChus: Multilingual News Image Captioning Benchmark
by: Chen, Yuji, et al.
Published: (2026)
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
by: Geigle, Gregor, et al.
Published: (2023)
PRIM: Towards Practical In-Image Multilingual Machine Translation
by: Tian, Yanzhi, et al.
Published: (2025)
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation
by: Salazar, Israfel, et al.
Published: (2025)
Behind Maya: Building a Multilingual Vision Language Model
by: Alam, Nahid, et al.
Published: (2025)
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model
by: Shafique, Bhuiyan Sanjid, et al.
Published: (2025)
ShortCheck: Checkworthiness Detection of Multilingual Short-Form Videos
by: Vatndal, Henrik, et al.
Published: (2025)
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model
by: Geigle, Gregor, et al.
Published: (2025)
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations
by: Geigle, Gregor, et al.
Published: (2023)
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
by: Yue, Xiang, et al.
Published: (2024)
SeeGULL Multilingual: a Dataset of Geo-Culturally Situated Stereotypes
by: Bhutani, Mukul, et al.
Published: (2024)
Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator
by: Zuo, Ronglai, et al.
Published: (2024)
Cooperative Sentiment Agents for Multimodal Sentiment Analysis
by: Wang, Shanmin, et al.
Published: (2024)
FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation
by: Tanzer, Garrett
Published: (2024)
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
by: Futeral, Matthieu, et al.
Published: (2024)
PM4Bench: Benchmarking Large Vision-Language Models with Parallel Multilingual Multi-Modal Multi-task Corpus
by: Gao, Junyuan, et al.
Published: (2025)
Temporal Test-Time Adaptation with State-Space Models
by: Schirmer, Mona, et al.
Published: (2024)
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
by: Chow, Wei, et al.
Published: (2025)
Stack Transformer Based Spatial-Temporal Attention Model for Dynamic Sign Language and Fingerspelling Recognition
by: Hirooka, Koki, et al.
Published: (2025)
Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering
by: Thai, Triet Minh, et al.
Published: (2023)
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
by: Kriz, Reno, et al.
Published: (2024)
Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval
by: Dipta, Shubhashis Roy, et al.
Published: (2025)
VideoCoT: A Video Chain-of-Thought Dataset with Active Annotation Tool
by: Wang, Yan, et al.
Published: (2024)
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models
by: Das, Rocktim Jyoti, et al.
Published: (2024)
Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction
by: Timans, Alexander, et al.
Published: (2024)
Aya Vision: Advancing the Frontier of Multilingual Multimodality
by: Dash, Saurabh, et al.
Published: (2025)
Image-Text Relation Prediction for Multilingual Tweets
by: Rikters, Matīss, et al.
Published: (2025)
Interleaved Latent Visual Reasoning with Selective Perceptual Modeling
by: Dong, Shuai, et al.
Published: (2025)