| Main Authors: | Townsend, Benjamin, May, Madison, Mackowiak, Katherine, Wells, Christopher |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.20101 |
Similar Items
RA-RRG: Multimodal Retrieval-Augmented Radiology Report Generation with Key Phrase Extraction
by: Park, Jonggwon, et al.
Published: (2025)
LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining
by: Shen, Huawen, et al.
Published: (2024)
Model Interpretability and Rationale Extraction by Input Mask Optimization
by: Brinner, Marc, et al.
Published: (2025)
Improving MLLM Historical Record Extraction with Test-Time Image
by: Archibald, Taylor, et al.
Published: (2025)
Robustness of Structured Data Extraction from Perspectively Distorted Documents
by: Nakada, Hyakka, et al.
Published: (2025)
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
by: Tran, Minh-Tuan, et al.
Published: (2024)
Overconfidence is Key: Verbalized Uncertainty Evaluation in Large Language and Vision-Language Models
by: Groot, Tobias, et al.
Published: (2024)
Improving Resnet-9 Generalization Trained on Small Datasets
by: Awad, Omar Mohamed, et al.
Published: (2023)
Real-time Bangla Sign Language Translator
by: Pranto, Rotan Hawlader, et al.
Published: (2024)
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
by: Li, Ang, et al.
Published: (2025)
Omnimodal Dataset Distillation via High-order Proxy Alignment
by: Gao, Yuxuan, et al.
Published: (2026)
From Pixels to Prose: A Large Dataset of Dense Image Captions
by: Singla, Vasu, et al.
Published: (2024)
One Category One Prompt: Dataset Distillation using Diffusion Models
by: Abbasi, Ali, et al.
Published: (2024)
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
by: Madaan, Divyam, et al.
Published: (2025)
Neural Style Transfer for Synthesising a Dataset of Ancient Egyptian Hieroglyphs
by: Creed, Lewis Matheson
Published: (2025)
SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers
by: Kawada, Takuro, et al.
Published: (2025)
WLASL-LEX: a Dataset for Recognising Phonological Properties in American Sign Language
by: Tavella, Federico, et al.
Published: (2022)
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
by: Lù, Xing Han, et al.
Published: (2024)
Brazilian Portuguese Image Captioning with Transformers: A Study on Cross-Native-Translated Dataset
by: Bromonschenkel, Gabriel, et al.
Published: (2026)
E-TSL: A Continuous Educational Turkish Sign Language Dataset with Baseline Methods
by: Öztürk, Şükrü, et al.
Published: (2024)
TechING: Towards Real World Technical Image Understanding via VLMs
by: Nadeem, Tafazzul, et al.
Published: (2026)
DeepCoT: Deep Continual Transformers for Real-Time Inference on Data Streams
by: Picón, Ginés Carreto, et al.
Published: (2025)
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
by: Qian, Yusu, et al.
Published: (2025)
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
by: Heyward, Joseph, et al.
Published: (2024)
ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction
by: Zou, Henry Peng, et al.
Published: (2024)
AiGen-FoodReview: A Multimodal Dataset of Machine-Generated Restaurant Reviews and Images on Social Media
by: Gambetti, Alessandro, et al.
Published: (2024)
BanglishRev: A Large-Scale Bangla-English and Code-mixed Dataset of Product Reviews in E-Commerce
by: Shamael, Mohammad Nazmush, et al.
Published: (2024)
Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
by: Yi, Hao, et al.
Published: (2024)
Improving Multimodal Large Language Models Using Continual Learning
by: Srivastava, Shikhar, et al.
Published: (2024)
GRASP: A Rehearsal Policy for Efficient Online Continual Learning
by: Harun, Md Yousuf, et al.
Published: (2023)
Multi-Modal Hallucination Control by Visual Information Grounding
by: Favero, Alessandro, et al.
Published: (2024)
Image-Caption Encoding for Improving Zero-Shot Generalization
by: Yu, Eric Yang, et al.
Published: (2024)
GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values
by: Javadi, Farnoosh, et al.
Published: (2023)
A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models
by: Xiu, Lixin, et al.
Published: (2026)
Towards Efficient Vision-Language Tuning: More Information Density, More Generalizability
by: Hao, Tianxiang, et al.
Published: (2023)
BloomVQA: Assessing Hierarchical Multi-modal Comprehension
by: Gong, Yunye, et al.
Published: (2023)
Improve Academic Query Resolution through BERT-based Question Extraction from Images
by: Kamal, Nidhi, et al.
Published: (2024)
Exploring Attention Mechanisms in Integration of Multi-Modal Information for Sign Language Recognition and Translation
by: Hakim, Zaber Ibn Abdul, et al.
Published: (2023)
FisherMask: Enhancing Neural Network Labeling Efficiency in Image Classification Using Fisher Information
by: Gul, Shreen, et al.
Published: (2024)
RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users
by: Ye, Suyu, et al.
Published: (2025)