| Main Authors: | Hamza, Ameer; Abdullah; Ahn, Yong Hyun; Lee, Sungyoung; Kim, Seong Tae |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2410.04749 |
Similar Items
Resource-Efficient Medical Report Generation using Large Language Models
by: Abdullah, et al.
Published: (2024)
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
by: Shu, Fangxun, et al.
Published: (2024)
VLM-KG: Multimodal Radiology Knowledge Graph Generation
by: Abdullah, Abdullah, et al.
Published: (2025)
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
by: Caffagni, Davide, et al.
Published: (2024)
ViDRiP-LLaVA: A Dataset and Benchmark for Diagnostic Reasoning from Pathology Videos
by: Vuong, Trinh T. L., et al.
Published: (2025)
LLaVA-CKD: Bottom-Up Cascaded Knowledge Distillation for Vision-Language Models
by: Gkalelis, Nikolaos, et al.
Published: (2026)
WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
by: Ahn, Yong Hyun, et al.
Published: (2024)
Cosmos-LLaVA: Chatting with the Visual Cosmos-LLaVA: Görselle Sohbet Etmek
by: Zeer, Ahmed, et al.
Published: (2024)
TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings
by: Yan, Dawei, et al.
Published: (2024)
LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval
by: Lu, Weiheng, et al.
Published: (2024)
PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding
by: Dai, Dawei, et al.
Published: (2024)
Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases
by: Wang, Liqiong, et al.
Published: (2024)
X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment
by: Shin, Dongjae, et al.
Published: (2024)
Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages
by: Andersland, Michael
Published: (2024)
Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images
by: Song, Jinsol, et al.
Published: (2025)
LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description
by: Jin, Yizhang, et al.
Published: (2024)
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
by: Cai, Yuxuan, et al.
Published: (2024)
LLaVA-SLT: Visual Language Tuning for Sign Language Translation
by: Liang, Han, et al.
Published: (2024)
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
by: Shang, Yuzhang, et al.
Published: (2024)
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
by: Shi, Wenhao, et al.
Published: (2024)
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education
by: Lee, Unggi, et al.
Published: (2024)
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
by: Lim, Su Hyeon, et al.
Published: (2024)
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
by: Sun, Guohao, et al.
Published: (2024)
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound
by: Guo, Xuechen, et al.
Published: (2024)
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
by: Zhang, Ruiyi, et al.
Published: (2024)
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
by: An, Ruichuan, et al.
Published: (2024)
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
by: An, Ruichuan, et al.
Published: (2025)
LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration
by: Inal, Gokce, et al.
Published: (2026)
LLaVA-Critic: Learning to Evaluate Multimodal Models
by: Xiong, Tianyi, et al.
Published: (2024)
LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models
by: Wang, Jingyi, et al.
Published: (2024)
LLaVAC: Fine-tuning LLaVA as a Multimodal Sentiment Classifier
by: Chay-intr, T., et al.
Published: (2025)
Safe-LLaVA: A Privacy-Preserving Vision-Language Dataset and Benchmark for Biometric Safety
by: Kim, Younggun, et al.
Published: (2025)
When LLaVA Meets Objects: Token Composition for Vision-Language-Models
by: Jahagirdar, Soumya, et al.
Published: (2026)
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
by: Zhu, Yichen, et al.
Published: (2024)
Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications
by: Foutter, Matthew, et al.
Published: (2024)
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
by: Ye, Xubing, et al.
Published: (2024)
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
by: Lin, Bin, et al.
Published: (2024)
Why do LLaVA Vision-Language Models Reply to Images in English?
by: Hinck, Musashi, et al.
Published: (2024)
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
by: Xu, Guowei, et al.
Published: (2024)
Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models
by: Cao, Meng, et al.
Published: (2024)