:: Library Catalog

Bibliographic Details
Main Authors:	Hamza, Ameer; Abdullah; Ahn, Yong Hyun; Lee, Sungyoung; Kim, Seong Tae
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.04749

Similar Items

Resource-Efficient Medical Report Generation using Large Language Models
by: Abdullah, et al.
Published: (2024)

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
by: Shu, Fangxun, et al.
Published: (2024)

VLM-KG: Multimodal Radiology Knowledge Graph Generation
by: Abdullah, Abdullah, et al.
Published: (2025)

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
by: Caffagni, Davide, et al.
Published: (2024)

ViDRiP-LLaVA: A Dataset and Benchmark for Diagnostic Reasoning from Pathology Videos
by: Vuong, Trinh T. L., et al.
Published: (2025)

LLaVA-CKD: Bottom-Up Cascaded Knowledge Distillation for Vision-Language Models
by: Gkalelis, Nikolaos, et al.
Published: (2026)

WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
by: Ahn, Yong Hyun, et al.
Published: (2024)

Cosmos-LLaVA: Chatting with the Visual (Cosmos-LLaVA: Görselle Sohbet Etmek)
by: Zeer, Ahmed, et al.
Published: (2024)

TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings
by: Yan, Dawei, et al.
Published: (2024)

LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval
by: Lu, Weiheng, et al.
Published: (2024)

PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding
by: Dai, Dawei, et al.
Published: (2024)

Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases
by: Wang, Liqiong, et al.
Published: (2024)

X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment
by: Shin, Dongjae, et al.
Published: (2024)

Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages
by: Andersland, Michael
Published: (2024)

Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images
by: Song, Jinsol, et al.
Published: (2025)

LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description
by: Jin, Yizhang, et al.
Published: (2024)

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
by: Cai, Yuxuan, et al.
Published: (2024)

LLaVA-SLT: Visual Language Tuning for Sign Language Translation
by: Liang, Han, et al.
Published: (2024)

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
by: Shang, Yuzhang, et al.
Published: (2024)

Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
by: Shi, Wenhao, et al.
Published: (2024)

LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education
by: Lee, Unggi, et al.
Published: (2024)

Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
by: Lim, Su Hyeon, et al.
Published: (2024)

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
by: Sun, Guohao, et al.
Published: (2024)

LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound
by: Guo, Xuechen, et al.
Published: (2024)

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
by: Zhang, Ruiyi, et al.
Published: (2024)

MC-LLaVA: Multi-Concept Personalized Vision-Language Model
by: An, Ruichuan, et al.
Published: (2024)

MC-LLaVA: Multi-Concept Personalized Vision-Language Model
by: An, Ruichuan, et al.
Published: (2025)

LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration
by: Inal, Gokce, et al.
Published: (2026)

LLaVA-Critic: Learning to Evaluate Multimodal Models
by: Xiong, Tianyi, et al.
Published: (2024)

LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models
by: Wang, Jingyi, et al.
Published: (2024)

LLaVAC: Fine-tuning LLaVA as a Multimodal Sentiment Classifier
by: Chay-intr, T., et al.
Published: (2025)

Safe-LLaVA: A Privacy-Preserving Vision-Language Dataset and Benchmark for Biometric Safety
by: Kim, Younggun, et al.
Published: (2025)

When LLaVA Meets Objects: Token Composition for Vision-Language-Models
by: Jahagirdar, Soumya, et al.
Published: (2026)

LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
by: Zhu, Yichen, et al.
Published: (2024)

Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications
by: Foutter, Matthew, et al.
Published: (2024)

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
by: Ye, Xubing, et al.
Published: (2024)

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
by: Lin, Bin, et al.
Published: (2024)

Why do LLaVA Vision-Language Models Reply to Images in English?
by: Hinck, Musashi, et al.
Published: (2024)

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
by: Xu, Guowei, et al.
Published: (2024)

Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models
by: Cao, Meng, et al.
Published: (2024)