| Main Authors: | Hu, Xinmiao, Wang, Chun, An, Ruihe, Shao, ChenYu, Ye, Xiaojun, Zhou, Sheng, Li, Liangcheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.19474 |
Similar Items
MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models
by: Zhao, Qiyan, et al.
Published: (2025)
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
by: Zhang, Ruiyi, et al.
Published: (2024)
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
by: Hinck, Musashi, et al.
Published: (2024)
MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning
by: Zhang, Jianyi, et al.
Published: (2024)
TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings
by: Yan, Dawei, et al.
Published: (2024)
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
by: Ma, Yiwei, et al.
Published: (2024)
AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity
by: Lan, Zhibin, et al.
Published: (2024)
Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification
by: Huang, Wenxuan, et al.
Published: (2024)
Cosmos-LLaVA: Chatting with the Visual Cosmos-LLaVA: Görselle Sohbet Etmek
by: Zeer, Ahmed, et al.
Published: (2024)
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
by: Sun, Guohao, et al.
Published: (2024)
LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description
by: Jin, Yizhang, et al.
Published: (2024)
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education
by: Lee, Unggi, et al.
Published: (2024)
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
by: Shang, Yuzhang, et al.
Published: (2024)
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound
by: Guo, Xuechen, et al.
Published: (2024)
Mitigating Hallucinations in Large Language Models via Causal Reasoning
by: Li, Yuangang, et al.
Published: (2025)
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
by: An, Ruichuan, et al.
Published: (2025)
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
by: An, Ruichuan, et al.
Published: (2024)
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
by: Zhang, Shaolei, et al.
Published: (2025)
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
by: Cai, Mu, et al.
Published: (2023)
Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications
by: Foutter, Matthew, et al.
Published: (2024)
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
by: Shi, Wenhao, et al.
Published: (2024)
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
by: Caffagni, Davide, et al.
Published: (2024)
LLaVA-Critic: Learning to Evaluate Multimodal Models
by: Xiong, Tianyi, et al.
Published: (2024)
Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models
by: Jin, Juseong, et al.
Published: (2024)
Purrfessor: A Fine-tuned Multimodal LLaVA Diet Health Chatbot
by: Lu, Linqi, et al.
Published: (2024)
PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding
by: Dai, Dawei, et al.
Published: (2024)
Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation
by: Huang, Hongzhe, et al.
Published: (2024)
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
by: Chen, Cong, et al.
Published: (2025)
Annotation-Efficient Vision-Language Model Adaptation to the Polish Language Using the LLaVA Framework
by: Statkiewicz, Grzegorz, et al.
Published: (2026)
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
by: Cai, Yuxuan, et al.
Published: (2024)
LLaVA-CKD: Bottom-Up Cascaded Knowledge Distillation for Vision-Language Models
by: Gkalelis, Nikolaos, et al.
Published: (2026)
LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models
by: Zheng, Pengcheng, et al.
Published: (2026)
ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant
by: Xiang, Yifan, et al.
Published: (2025)
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image
by: Liang, Yuci, et al.
Published: (2024)
Causal Decoding for Hallucination-Resistant Multimodal Large Language Models
by: Tan, Shiwei, et al.
Published: (2026)
Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages
by: Andersland, Michael
Published: (2024)
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
by: Ye, Xubing, et al.
Published: (2024)
LLaVA-OneVision: Easy Visual Task Transfer
by: Li, Bo, et al.
Published: (2024)
LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models
by: Wang, Jingyi, et al.
Published: (2024)
LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning
by: Li, Jiajie, et al.
Published: (2024)