| Main Authors: | Chen, Ziyang, Moscholios, Stylios |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2410.03848 |
Similar Items
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
by: Wang, Shengkang, et al.
Published: (2024)
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning
by: Dai, Yinpei, et al.
Published: (2024)
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
by: Luo, Jun, et al.
Published: (2024)
Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting
by: Cai, Chen, et al.
Published: (2024)
Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations
by: Zhu, Kangyu, et al.
Published: (2025)
Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models
by: Talemi, Niloufar Alipour, et al.
Published: (2024)
MathScape: Benchmarking Multimodal Large Language Models in Real-World Mathematical Contexts
by: Liang, Hao, et al.
Published: (2024)
Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts
by: Wu, Xuyang, et al.
Published: (2024)
SignLLM: Sign Language Production Large Language Models
by: Fang, Sen, et al.
Published: (2024)
Text Prompt Injection of Vision Language Models
by: Zhu, Ruizhe
Published: (2025)
Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models
by: Xu, Qinwu, et al.
Published: (2026)
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
by: Kim, Donghoon, et al.
Published: (2025)
DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution
by: Jiang, Aiwen, et al.
Published: (2024)
Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models
by: Liu, Xinyang, et al.
Published: (2023)
Personalized Multimodal Large Language Models: A Survey
by: Wu, Junda, et al.
Published: (2024)
Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning
by: Hua, Jiacheng, et al.
Published: (2026)
MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models
by: Miao, Yongzhu, et al.
Published: (2023)
MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems
by: Li, Kaixin, et al.
Published: (2024)
Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models
by: Zhou, Qiji, et al.
Published: (2024)
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models
by: Adhikari, Rabin, et al.
Published: (2024)
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models
by: Koleilat, Taha, et al.
Published: (2024)
MLLMReID: Multimodal Large Language Model-based Person Re-identification
by: Yang, Shan, et al.
Published: (2024)
MoPD: Mixture-of-Prompts Distillation for Vision-Language Models
by: Chen, Yang, et al.
Published: (2024)
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
by: You, Haoxuan, et al.
Published: (2023)
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge
by: Lin, Yuanze, et al.
Published: (2024)
Cerberus: Real-Time Video Anomaly Detection via Cascaded Vision-Language Models
by: Zheng, Yue, et al.
Published: (2025)
Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models
by: Lin, Junyan, et al.
Published: (2026)
VEGAS: Mitigating Hallucinations in Large Vision-Language Models via Vision-Encoder Attention Guided Adaptive Steering
by: Wang, Zihu, et al.
Published: (2025)
Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models
by: Weng, Fenghua, et al.
Published: (2025)
Show and Guide: Instructional-Plan Grounded Vision and Language Model
by: Glória-Silva, Diogo, et al.
Published: (2024)
The Impact of Image Resolution on Biomedical Multimodal Large Language Models
by: Chen, Liangyu, et al.
Published: (2025)
Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis
by: Bucciarelli, Davide, et al.
Published: (2024)
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
by: Zhu, Wenxin, et al.
Published: (2025)
What Do Vision-Language Models Encode for Personalized Image Aesthetics Assessment?
by: Ryu, Koki, et al.
Published: (2026)
Task-Aware Resolution Optimization for Visual Large Language Models
by: Luo, Weiqing, et al.
Published: (2025)
Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models
by: Chen, Jiaxing, et al.
Published: (2024)
Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models
by: Yuxuan, Cao, et al.
Published: (2025)
Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models
by: Lu, Jiaying, et al.
Published: (2023)
Generalizable Entity Grounding via Assistance of Large Language Model
by: Qi, Lu, et al.
Published: (2024)
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
by: Li, Lei, et al.
Published: (2024)