:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Yixin; Zhang, Shuai; Han, Boran; He, Tong; Li, Bo
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2401.03149

Similar Items

Visual Instruction Tuning with Chain of Region-of-Interest
by: Chen, Yixin, et al.
Published: (2025)

Generative Multimodal Models are In-Context Learners
by: Sun, Quan, et al.
Published: (2023)

Bridging Remote Sensors with Multisensor Geospatial Foundation Models
by: Han, Boran, et al.
Published: (2024)

Make LVLMs Focus: Context-Aware Attention Modulation for Better Multimodal In-Context Learning
by: Li, Yanshu, et al.
Published: (2025)

Hallucination of Multimodal Large Language Models: A Survey
by: Bai, Zechen, et al.
Published: (2024)

Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
by: Chen, Meiqi, et al.
Published: (2024)

CAST: Collapse-Aware multi-Scale Topology Fusion for Multimodal Coreset Selection
by: Zhao, Boran, et al.
Published: (2026)

Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models
by: Sun, Jingchen, et al.
Published: (2026)

Emu3.5: Native Multimodal Models are World Learners
by: Cui, Yufeng, et al.
Published: (2025)

TrimTokenator-LC: Towards Adaptive Visual Token Pruning for Large Multimodal Models with Long Contexts
by: Zhang, Hao, et al.
Published: (2025)

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
by: Zhang, Kaichen, et al.
Published: (2024)

Video Diffusion Transformers are In-Context Learners
by: Fei, Zhengcong, et al.
Published: (2024)

Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
by: Chen, Shuo, et al.
Published: (2023)

ST-LLM: Large Language Models Are Effective Temporal Learners
by: Liu, Ruyang, et al.
Published: (2024)

Large Multimodal Models as General In-Context Classifiers
by: Garosi, Marco, et al.
Published: (2026)

SEED-Story: Multimodal Long Story Generation with Large Language Model
by: Yang, Shuai, et al.
Published: (2024)

Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
by: Pi, Renjie, et al.
Published: (2024)

Kosmos-G: Generating Images in Context with Multimodal Large Language Models
by: Pan, Xichen, et al.
Published: (2023)

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models
by: Hu, Lulu, et al.
Published: (2026)

Large Vision-Language Models as Emotion Recognizers in Context Awareness
by: Lei, Yuxuan, et al.
Published: (2024)

MMaDA: Multimodal Large Diffusion Language Models
by: Yang, Ling, et al.
Published: (2025)

Personal Visual Context Learning in Large Multimodal Models
by: Xue, Zihui, et al.
Published: (2026)

Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning
by: Chen, Junkai, et al.
Published: (2026)

Making Large Vision Language Models to be Good Few-shot Learners
by: Liu, Fan, et al.
Published: (2024)

3D CoCa: Contrastive Learners are 3D Captioners
by: Huang, Ting, et al.
Published: (2025)

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
by: Tong, Bo, et al.
Published: (2024)

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
by: Cai, Yuxuan, et al.
Published: (2024)

MathScape: Benchmarking Multimodal Large Language Models in Real-World Mathematical Contexts
by: Liang, Hao, et al.
Published: (2024)

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
by: Tian, Ye, et al.
Published: (2025)

GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos
by: Kumar, Deepak, et al.
Published: (2026)

Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models
by: Peng, Xingkai, et al.
Published: (2025)

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
by: Ye, Junyan, et al.
Published: (2024)

Culture-Aware Humorous Captioning: Multimodal Humor Generation across Cultural Contexts
by: Xu, Run, et al.
Published: (2026)

MAD: Makeup All-in-One with Cross-Domain Diffusion Model
by: Ruan, Bo-Kai, et al.
Published: (2025)

Toward Robust Multimodal Learning using Multimodal Foundational Models
by: Zhao, Xianbing, et al.
Published: (2024)

Towards Language-Driven Video Inpainting via Multimodal Large Language Models
by: Wu, Jianzong, et al.
Published: (2024)

Enhanced Motion Forecasting with Plug-and-Play Multimodal Large Language Models
by: Luo, Katie, et al.
Published: (2025)

CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models
by: Luo, Fuwen, et al.
Published: (2024)

Enhancing Radiographic Disease Detection with MetaCheX, a Context-Aware Multimodal Model
by: He, Nathan, et al.
Published: (2025)

Region-Level Context-Aware Multimodal Understanding
by: Wei, Hongliang, et al.
Published: (2025)