| Main Authors: | Chen, Yixin, Zhang, Shuai, Han, Boran, He, Tong, Li, Bo |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2401.03149 |
Similar Items
Visual Instruction Tuning with Chain of Region-of-Interest
by: Chen, Yixin, et al.
Published: (2025)
Generative Multimodal Models are In-Context Learners
by: Sun, Quan, et al.
Published: (2023)
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
by: Han, Boran, et al.
Published: (2024)
Make LVLMs Focus: Context-Aware Attention Modulation for Better Multimodal In-Context Learning
by: Li, Yanshu, et al.
Published: (2025)
Hallucination of Multimodal Large Language Models: A Survey
by: Bai, Zechen, et al.
Published: (2024)
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
by: Chen, Meiqi, et al.
Published: (2024)
CAST: Collapse-Aware multi-Scale Topology Fusion for Multimodal Coreset Selection
by: Zhao, Boran, et al.
Published: (2026)
Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models
by: Sun, Jingchen, et al.
Published: (2026)
Emu3.5: Native Multimodal Models are World Learners
by: Cui, Yufeng, et al.
Published: (2025)
TrimTokenator-LC: Towards Adaptive Visual Token Pruning for Large Multimodal Models with Long Contexts
by: Zhang, Hao, et al.
Published: (2025)
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
by: Zhang, Kaichen, et al.
Published: (2024)
Video Diffusion Transformers are In-Context Learners
by: Fei, Zhengcong, et al.
Published: (2024)
Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
by: Chen, Shuo, et al.
Published: (2023)
ST-LLM: Large Language Models Are Effective Temporal Learners
by: Liu, Ruyang, et al.
Published: (2024)
Large Multimodal Models as General In-Context Classifiers
by: Garosi, Marco, et al.
Published: (2026)
SEED-Story: Multimodal Long Story Generation with Large Language Model
by: Yang, Shuai, et al.
Published: (2024)
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
by: Pi, Renjie, et al.
Published: (2024)
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
by: Pan, Xichen, et al.
Published: (2023)
MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models
by: Hu, Lulu, et al.
Published: (2026)
Large Vision-Language Models as Emotion Recognizers in Context Awareness
by: Lei, Yuxuan, et al.
Published: (2024)
MMaDA: Multimodal Large Diffusion Language Models
by: Yang, Ling, et al.
Published: (2025)
Personal Visual Context Learning in Large Multimodal Models
by: Xue, Zihui, et al.
Published: (2026)
Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning
by: Chen, Junkai, et al.
Published: (2026)
Making Large Vision Language Models to be Good Few-shot Learners
by: Liu, Fan, et al.
Published: (2024)
3D CoCa: Contrastive Learners are 3D Captioners
by: Huang, Ting, et al.
Published: (2025)
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
by: Tong, Bo, et al.
Published: (2024)
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
by: Cai, Yuxuan, et al.
Published: (2024)
MathScape: Benchmarking Multimodal Large Language Models in Real-World Mathematical Contexts
by: Liang, Hao, et al.
Published: (2024)
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
by: Tian, Ye, et al.
Published: (2025)
GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos
by: Kumar, Deepak, et al.
Published: (2026)
Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models
by: Peng, Xingkai, et al.
Published: (2025)
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
by: Ye, Junyan, et al.
Published: (2024)
Culture-Aware Humorous Captioning: Multimodal Humor Generation across Cultural Contexts
by: Xu, Run, et al.
Published: (2026)
MAD: Makeup All-in-One with Cross-Domain Diffusion Model
by: Ruan, Bo-Kai, et al.
Published: (2025)
Toward Robust Multimodal Learning using Multimodal Foundational Models
by: Zhao, Xianbing, et al.
Published: (2024)
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
by: Wu, Jianzong, et al.
Published: (2024)
Enhanced Motion Forecasting with Plug-and-Play Multimodal Large Language Models
by: Luo, Katie, et al.
Published: (2025)
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models
by: Luo, Fuwen, et al.
Published: (2024)
Enhancing Radiographic Disease Detection with MetaCheX, a Context-Aware Multimodal Model
by: He, Nathan, et al.
Published: (2025)
Region-Level Context-Aware Multimodal Understanding
by: Wei, Hongliang, et al.
Published: (2025)