| Main Authors: | Huang, Yiran, Thede, Lukas, Mancini, Massimiliano, Xu, Wenjia, Akata, Zeynep |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.20749 |
Similar Items
Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency
by: Huang, Yiran, et al.
Published: (2026)
Dissecting Multimodal In-Context Learning: Modality Asymmetries and Circuit Dynamics in modern Transformers
by: Huang, Yiran, et al.
Published: (2026)
Vision-by-Language for Training-Free Compositional Image Retrieval
by: Karthik, Shyamgopal, et al.
Published: (2023)
WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing in LLMs
by: Thede, Lukas, et al.
Published: (2025)
Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models
by: Thede, Lukas, et al.
Published: (2024)
ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections
by: Bini, Massimo, et al.
Published: (2024)
Beyond the final layer: Attentive multilayer fusion for vision transformers
by: Ciernik, Laure, et al.
Published: (2026)
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
by: Girrbach, Leander, et al.
Published: (2024)
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
by: Roth, Karsten, et al.
Published: (2023)
A Systematic Study of In-the-Wild Model Merging for Large Language Models
by: Hitit, Oğuz Kağan, et al.
Published: (2025)
Context-Aware Multimodal Pretraining
by: Roth, Karsten, et al.
Published: (2024)
DeLoRA: Decoupling Angles and Strength in Low-rank Adaptation
by: Bini, Massimo, et al.
Published: (2025)
How to Merge Your Multimodal Models Over Time?
by: Dziadzio, Sebastian, et al.
Published: (2024)
Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
by: Girrbach, Leander, et al.
Published: (2025)
To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models
by: Lin, Junyan, et al.
Published: (2024)
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
by: Kim, Sanghwan, et al.
Published: (2024)
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
by: Wen, Zichen, et al.
Published: (2025)
Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models
by: Zhong, Weihong, et al.
Published: (2024)
A Practitioner's Guide to Continual Multimodal Pretraining
by: Roth, Karsten, et al.
Published: (2024)
From Drop-off to Recovery: A Mechanistic Analysis of Segmentation in MLLMs
by: Wu, Boyong, et al.
Published: (2026)
MemLoRA: Distilling Expert Adapters for On-Device Memory Systems
by: Bini, Massimo, et al.
Published: (2025)
Sparse Autoencoders are Topic Models
by: Girrbach, Leander, et al.
Published: (2025)
A Large Scale Analysis of Gender Biases in Text-to-Image Generative Models
by: Girrbach, Leander, et al.
Published: (2025)
Investigating Persuasion Techniques in Arabic: An Empirical Study Leveraging Large Language Models
by: Alzahrani, Abdurahmman, et al.
Published: (2024)
SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning
by: Xu, Mengya, et al.
Published: (2025)
APM: Evaluating Style Personalization in LLMs with Arbitrary Preference Mappings
by: Spohn, Philipp, et al.
Published: (2026)
SemioLLM: Evaluating Large Language Models for Diagnostic Reasoning from Unstructured Clinical Narratives in Epilepsy
by: Dani, Meghal, et al.
Published: (2024)
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
by: Ye, Weihao, et al.
Published: (2024)
ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models
by: De Min, Thomas, et al.
Published: (2026)
Ovis: Structural Embedding Alignment for Multimodal Large Language Model
by: Lu, Shiyin, et al.
Published: (2024)
HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models
by: Guo, Yansong, et al.
Published: (2026)
A Survey on Evaluation of Multimodal Large Language Models
by: Huang, Jiaxing, et al.
Published: (2024)
Vision-centric Token Compression in Large Language Model
by: Xing, Ling, et al.
Published: (2025)
Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
by: Gigant, Théo, et al.
Published: (2025)
LLAVADI: What Matters For Multimodal Large Language Models Distillation
by: Xu, Shilin, et al.
Published: (2024)
Model Composition for Multimodal Large Language Models
by: Chen, Chi, et al.
Published: (2024)
Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
by: Wu, Qiong, et al.
Published: (2024)
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
by: Pan, Xichen, et al.
Published: (2023)
Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval
by: Liu, Zhuchenyang, et al.
Published: (2026)
Vision Token Reduction via Attention-Driven Self-Compression for Efficient Multimodal Large Language Models
by: Deniz, Omer Faruk, et al.
Published: (2026)