Library Catalog

Bibliographic Details
Main Authors:	Huang, Yiran; Thede, Lukas; Mancini, Massimiliano; Xu, Wenjia; Akata, Zeynep
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.20749

Similar Items

Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency
by: Huang, Yiran, et al.
Published: (2026)

Dissecting Multimodal In-Context Learning: Modality Asymmetries and Circuit Dynamics in modern Transformers
by: Huang, Yiran, et al.
Published: (2026)

Vision-by-Language for Training-Free Compositional Image Retrieval
by: Karthik, Shyamgopal, et al.
Published: (2023)

WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing in LLMs
by: Thede, Lukas, et al.
Published: (2025)

Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models
by: Thede, Lukas, et al.
Published: (2024)

ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections
by: Bini, Massimo, et al.
Published: (2024)

Beyond the final layer: Attentive multilayer fusion for vision transformers
by: Ciernik, Laure, et al.
Published: (2026)

Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
by: Girrbach, Leander, et al.
Published: (2024)

Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
by: Roth, Karsten, et al.
Published: (2023)

A Systematic Study of In-the-Wild Model Merging for Large Language Models
by: Hitit, Oğuz Kağan, et al.
Published: (2025)

Context-Aware Multimodal Pretraining
by: Roth, Karsten, et al.
Published: (2024)

DeLoRA: Decoupling Angles and Strength in Low-rank Adaptation
by: Bini, Massimo, et al.
Published: (2025)

How to Merge Your Multimodal Models Over Time?
by: Dziadzio, Sebastian, et al.
Published: (2024)

Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
by: Girrbach, Leander, et al.
Published: (2025)

To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models
by: Lin, Junyan, et al.
Published: (2024)

COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
by: Kim, Sanghwan, et al.
Published: (2024)

Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
by: Wen, Zichen, et al.
Published: (2025)

Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models
by: Zhong, Weihong, et al.
Published: (2024)

A Practitioner's Guide to Continual Multimodal Pretraining
by: Roth, Karsten, et al.
Published: (2024)

From Drop-off to Recovery: A Mechanistic Analysis of Segmentation in MLLMs
by: Wu, Boyong, et al.
Published: (2026)

MemLoRA: Distilling Expert Adapters for On-Device Memory Systems
by: Bini, Massimo, et al.
Published: (2025)

Sparse Autoencoders are Topic Models
by: Girrbach, Leander, et al.
Published: (2025)

A Large Scale Analysis of Gender Biases in Text-to-Image Generative Models
by: Girrbach, Leander, et al.
Published: (2025)

Investigating Persuasion Techniques in Arabic: An Empirical Study Leveraging Large Language Models
by: Alzahrani, Abdurahmman, et al.
Published: (2024)

SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning
by: Xu, Mengya, et al.
Published: (2025)

APM: Evaluating Style Personalization in LLMs with Arbitrary Preference Mappings
by: Spohn, Philipp, et al.
Published: (2026)

SemioLLM: Evaluating Large Language Models for Diagnostic Reasoning from Unstructured Clinical Narratives in Epilepsy
by: Dani, Meghal, et al.
Published: (2024)

Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
by: Ye, Weihao, et al.
Published: (2024)

ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models
by: De Min, Thomas, et al.
Published: (2026)

Ovis: Structural Embedding Alignment for Multimodal Large Language Model
by: Lu, Shiyin, et al.
Published: (2024)

HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models
by: Guo, Yansong, et al.
Published: (2026)

A Survey on Evaluation of Multimodal Large Language Models
by: Huang, Jiaxing, et al.
Published: (2024)

Vision-centric Token Compression in Large Language Model
by: Xing, Ling, et al.
Published: (2025)

Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
by: Gigant, Théo, et al.
Published: (2025)

LLAVADI: What Matters For Multimodal Large Language Models Distillation
by: Xu, Shilin, et al.
Published: (2024)

Model Composition for Multimodal Large Language Models
by: Chen, Chi, et al.
Published: (2024)

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
by: Wu, Qiong, et al.
Published: (2024)

Kosmos-G: Generating Images in Context with Multimodal Large Language Models
by: Pan, Xichen, et al.
Published: (2023)

Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval
by: Liu, Zhuchenyang, et al.
Published: (2026)

Vision Token Reduction via Attention-Driven Self-Compression for Efficient Multimodal Large Language Models
by: Deniz, Omer Faruk, et al.
Published: (2026)