Library Catalog

Bibliographic Details
Main Authors:	Ghosh, Adhiraj; Udandarao, Vishaal; Nguyen, Thao; Farina, Matteo; Cherti, Mehdi; Jitsev, Jenia; Oh, Sewoong; Ricci, Elisa; Schmidt, Ludwig; Bethge, Matthias
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2511.20643

Similar Items

A Good CREPE needs more than just Sugar: Investigating Biases in Compositional Vision-Language Benchmarks
by: Udandarao, Vishaal, et al.
Published: (2025)

ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities
by: Ghosh, Adhiraj, et al.
Published: (2024)

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
by: Udandarao, Vishaal, et al.
Published: (2024)

A Practitioner's Guide to Continual Multimodal Pretraining
by: Roth, Karsten, et al.
Published: (2024)

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
by: Nezhurina, Marianna, et al.
Published: (2024)

Reproducible scaling laws for contrastive language-image learning
by: Cherti, Mehdi, et al.
Published: (2022)

Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets
by: Nezhurina, Marianna, et al.
Published: (2025)

Inverse Deep Learning Ray Tracing for Heliostat Surface Prediction
by: Lewen, Jan, et al.
Published: (2024)

Scalable heliostat surface predictions from focal spots: Sim-to-Real transfer of inverse Deep Learning Raytracing
by: Lewen, Jan, et al.
Published: (2025)

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
by: Hochlehnert, Andreas, et al.
Published: (2025)

Solving Spatial Supersensing Without Spatial Supersensing
by: Udandarao, Vishaal, et al.
Published: (2025)

Efficient Lifelong Model Evaluation in an Era of Rapid Progress
by: Prabhu, Ameya, et al.
Published: (2024)

CiteME: Can Language Models Accurately Cite Scientific Claims?
by: Press, Ori, et al.
Published: (2024)

Resolving Discrepancies in Compute-Optimal Scaling of Language Models
by: Porian, Tomer, et al.
Published: (2024)

How to Merge Your Multimodal Models Over Time?
by: Dziadzio, Sebastian, et al.
Published: (2024)

Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs
by: Schuhmann, Christoph, et al.
Published: (2025)

Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models
by: Nguyen, Thao, et al.
Published: (2025)

Better Alignment with Instruction Back-and-Forth Translation
by: Nguyen, Thao, et al.
Published: (2024)

Game Reasoning Arena: A Framework and Benchmark for Assessing Reasoning Capabilities of Large Language Models via Game Play
by: Cipolina-Kun, Lucia, et al.
Published: (2025)

AudioToolAgent: An Agentic Framework for Audio-Language Models
by: Wijngaard, Gijs, et al.
Published: (2025)

Multilingual Diversity Improves Vision-Language Representations
by: Nguyen, Thao, et al.
Published: (2024)

Improving Performance, Robustness, and Fairness of Radiographic AI Models with Finely-Controllable Synthetic Data
by: Moroianu, Stefania L., et al.
Published: (2025)

LLM generation novelty through the lens of semantic similarity
by: Davydov, Philipp, et al.
Published: (2025)

Data-Centric Lessons To Improve Speech-Language Pretraining
by: Udandarao, Vishaal, et al.
Published: (2025)

Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages
by: Farina, Matteo, et al.
Published: (2025)

Linear Model Merging Unlocks Simple and Scalable Multimodal Data Mixture Optimization
by: Berasi, Davide, et al.
Published: (2026)

Paradoxes of Social Capital
by: Cherti, Myriam
Published: (2010)

Portfolio Optimization Proxies under Label Scarcity and Regime Shifts via Bayesian and Deterministic Students under Semi-Supervised Sandwich Training
by: Chattopadhyay, Adhiraj
Published: (2026)

Equivariance by Contrast: Identifiable Equivariant Embeddings from Unlabeled Finite Group Actions
by: Schmidt, Tobias, et al.
Published: (2025)

Large Multimodal Models as General In-Context Classifiers
by: Garosi, Marco, et al.
Published: (2026)

Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
by: Berasi, Davide, et al.
Published: (2025)

Frustratingly Easy Test-Time Adaptation of Vision-Language Models
by: Farina, Matteo, et al.
Published: (2024)

Learning in Compact Spaces with Approximately Normalized Transformer
by: Franke, Jörg K. H., et al.
Published: (2025)

Sampling from Your Language Model One Byte at a Time
by: Hayase, Jonathan, et al.
Published: (2025)

Pretraining Frequency Predicts Compositional Generalization of CLIP on Real-World Tasks
by: Wiedemer, Thaddäus, et al.
Published: (2025)

Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models
by: Thede, Lukas, et al.
Published: (2024)

Investigating Continual Pretraining in Large Language Models: Insights and Implications
by: Yıldız, Çağatay, et al.
Published: (2024)

PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization
by: Banerjee, Adhiraj, et al.
Published: (2026)

Dissipative relativistic fluid flow: A simple Lorentz invariant causal model capturing entropy shocks in its zero viscosity limit
by: Reintjes, Moritz, et al.
Published: (2024)

CodecSep: Prompt-Driven Universal Sound Separation on Neural Audio Codec Latents
by: Banerjee, Adhiraj, et al.
Published: (2025)