:: Library Catalog

Bibliographic Details
Main Authors:	Nathani, Deepak; Madaan, Lovish; Roberts, Nicholas; Bashlykov, Nikolay; Menon, Ajay; Moens, Vincent; Budhiraja, Amar; Magka, Despoina; Vorotilov, Vladislav; Chaurasia, Gaurav; Hupkes, Dieuwke; Cabral, Ricardo Silveira; Shavrina, Tatiana; Foerster, Jakob; Bachrach, Yoram; Wang, William Yang; Raileanu, Roberta
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2502.14499

Similar Items

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
by: Maiti, Shalini, et al.
Published: (2025)

Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models
by: Madaan, Lovish, et al.
Published: (2024)

APRES: An Agentic Paper Revision and Evaluation System
by: Zhao, Bingchen, et al.
Published: (2026)

Quantifying Variance in Evaluation Benchmarks
by: Madaan, Lovish, et al.
Published: (2024)

MultiLoKo: a multilingual local knowledge benchmark for LLMs spanning 31 languages
by: Hupkes, Dieuwke, et al.
Published: (2025)

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
by: Pepe, Alberto, et al.
Published: (2026)

From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency
by: Ohmer, Xenia, et al.
Published: (2024)

What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
by: Audran-Reiss, Alexis, et al.
Published: (2025)

The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
by: Zhao, Bingchen, et al.
Published: (2025)

Interpretability of Language Models via Task Spaces
by: Weber, Lucas, et al.
Published: (2024)

AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents
by: Lupidi, Alisia, et al.
Published: (2026)

LPDS: Evaluating LLM Robustness Through Logic-Preserving Difficulty Scaling
by: Mondorf, Philipp, et al.
Published: (2026)

AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
by: Toledo, Edan, et al.
Published: (2025)

Bootstrapping Task Spaces for Self-Improvement
by: Jiang, Minqi, et al.
Published: (2025)

Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
by: Schaeffer, Rylan, et al.
Published: (2025)

Beyond Verifiable Rewards: Scaling Reinforcement Learning for Language Models to Unverifiable Data
by: Tang, Yunhao, et al.
Published: (2025)

Compute Optimal Scaling of Skills: Knowledge vs Reasoning
by: Roberts, Nicholas, et al.
Published: (2025)

Asking the Right Questions: Improving Reasoning with Generated Stepping Stones
by: Hu, Hengyuan, et al.
Published: (2026)

Neural Mean-Field Games: Extending Mean-Field Game Theory with Neural Stochastic Differential Equations
by: Thöni, Anna C. M., et al.
Published: (2025)

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
by: Thakur, Aman Singh, et al.
Published: (2024)

Scaling Small Agents Through Strategy Auctions
by: Alazraki, Lisa, et al.
Published: (2026)

AIRA_2: Overcoming Bottlenecks in AI Research Agents
by: Hambardzumyan, Karen, et al.
Published: (2026)

HARP: A challenging human-annotated math reasoning benchmark
by: Yue, Albert S., et al.
Published: (2024)

Adversarial Training for Process Reward Models
by: Juneja, Gurusha, et al.
Published: (2025)

Crowd IQ -- Aggregating Opinions to Boost Performance
by: Kosinski, Michal, et al.
Published: (2024)

Evaluation data contamination in LLMs: how do we measure it and (when) does it matter?
by: Singh, Aaditya K., et al.
Published: (2024)

A detailed study of the variations found in the chrysalises of Aglais caschmirensis Kollar, 1844 (Lepidoptera: Papilionoidea, Nymphalidae)
by: Lovish Garlani
Published: (2023)

Annotated Checklist of Rhopalocera of Himachal Pradesh, India (Insecta: Lepidoptera)
by: Lovish Garlani
Published: (2024)

First record of Celaenorrhinus ratna daphne Evans, 1949 from Himachal Pradesh and its first photographic record from the Western Himalayas (Lepidoptera: Hesperiidae, Pyrginae)
by: Lovish Garlani
Published: (2022)

Unveiling the Hidden Gem: An Observational Report, Taxonomic Insights and First Photographic Evidence of Pseudochazara baldiva Moore, 1865, from India (Lepidoptera: Nymphalidae)
by: Lovish Garlani
Published: (2024)

Epistemic Dissonance and Modal Boundaries
by: Raileanu, Dragos
Published: (2025)

From MTEB to MTOB: Retrieval-Augmented Classification for Descriptive Grammars
by: Kornilov, Albert, et al.
Published: (2024)

A Comparative Study of Transfer Learning for Emotion Recognition using CNN and Modified VGG16 Models
by: Nathani, Samay
Published: (2024)

Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples
by: Jiralerspong, Marco, et al.
Published: (2023)

Modelling Chemical Reaction Networks using Neural Ordinary Differential Equations
by: Thöni, Anna C. M., et al.
Published: (2025)

Understanding the Effects of Domain Finetuning on LLMs
by: Tanwar, Eshaan, et al.
Published: (2025)

On Some Extensions of the Boué-Dupuis Variational Formula
by: Budhiraja, A.
Published: (2024)

Hyperagents
by: Zhang, Jenny, et al.
Published: (2026)

DUAS FACES DO PODER
by: Peter Bachrach
Published: (2011)

Rethinking Thinking Tokens: LLMs as Improvement Operators
by: Madaan, Lovish, et al.
Published: (2025)