Library Catalog

Bibliographic Details
Main Authors:	Engländer, Leon; Althammer, Sophia; Üstün, Ahmet; Gallé, Matthias; Sherborne, Tom
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2604.17609

Similar Items

If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs
by: Khalifa, Muhammad, et al.
Published: (2024)

TRAM: Bridging Trust Regions and Sharpness Aware Minimization
by: Sherborne, Tom, et al.
Published: (2023)

How Does Quantization Affect Multilingual LLMs?
by: Marchisio, Kelly, et al.
Published: (2024)

RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
by: Dang, John, et al.
Published: (2024)

Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
by: Na, Clara, et al.
Published: (2024)

Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers
by: D'souza, Daniel, et al.
Published: (2025)

Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
by: Gritsch, Nikolas, et al.
Published: (2024)

AgentBench: Evaluating LLMs as Agents
by: Liu, Xiao, et al.
Published: (2023)

MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions
by: Köksal, Abdullatif, et al.
Published: (2024)

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
by: Ahmadian, Arash, et al.
Published: (2024)

Curiosity-Driven LLM-as-a-judge for Personalized Creative Judgment
by: Kumar, Vanya Bannihatti, et al.
Published: (2025)

Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation
by: Dasgupta, Sayantan, et al.
Published: (2026)

Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
by: Vassoyan, Jean, et al.
Published: (2025)

WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing in LLMs
by: Thede, Lukas, et al.
Published: (2025)

Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
by: Zhou, Huichi, et al.
Published: (2025)

Curiosity-driven Red-teaming for Large Language Models
by: Hong, Zhang-Wei, et al.
Published: (2024)

PersonaGym: Evaluating Persona Agents and LLMs
by: Samuel, Vinay, et al.
Published: (2024)

Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions
by: Pezeshkpour, Pouya, et al.
Published: (2024)

Concept Bottleneck Large Language Models
by: Sun, Chung-En, et al.
Published: (2024)

Large Language Models Lack Temporal Awareness of Medical Knowledge
by: Guan, Zihan, et al.
Published: (2026)

Exploring LLM-based Agents for Root Cause Analysis
by: Roy, Devjeet, et al.
Published: (2024)

Memp: Exploring Agent Procedural Memory
by: Fang, Runnan, et al.
Published: (2025)

Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis
by: Chiang, Jeffrey Yang Fan, et al.
Published: (2025)

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
by: Dai, Runpeng, et al.
Published: (2025)

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents
by: Turk, Matt
Published: (2026)

LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
by: Wang, Yiming, et al.
Published: (2025)

ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges
by: Qian, Cheng, et al.
Published: (2025)

Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs
by: Dai, Hankun, et al.
Published: (2025)

Aya Vision: Advancing the Frontier of Multilingual Multimodality
by: Dash, Saurabh, et al.
Published: (2025)

Quriosity: Analyzing Human Questioning Behavior and Causal Inquiry through Curiosity-Driven Queries
by: Ceraolo, Roberto, et al.
Published: (2024)

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
by: Li, Manling, et al.
Published: (2024)

MemAdapter: Fast Alignment across Agent Memory Paradigms via Generative Subgraph Retrieval
by: Zhang, Xin, et al.
Published: (2026)

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
by: Zala, Abhay, et al.
Published: (2024)

The Illusion of Certainty: Uncertainty Quantification for LLMs Fails under Ambiguity
by: Tomov, Tim, et al.
Published: (2025)

GPT-4o Lacks Core Features of Theory of Mind
by: Muchovej, John, et al.
Published: (2026)

EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
by: Liao, Zeyi, et al.
Published: (2024)

Exploring Precision and Recall to assess the quality and diversity of LLMs
by: Bronnec, Florian Le, et al.
Published: (2024)

ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning
by: Yue, Ling, et al.
Published: (2024)

On Leakage of Code Generation Evaluation Datasets
by: Matton, Alexandre, et al.
Published: (2024)

Memento-Skills: Let Agents Design Agents
by: Zhou, Huichi, et al.
Published: (2026)