| Main Authors: | Engländer, Leon, Althammer, Sophia, Üstün, Ahmet, Gallé, Matthias, Sherborne, Tom |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.17609 |
Similar Items
If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs
by: Khalifa, Muhammad, et al.
Published: (2024)
TRAM: Bridging Trust Regions and Sharpness Aware Minimization
by: Sherborne, Tom, et al.
Published: (2023)
How Does Quantization Affect Multilingual LLMs?
by: Marchisio, Kelly, et al.
Published: (2024)
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
by: Dang, John, et al.
Published: (2024)
Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
by: Na, Clara, et al.
Published: (2024)
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers
by: D'souza, Daniel, et al.
Published: (2025)
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
by: Gritsch, Nikolas, et al.
Published: (2024)
AgentBench: Evaluating LLMs as Agents
by: Liu, Xiao, et al.
Published: (2023)
MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions
by: Köksal, Abdullatif, et al.
Published: (2024)
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
by: Ahmadian, Arash, et al.
Published: (2024)
Curiosity-Driven LLM-as-a-judge for Personalized Creative Judgment
by: Kumar, Vanya Bannihatti, et al.
Published: (2025)
Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation
by: Dasgupta, Sayantan, et al.
Published: (2026)
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
by: Vassoyan, Jean, et al.
Published: (2025)
WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing in LLMs
by: Thede, Lukas, et al.
Published: (2025)
Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
by: Zhou, Huichi, et al.
Published: (2025)
Curiosity-driven Red-teaming for Large Language Models
by: Hong, Zhang-Wei, et al.
Published: (2024)
PersonaGym: Evaluating Persona Agents and LLMs
by: Samuel, Vinay, et al.
Published: (2024)
Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions
by: Pezeshkpour, Pouya, et al.
Published: (2024)
Concept Bottleneck Large Language Models
by: Sun, Chung-En, et al.
Published: (2024)
Large Language Models Lack Temporal Awareness of Medical Knowledge
by: Guan, Zihan, et al.
Published: (2026)
Exploring LLM-based Agents for Root Cause Analysis
by: Roy, Devjeet, et al.
Published: (2024)
Memp: Exploring Agent Procedural Memory
by: Fang, Runnan, et al.
Published: (2025)
Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis
by: Chiang, Jeffrey Yang Fan, et al.
Published: (2025)
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
by: Dai, Runpeng, et al.
Published: (2025)
Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents
by: Turk, Matt
Published: (2026)
LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
by: Wang, Yiming, et al.
Published: (2025)
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges
by: Qian, Cheng, et al.
Published: (2025)
Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs
by: Dai, Hankun, et al.
Published: (2025)
Aya Vision: Advancing the Frontier of Multilingual Multimodality
by: Dash, Saurabh, et al.
Published: (2025)
Quriosity: Analyzing Human Questioning Behavior and Causal Inquiry through Curiosity-Driven Queries
by: Ceraolo, Roberto, et al.
Published: (2024)
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
by: Li, Manling, et al.
Published: (2024)
MemAdapter: Fast Alignment across Agent Memory Paradigms via Generative Subgraph Retrieval
by: Zhang, Xin, et al.
Published: (2026)
EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
by: Zala, Abhay, et al.
Published: (2024)
The Illusion of Certainty: Uncertainty Quantification for LLMs Fails under Ambiguity
by: Tomov, Tim, et al.
Published: (2025)
GPT-4o Lacks Core Features of Theory of Mind
by: Muchovej, John, et al.
Published: (2026)
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
by: Liao, Zeyi, et al.
Published: (2024)
Exploring Precision and Recall to assess the quality and diversity of LLMs
by: Bronnec, Florian Le, et al.
Published: (2024)
ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning
by: Yue, Ling, et al.
Published: (2024)
On Leakage of Code Generation Evaluation Datasets
by: Matton, Alexandre, et al.
Published: (2024)
Memento-Skills: Let Agents Design Agents
by: Zhou, Huichi, et al.
Published: (2026)