:: Library Catalog

Bibliographic Details
Main Authors:	Egg, Alex; Goyanes, Martin Iglesias; Kingma, Friso; Mora, Andreu; von Werra, Leandro; Wolf, Thomas
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.23719

Similar Items

How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data
by: Niklaus, Joel, et al.
Published: (2026)

EM Distillation for One-step Diffusion Models
by: Xie, Sirui, et al.
Published: (2024)

Flowing with Confidence
by: de Kruiff, Friso, et al.
Published: (2026)

Acting for the Right Reasons: Creating Reason-Sensitive Artificial Moral Agents
by: Baum, Kevin, et al.
Published: (2024)

DeMo: Decoupled Momentum Optimization
by: Peng, Bowen, et al.
Published: (2024)

Pullback Flow Matching on Data Manifolds
by: de Kruiff, Friso, et al.
Published: (2024)

Multi-Label Transfer Learning in Non-Stationary Data Streams
by: Du, Honghui, et al.
Published: (2025)

SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems
by: Shindo, Hikaru, et al.
Published: (2026)

Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning
by: Beutel, Alex, et al.
Published: (2024)

FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks
by: Thomas, Nishal, et al.
Published: (2026)

MALT: Improving Reasoning with Multi-Agent LLM Training
by: Motwani, Sumeet Ramesh, et al.
Published: (2024)

The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
by: Zhang, Ruichen, et al.
Published: (2025)

Can LLMs Reason Structurally? Benchmarking via the Lens of Data Structures
by: He, Yu, et al.
Published: (2025)

Federation over Text: Insight Sharing for Multi-Agent Reasoning
by: Yao, Dixi, et al.
Published: (2026)

CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
by: Wei, Anjiang, et al.
Published: (2025)

FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data
by: Wang, Wenhao, et al.
Published: (2025)

Online Learning for Recommendations at Grubhub
by: Egg, Alex
Published: (2021)

Off-policy Evaluation for Payments at Adyen
by: Egg, Alex
Published: (2025)

BenchAgents: Multi-Agent Systems for Structured Benchmark Creation
by: Butt, Natasha, et al.
Published: (2024)

A Discordance-Aware Multimodal Framework with Multi-Agent Clinical Reasoning
by: Ahadian, Pegah, et al.
Published: (2026)

TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models
by: Yu, Fangxu, et al.
Published: (2026)

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
by: Gu, Alex, et al.
Published: (2024)

Towards More Accurate US Presidential Election via Multi-step Reasoning with Large Language Models
by: Yu, Chenxiao, et al.
Published: (2024)

Prompting Policies for Multi-step Reasoning and Tool-Use in Black-box LLMs with Iterative Distillation of Experience
by: Sayana, Krishna, et al.
Published: (2026)

DataSciBench: An LLM Agent Benchmark for Data Science
by: Zhang, Dan, et al.
Published: (2025)

Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent
by: Yang, Tong, et al.
Published: (2025)

Mol-Debate: Multi-Agent Debate Improves Structural Reasoning in Molecular Design
by: Zhang, Wengyu, et al.
Published: (2026)

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
by: Huang, Jianhao, et al.
Published: (2025)

Lookbehind-SAM: k steps back, 1 step forward
by: Mordido, Gonçalo, et al.
Published: (2023)

VAR-MATH: Probing True Mathematical Reasoning in LLMS via Symbolic Multi-Instance Benchmarks
by: Yao, Jian, et al.
Published: (2025)

BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
by: Bettini, Matteo, et al.
Published: (2023)

ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure
by: Fujisawa, Ippei, et al.
Published: (2024)

Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework
by: Sidorenko, Andrey, et al.
Published: (2025)

Monopoly Deal: A Benchmark Environment for Bounded One-Sided Response Games
by: Wolf, Will
Published: (2025)

Reinforce LLM Reasoning through Multi-Agent Reflection
by: Yuan, Yurun, et al.
Published: (2025)

MARLINE: Multi-Source Mapping Transfer Learning for Non-Stationary Environments
by: Du, Honghui, et al.
Published: (2025)

Adam-mini: Use Fewer Learning Rates To Gain More
by: Zhang, Yushun, et al.
Published: (2024)

ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
by: Qiao, Ziqing, et al.
Published: (2025)

Transferring Causal Effects using Proxies
by: Iglesias-Alonso, Manuel, et al.
Published: (2025)

PuzzleJAX: A Benchmark for Reasoning and Learning
by: Earle, Sam, et al.
Published: (2025)