Library Catalog

Bibliographic Details
Main Authors:	Matinez, Yago Romano; Roberts, Jesse
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2509.09867

Similar Items

Multiplayer Nash Preference Optimization
by: Wu, Fang, et al.
Published: (2025)

Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information
by: Yim, Yauwai, et al.
Published: (2024)

The Non-Determinism of Small LLMs: Evidence of Low Answer Consistency in Repetition Trials of Standard Multiple-Choice Benchmarks
by: Pinhanez, Claudio, et al.
Published: (2025)

Human-Alignment and Calibration of Inference-Time Uncertainty in Large Language Models
by: Moore, Kyle, et al.
Published: (2025)

Chain of Thought Still Thinks Fast: APriCoT Helps with Thinking Slow
by: Moore, Kyle, et al.
Published: (2024)

Are LLMs complicated ethical dilemma analyzers?
by: Jiashen, et al.
Published: (2025)

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
by: Zhang, Guibin, et al.
Published: (2025)

Computer Environments Elicit General Agentic Intelligence in LLMs
by: Cheng, Daixuan, et al.
Published: (2026)

The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance
by: Moore, Kyle, et al.
Published: (2024)

Large Language Model Recall Uncertainty is Modulated by the Fan Effect
by: Roberts, Jesse, et al.
Published: (2024)

GEM: A Gym for Agentic LLMs
by: Liu, Zichen, et al.
Published: (2025)

Improving Score Reliability of Multiple Choice Benchmarks with Consistency Evaluation and Altered Answer Choices
by: Cavalin, Paulo, et al.
Published: (2025)

Player-Driven Emergence in LLM-Driven Game Narrative
by: Peng, Xiangyu, et al.
Published: (2024)

TxGemma: Efficient and Agentic LLMs for Therapeutics
by: Wang, Eric, et al.
Published: (2025)

Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
by: Li, Yangning, et al.
Published: (2025)

Agentic Adversarial QA for Improving Domain-Specific LLMs
by: Grari, Vincent, et al.
Published: (2026)

Large Language Models Are Bad Dice Players: LLMs Struggle to Generate Random Numbers from Statistical Distributions
by: Zhao, Minda, et al.
Published: (2026)

Can "AI" Be a Doctor? A Study of Empathy, Readability, and Alignment in Clinical LLMs
by: Barone, Mariano, et al.
Published: (2026)

Collaborative Quest Completion with LLM-driven Non-Player Characters in Minecraft
by: Rao, Sudha, et al.
Published: (2024)

PRISM: Agentic Retrieval with LLMs for Multi-Hop Question Answering
by: Nahid, Md Mahadi Hasan, et al.
Published: (2025)

Can LLMs Grade Short-Answer Reading Comprehension Questions: An Empirical Study with a Novel Dataset
by: Henkel, Owen, et al.
Published: (2023)

Tool Preferences in Agentic LLMs are Unreliable
by: Faghih, Kazem, et al.
Published: (2025)

Mitigating Hallucination in Large Language Models (LLMs): An Application-Oriented Survey on RAG, Reasoning, and Agentic Systems
by: Li, Yihan, et al.
Published: (2025)

Can LLMs Time Travel? Enhancing Temporal Consistency in Legal Agentic Search through Reinforcement Learning
by: Fan, Wei, et al.
Published: (2026)

Targeted Visualization of the Backbone of Encoder LLMs
by: Roberts, Isaac, et al.
Published: (2024)

Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education
by: Henkel, Owen, et al.
Published: (2024)

Toward Optimal LLM Alignments Using Two-Player Games
by: Zheng, Rui, et al.
Published: (2024)

SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs
by: Wang, Sijia, et al.
Published: (2026)

Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs
by: Dai, Hankun, et al.
Published: (2025)

UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making
by: Duan, Jinhao, et al.
Published: (2025)

Are LLMs Ready for Neural-integrated Mechanistic Modeling? A Benchmark and Agentic Framework
by: Guan, Zihan, et al.
Published: (2026)

Agentic Confidence Calibration
by: Zhang, Jiaxin, et al.
Published: (2026)

Agentic Uncertainty Quantification
by: Zhang, Jiaxin, et al.
Published: (2026)

UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models
by: Chen, Chen, et al.
Published: (2025)

Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools
by: Wu, Junde, et al.
Published: (2025)

AgenticSum: An Agentic Inference-Time Framework for Faithful Clinical Text Summarization
by: Piya, Fahmida Liza, et al.
Published: (2026)

More Capable, Less Cooperative? When LLMs Fail At Zero-Cost Collaboration
by: Yadav, Advait, et al.
Published: (2026)

Are Large Vision Language Models Good Game Players?
by: Wang, Xinyu, et al.
Published: (2025)

Robust Checkpoint Selection for Multimodal LLMs via Agentic Evaluation and Stability-Aware Ranking
by: Xu, Qinwu, et al.
Published: (2026)

AgenticMath: Enhancing LLM Reasoning via Agentic-based Math Data Generation
by: Liu, Xianyang, et al.
Published: (2025)