Library Catalog

Bibliographic Details
Main Authors:	Su, Chris; Li, Harrison; Marques, Matheus; Flint, George; Zhu, Kevin; Dev, Sunishchal
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.15974

Similar Items

Emergent Persuasion: Will LLMs Persuade Without Being Prompted?
by: Chang, Vincent, et al.
Published: (2025)

Broken Chains: The Cost of Incomplete Reasoning in LLMs
by: Su, Ian, et al.
Published: (2026)

ProMoral-Bench: Evaluating Prompting Strategies for Moral Reasoning and Safety in LLMs
by: Thomas, Rohan Subramanian, et al.
Published: (2026)

Peek-a-Boo Reasoning: Contrastive Region Masking in MLLMs
by: Chaturvedi, Isha, et al.
Published: (2025)

PALADIN: Self-Correcting Language Model Agents to Cure Tool-Failure Cases
by: Vuddanti, Sri Vatsa, et al.
Published: (2025)

Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering
by: Mao, Nathan, et al.
Published: (2026)

Sumudu Neural Operator for ODEs and PDEs
by: Zelenskiy, Ben, et al.
Published: (2025)

DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code
by: Agrawal, Shriyansh, et al.
Published: (2025)

Judge Reliability Harness: Stress Testing the Reliability of LLM Judges
by: Dev, Sunishchal, et al.
Published: (2026)

Agentic Reasoning for Large Language Models
by: Wei, Tianxin, et al.
Published: (2026)

Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization
by: Egbuna, Nathan, et al.
Published: (2025)

CA-BED: Conversation-Aware Bayesian Experimental Design
by: Arnould, Daniel, et al.
Published: (2026)

AgentChangeBench: A Multi-Dimensional Evaluation Framework for Goal-Shift Robustness in Conversational AI
by: Rana, Manik, et al.
Published: (2025)

Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
by: Liao, Yi, et al.
Published: (2025)

Enhance Reasoning for Large Language Models in the Game Werewolf
by: Wu, Shuang, et al.
Published: (2024)

Distributed Interpretability and Control for Large Language Models
by: Desai, Dev Arpan, et al.
Published: (2026)

FRIT: Using Causal Importance to Improve Chain-of-Thought Faithfulness
by: Swaroop, Anand, et al.
Published: (2025)

Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools
by: Wu, Junde, et al.
Published: (2025)

Rethinking Agentic Reinforcement Learning In Large Language Models
by: Cui, Fangming, et al.
Published: (2026)

Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models
by: Yang, Yukang, et al.
Published: (2025)

SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought
by: Batra, Shourya, et al.
Published: (2025)

Game Reasoning Arena: A Framework and Benchmark for Assessing Reasoning Capabilities of Large Language Models via Game Play
by: Cipolina-Kun, Lucia, et al.
Published: (2025)

Reasoning Relay: Evaluating Stability and Interchangeability of Large Language Models in Mathematical Reasoning
by: Lu, Leo, et al.
Published: (2025)

TableReasoner: Advancing Table Reasoning Framework with Large Language Models
by: Xiong, Sishi, et al.
Published: (2025)

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
by: Paglieri, Davide, et al.
Published: (2024)

A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models
by: Chang, Ching, et al.
Published: (2025)

M2A: Synergizing Mathematical and Agentic Reasoning in Large Language Models
by: Wang, Junjian, et al.
Published: (2026)

Disentangling Reasoning and Knowledge in Medical Large Language Models
by: Thapa, Rahul, et al.
Published: (2025)

Recommendations and Reporting Checklist for Rigorous & Transparent Human Baselines in Model Evaluations
by: Wei, Kevin L., et al.
Published: (2025)

On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks
by: Stechly, Kaya, et al.
Published: (2024)

FinLLMs: A Framework for Financial Reasoning Dataset Generation with Large Language Models
by: Yuan, Ziqiang, et al.
Published: (2024)

On the Limits of Layer Pruning for Generative Reasoning in Large Language Models
by: Shrestha, Safal, et al.
Published: (2026)

Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
by: Su, Hung-Ting, et al.
Published: (2024)

Search-o1: Agentic Search-Enhanced Large Reasoning Models
by: Li, Xiaoxi, et al.
Published: (2025)

GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning
by: Ying, Yuchen, et al.
Published: (2026)

RAVEN: An Agentic Framework for Multimodal Entity Discovery from Large-Scale Video Collections
by: Rosa, Kevin Dela
Published: (2025)

AgenticEval: Toward Agentic and Self-Evolving Safety Evaluation of Large Language Models
by: Wang, Yixu, et al.
Published: (2025)

Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative Engine Optimization
by: Zhao, XinYu, et al.
Published: (2026)

Emergent social conventions and collective bias in LLM populations
by: Ashery, Ariel Flint, et al.
Published: (2024)

Emergent Introspective Awareness in Large Language Models
by: Lindsey, Jack
Published: (2026)