| Main Authors: | Su, Chris; Li, Harrison; Marques, Matheus; Flint, George; Zhu, Kevin; Dev, Sunishchal |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.15974 |
Similar Items
Emergent Persuasion: Will LLMs Persuade Without Being Prompted?
by: Chang, Vincent, et al.
Published: (2025)
Broken Chains: The Cost of Incomplete Reasoning in LLMs
by: Su, Ian, et al.
Published: (2026)
ProMoral-Bench: Evaluating Prompting Strategies for Moral Reasoning and Safety in LLMs
by: Thomas, Rohan Subramanian, et al.
Published: (2026)
Peek-a-Boo Reasoning: Contrastive Region Masking in MLLMs
by: Chaturvedi, Isha, et al.
Published: (2025)
PALADIN: Self-Correcting Language Model Agents to Cure Tool-Failure Cases
by: Vuddanti, Sri Vatsa, et al.
Published: (2025)
Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering
by: Mao, Nathan, et al.
Published: (2026)
Sumudu Neural Operator for ODEs and PDEs
by: Zelenskiy, Ben, et al.
Published: (2025)
DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code
by: Agrawal, Shriyansh, et al.
Published: (2025)
Judge Reliability Harness: Stress Testing the Reliability of LLM Judges
by: Dev, Sunishchal, et al.
Published: (2026)
Agentic Reasoning for Large Language Models
by: Wei, Tianxin, et al.
Published: (2026)
Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization
by: Egbuna, Nathan, et al.
Published: (2025)
CA-BED: Conversation-Aware Bayesian Experimental Design
by: Arnould, Daniel, et al.
Published: (2026)
AgentChangeBench: A Multi-Dimensional Evaluation Framework for Goal-Shift Robustness in Conversational AI
by: Rana, Manik, et al.
Published: (2025)
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
by: Liao, Yi, et al.
Published: (2025)
Enhance Reasoning for Large Language Models in the Game Werewolf
by: Wu, Shuang, et al.
Published: (2024)
Distributed Interpretability and Control for Large Language Models
by: Desai, Dev Arpan, et al.
Published: (2026)
FRIT: Using Causal Importance to Improve Chain-of-Thought Faithfulness
by: Swaroop, Anand, et al.
Published: (2025)
Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools
by: Wu, Junde, et al.
Published: (2025)
Rethinking Agentic Reinforcement Learning In Large Language Models
by: Cui, Fangming, et al.
Published: (2026)
Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models
by: Yang, Yukang, et al.
Published: (2025)
SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought
by: Batra, Shourya, et al.
Published: (2025)
Game Reasoning Arena: A Framework and Benchmark for Assessing Reasoning Capabilities of Large Language Models via Game Play
by: Cipolina-Kun, Lucia, et al.
Published: (2025)
Reasoning Relay: Evaluating Stability and Interchangeability of Large Language Models in Mathematical Reasoning
by: Lu, Leo, et al.
Published: (2025)
TableReasoner: Advancing Table Reasoning Framework with Large Language Models
by: Xiong, Sishi, et al.
Published: (2025)
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
by: Paglieri, Davide, et al.
Published: (2024)
A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models
by: Chang, Ching, et al.
Published: (2025)
M2A: Synergizing Mathematical and Agentic Reasoning in Large Language Models
by: Wang, Junjian, et al.
Published: (2026)
Disentangling Reasoning and Knowledge in Medical Large Language Models
by: Thapa, Rahul, et al.
Published: (2025)
Recommendations and Reporting Checklist for Rigorous & Transparent Human Baselines in Model Evaluations
by: Wei, Kevin L., et al.
Published: (2025)
On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks
by: Stechly, Kaya, et al.
Published: (2024)
FinLLMs: A Framework for Financial Reasoning Dataset Generation with Large Language Models
by: Yuan, Ziqiang, et al.
Published: (2024)
On the Limits of Layer Pruning for Generative Reasoning in Large Language Models
by: Shrestha, Safal, et al.
Published: (2026)
Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
by: Su, Hung-Ting, et al.
Published: (2024)
Search-o1: Agentic Search-Enhanced Large Reasoning Models
by: Li, Xiaoxi, et al.
Published: (2025)
GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning
by: Ying, Yuchen, et al.
Published: (2026)
RAVEN: An Agentic Framework for Multimodal Entity Discovery from Large-Scale Video Collections
by: Rosa, Kevin Dela
Published: (2025)
AgenticEval: Toward Agentic and Self-Evolving Safety Evaluation of Large Language Models
by: Wang, Yixu, et al.
Published: (2025)
Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative Engine Optimization
by: Zhao, XinYu, et al.
Published: (2026)
Emergent social conventions and collective bias in LLM populations
by: Ashery, Ariel Flint, et al.
Published: (2024)
Emergent Introspective Awareness in Large Language Models
by: Lindsey, Jack
Published: (2026)