:: Library Catalog

Bibliographic Details
Main Authors:	Guo, Tianyu; Zhu, Hanlin; Zhang, Ruiqi; Jiao, Jiantao; Mei, Song; Jordan, Michael I.; Russell, Stuart
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.13913

Similar Items

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
by: Huang, Yixiao, et al.
Published: (2025)

GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments
by: Zhu, Hanlin, et al.
Published: (2025)

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
by: Zhu, Banghua, et al.
Published: (2024)

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
by: Su, DiJia, et al.
Published: (2025)

IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis
by: Li, Hanyu, et al.
Published: (2025)

Lessons from Studying Two-Hop Latent Reasoning
by: Balesni, Mikita, et al.
Published: (2024)

GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
by: Zhou, Yang, et al.
Published: (2025)

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
by: Zhu, Hanlin, et al.
Published: (2024)

From Query to Logic: Ontology-Driven Multi-Hop Reasoning in LLMs
by: Bian, Haonan, et al.
Published: (2025)

Thinking LLMs: General Instruction Following with Thought Generation
by: Wu, Tianhao, et al.
Published: (2024)

How Likely Do LLMs with CoT Mimic Human Reasoning?
by: Bao, Guangsheng, et al.
Published: (2024)

Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
by: Zhu, Banghua, et al.
Published: (2023)

How Do LLMs Use Their Depth?
by: Gupta, Akshat, et al.
Published: (2025)

How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning
by: Chen, Haoyang, et al.
Published: (2026)

Is Depth All You Need? An Exploration of Iterative Reasoning in LLMs
by: Wu, Zongqian, et al.
Published: (2025)

Reasoning Court: Combining Reasoning, Action, and Judgment for Multi-Hop Reasoning
by: Wu, Jingtian, et al.
Published: (2025)

Under the Shadow of Babel: How Language Shapes Reasoning in LLMs
by: Wang, Chenxi, et al.
Published: (2025)

What External Knowledge is Preferred by LLMs? Characterizing and Exploring Chain of Evidence in Imperfect Context for Multi-Hop QA
by: Chang, Zhiyuan, et al.
Published: (2024)

Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis
by: Yadav, Anushka, et al.
Published: (2025)

Avoiding Catastrophe in Online Learning by Asking for Help
by: Plaut, Benjamin, et al.
Published: (2024)

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
by: Wang, Shenzhi, et al.
Published: (2026)

How Well Do LLMs Understand Tunisian Arabic?
by: Mahdi, Mohamed
Published: (2025)

How Do Latent Reasoning Methods Perform Under Weak and Strong Supervision?
by: Cui, Yingqian, et al.
Published: (2026)

Multi-Hop Reasoning for Question Answering with Hyperbolic Representations
by: Welz, Simon, et al.
Published: (2025)

Contextual Drag: How Errors in the Context Affect LLM Reasoning
by: Cheng, Yun, et al.
Published: (2026)

Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?
by: Mei, Zhiting, et al.
Published: (2025)

How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
by: Zhang, Ruiqi, et al.
Published: (2024)

LLMs for Relational Reasoning: How Far are We?
by: Li, Zhiming, et al.
Published: (2024)

Do LLMs Really Think Step-by-step In Implicit Reasoning?
by: Yu, Yijiong
Published: (2024)

ToxiLab: How Well Do Open-Source LLMs Generate Synthetic Toxicity Data?
by: Hui, Zheng, et al.
Published: (2024)

CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
by: Cui, Hao, et al.
Published: (2025)

Generative AI Security: Challenges and Countermeasures
by: Zhu, Banghua, et al.
Published: (2024)

How Reliable are LLMs for Reasoning on the Re-ranking task?
by: Islam, Nafis Tanveer, et al.
Published: (2025)

Efficient Prompt Caching via Embedding Similarity
by: Zhu, Hanlin, et al.
Published: (2024)

CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs
by: Wang, Kangsheng, et al.
Published: (2024)

What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
by: Chen, Zhi, et al.
Published: (2024)

RELOOP: Recursive Retrieval with Multi-Hop Reasoner and Planners for Heterogeneous QA
by: Yang, Ruiyi, et al.
Published: (2025)

KG-Reasoner: A Reinforced Model for End-to-End Multi-Hop Knowledge Graph Reasoning
by: Wang, Shuai, et al.
Published: (2026)

HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
by: Yen, Howard, et al.
Published: (2024)