| Main Authors: | Guo, Tianyu, Zhu, Hanlin, Zhang, Ruiqi, Jiao, Jiantao, Mei, Song, Jordan, Michael I., Russell, Stuart |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.13913 |
Similar Items
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
by: Huang, Yixiao, et al.
Published: (2025)
GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments
by: Zhu, Hanlin, et al.
Published: (2025)
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
by: Zhu, Banghua, et al.
Published: (2024)
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
by: Su, DiJia, et al.
Published: (2025)
IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis
by: Li, Hanyu, et al.
Published: (2025)
Lessons from Studying Two-Hop Latent Reasoning
by: Balesni, Mikita, et al.
Published: (2024)
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
by: Zhou, Yang, et al.
Published: (2025)
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
by: Zhu, Hanlin, et al.
Published: (2024)
From Query to Logic: Ontology-Driven Multi-Hop Reasoning in LLMs
by: Bian, Haonan, et al.
Published: (2025)
Thinking LLMs: General Instruction Following with Thought Generation
by: Wu, Tianhao, et al.
Published: (2024)
How Likely Do LLMs with CoT Mimic Human Reasoning?
by: Bao, Guangsheng, et al.
Published: (2024)
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
by: Zhu, Banghua, et al.
Published: (2023)
How Do LLMs Use Their Depth?
by: Gupta, Akshat, et al.
Published: (2025)
How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning
by: Chen, Haoyang, et al.
Published: (2026)
Is Depth All You Need? An Exploration of Iterative Reasoning in LLMs
by: Wu, Zongqian, et al.
Published: (2025)
Reasoning Court: Combining Reasoning, Action, and Judgment for Multi-Hop Reasoning
by: Wu, Jingtian, et al.
Published: (2025)
Under the Shadow of Babel: How Language Shapes Reasoning in LLMs
by: Wang, Chenxi, et al.
Published: (2025)
What External Knowledge is Preferred by LLMs? Characterizing and Exploring Chain of Evidence in Imperfect Context for Multi-Hop QA
by: Chang, Zhiyuan, et al.
Published: (2024)
Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis
by: Yadav, Anushka, et al.
Published: (2025)
Avoiding Catastrophe in Online Learning by Asking for Help
by: Plaut, Benjamin, et al.
Published: (2024)
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
by: Wang, Shenzhi, et al.
Published: (2026)
How Well Do LLMs Understand Tunisian Arabic?
by: Mahdi, Mohamed
Published: (2025)
How Do Latent Reasoning Methods Perform Under Weak and Strong Supervision?
by: Cui, Yingqian, et al.
Published: (2026)
Multi-Hop Reasoning for Question Answering with Hyperbolic Representations
by: Welz, Simon, et al.
Published: (2025)
Contextual Drag: How Errors in the Context Affect LLM Reasoning
by: Cheng, Yun, et al.
Published: (2026)
Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?
by: Mei, Zhiting, et al.
Published: (2025)
How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
by: Zhang, Ruiqi, et al.
Published: (2024)
LLMs for Relational Reasoning: How Far are We?
by: Li, Zhiming, et al.
Published: (2024)
Do LLMs Really Think Step-by-step In Implicit Reasoning?
by: Yu, Yijiong
Published: (2024)
ToxiLab: How Well Do Open-Source LLMs Generate Synthetic Toxicity Data?
by: Hui, Zheng, et al.
Published: (2024)
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
by: Cui, Hao, et al.
Published: (2025)
Generative AI Security: Challenges and Countermeasures
by: Zhu, Banghua, et al.
Published: (2024)
How Reliable are LLMs for Reasoning on the Re-ranking task?
by: Islam, Nafis Tanveer, et al.
Published: (2025)
Efficient Prompt Caching via Embedding Similarity
by: Zhu, Hanlin, et al.
Published: (2024)
CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs
by: Wang, Kangsheng, et al.
Published: (2024)
What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
by: Chen, Zhi, et al.
Published: (2024)
RELOOP: Recursive Retrieval with Multi-Hop Reasoner and Planners for Heterogeneous QA
by: Yang, Ruiyi, et al.
Published: (2025)
KG-Reasoner: A Reinforced Model for End-to-End Multi-Hop Knowledge Graph Reasoning
by: Wang, Shuai, et al.
Published: (2026)
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
by: Yen, Howard, et al.
Published: (2024)