| Main Authors: | Kharlapenko, Dmitrii, Stolfo, Alessandro, Conmy, Arthur, Sachan, Mrinmaya, Jin, Zhijing |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.04843 |
Similar Items
Probing for Arithmetic Errors in Language Models
by: Sun, Yucheng, et al.
Published: (2025)
Scaling sparse feature circuit finding for in-context learning
by: Kharlapenko, Dmitrii, et al.
Published: (2025)
Improving Large Language Model Safety with Contrastive Representation Learning
by: Simko, Samuel, et al.
Published: (2025)
Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification
by: He, Paul, et al.
Published: (2026)
Quriosity: Analyzing Human Questioning Behavior and Causal Inquiry through Curiosity-Driven Queries
by: Ceraolo, Roberto, et al.
Published: (2024)
On the Emergence of Induction Heads for In-Context Learning
by: Musat, Tiberiu, et al.
Published: (2025)
Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games
by: Piedrahita, David Guzman, et al.
Published: (2025)
Confidence Regulation Neurons in Language Models
by: Stolfo, Alessandro, et al.
Published: (2024)
Dense SAE Latents Are Features, Not Bugs
by: Sun, Xiaoqing, et al.
Published: (2025)
Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
by: Opedal, Andreas, et al.
Published: (2024)
Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis
by: Jenny, David F., et al.
Published: (2023)
Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?
by: Tarasov, Denis, et al.
Published: (2024)
Autoformalizing Natural Language to First-Order Logic: A Case Study in Logical Fallacy Detection
by: Lalwani, Abhinav, et al.
Published: (2024)
Can Large Language Models Infer Causation from Correlation?
by: Jin, Zhijing, et al.
Published: (2023)
CLadder: Assessing Causal Reasoning in Language Models
by: Jin, Zhijing, et al.
Published: (2023)
CausalCite: A Causal Formulation of Paper Citations
by: Kumar, Ishan, et al.
Published: (2023)
Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors
by: Daheim, Nico, et al.
Published: (2024)
Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
by: Wang, Yucheng, et al.
Published: (2025)
SMART: Self-learning Meta-strategy Agent for Reasoning Tasks
by: Liu, Rongxing, et al.
Published: (2024)
Implicit Personalization in Language Models: A Systematic Study
by: Jin, Zhijing, et al.
Published: (2024)
Base Models Know How to Reason, Thinking Models Learn When
by: Venhoff, Constantin, et al.
Published: (2025)
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing
by: Ozyurt, Yilmazcan, et al.
Published: (2025)
Simulating Students or Sycophantic Problem Solving? On Misconception Faithfulness of LLM Simulators
by: Do, Heejin, et al.
Published: (2026)
Variational Classification
by: Dhuliawala, Shehzaad, et al.
Published: (2023)
Automatically Finding Reward Model Biases
by: Wang, Atticus, et al.
Published: (2026)
Understanding Reasoning in Thinking Language Models via Steering Vectors
by: Venhoff, Constantin, et al.
Published: (2025)
Line of Sight: On Linear Representations in VLLMs
by: Rajaram, Achyuta, et al.
Published: (2025)
How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading
by: Cui, Peng, et al.
Published: (2024)
Towards Aligning Language Models with Textual Feedback
by: Lloret, Saüc Abadal, et al.
Published: (2024)
Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning
by: Shabalin, Stepan, et al.
Published: (2025)
Can LLMs Model Incorrect Student Reasoning? A Case Study on Distractor Generation
by: Zengaffinen, Yanick, et al.
Published: (2026)
Benchmarking and Enhancing Text-to-Image Models for Generating Visual Representations in Early Arithmetic Education
by: Wang, Junling, et al.
Published: (2026)
Learning to Reason Efficiently with A* Post-Training
by: Opedal, Andreas, et al.
Published: (2026)
Thought Anchors: Which LLM Reasoning Steps Matter?
by: Bogdan, Paul C., et al.
Published: (2025)
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models
by: Wang, Junling, et al.
Published: (2025)
Can Vision-Language Models Solve Visual Math Equations?
by: Choudhury, Monjoy Narayan, et al.
Published: (2025)
SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning
by: Adarsh, Shivam, et al.
Published: (2024)
Multilingual Performance Biases of Large Language Models in Education
by: Gupta, Vansh, et al.
Published: (2025)
Test of Time: Rethinking Temporal Signal of Benchmark Contamination
by: Zhang, Terry Jingchen, et al.
Published: (2025)
Post-Training Language Models for Crosslingual Consistency
by: Liu, Tianyu, et al.
Published: (2026)