| Main Authors: | Zhang, Terry Jingchen, Dev, Gopal, Wang, Ning, Obreiter, Max, Pandey, Punya Syon, Samway, Keenan, Jiang, Wenyuan, Huang, Yinya, Schölkopf, Bernhard, Sachan, Mrinmaya, Jin, Zhijing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.00072 |
Similar Items
Preserving Historical Truth: Detecting Historical Revisionism in Large Language Models
by: Ortu, Francesco, et al.
Published: (2026)
Quriosity: Analyzing Human Questioning Behavior and Causal Inquiry through Curiosity-Driven Queries
by: Ceraolo, Roberto, et al.
Published: (2024)
BinaryPPO: Efficient Policy Optimization for Binary Classification
by: Pandey, Punya Syon, et al.
Published: (2026)
Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders
by: Harrasse, Abir, et al.
Published: (2025)
Improving Large Language Model Safety with Contrastive Representation Learning
by: Simko, Samuel, et al.
Published: (2025)
Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification
by: He, Paul, et al.
Published: (2026)
CLT-Forge: A Scalable Library for Cross-Layer Transcoders and Attribution Graphs
by: Draye, Florent, et al.
Published: (2026)
Are Language Models Consequentialist or Deontological Moral Reasoners?
by: Samway, Keenan, et al.
Published: (2025)
Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards
by: Pandey, Punya Syon, et al.
Published: (2025)
CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures
by: Pandey, Punya Syon, et al.
Published: (2025)
Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs
by: Zhang, Terry Jingchen, et al.
Published: (2025)
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
by: Piatti, Giorgio, et al.
Published: (2024)
Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis
by: Jenny, David F., et al.
Published: (2023)
Stargazer: A Scalable Model-Fitting Benchmark Environment for AI Agents under Astrophysical Constraints
by: Liu, Xinge, et al.
Published: (2026)
Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals
by: Ortu, Francesco, et al.
Published: (2024)
Do LLMs Think Fast and Slow? A Causal Study on Sentiment Analysis
by: Lyu, Zhiheng, et al.
Published: (2024)
When Do Language Models Endorse Limitations on Human Rights Principles?
by: Samway, Keenan, et al.
Published: (2026)
Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games
by: Piedrahita, David Guzman, et al.
Published: (2025)
SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests
by: Pandey, Punya Syon, et al.
Published: (2025)
CausalCite: A Causal Formulation of Paper Citations
by: Kumar, Ishan, et al.
Published: (2023)
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
by: Opedal, Andreas, et al.
Published: (2024)
Can Large Language Models Infer Causation from Correlation?
by: Jin, Zhijing, et al.
Published: (2023)
Implicit Personalization in Language Models: A Systematic Study
by: Jin, Zhijing, et al.
Published: (2024)
Can Theoretical Physics Research Benefit from Language Agents?
by: Lu, Sirui, et al.
Published: (2025)
Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting
by: Chowdhury, Sankalan Pal, et al.
Published: (2025)
Learning to Reason Efficiently with A* Post-Training
by: Opedal, Andreas, et al.
Published: (2026)
TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering
by: Hossain, Saad, et al.
Published: (2026)
Objective Matters: Fine-Tuning Objectives Shape Safety, Robustness, and Persona Drift
by: Vennemeyer, Daniel, et al.
Published: (2026)
Fluid Representations in Reasoning Models
by: Kharlapenko, Dmitrii, et al.
Published: (2026)
The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning
by: Cui, Shaobo, et al.
Published: (2024)
SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning
by: Xiang, Kun, et al.
Published: (2025)
CLadder: Assessing Causal Reasoning in Language Models
by: Jin, Zhijing, et al.
Published: (2023)
Language Model Alignment in Multilingual Trolley Problems
by: Jin, Zhijing, et al.
Published: (2024)
Causality can systematically address the monsters under the bench(marks)
by: Leeb, Felix, et al.
Published: (2025)
Causal Responsibility Attribution for Human-AI Collaboration
by: Qi, Yahang, et al.
Published: (2024)
Investigating the Zone of Proximal Development of Language Models for In-Context Learning
by: Cui, Peng, et al.
Published: (2025)
Autoformalizing Natural Language to First-Order Logic: A Case Study in Logical Fallacy Detection
by: Lalwani, Abhinav, et al.
Published: (2024)
Are Language Models Efficient Reasoners? A Perspective from Logic Programming
by: Opedal, Andreas, et al.
Published: (2025)
Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
by: Opedal, Andreas, et al.
Published: (2024)
How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities
by: Kassem, Aly M., et al.
Published: (2025)