Library Catalog

Bibliographic Details
Main Authors:	Zhang, Terry Jingchen; Dev, Gopal; Wang, Ning; Obreiter, Max; Pandey, Punya Syon; Samway, Keenan; Jiang, Wenyuan; Huang, Yinya; Schölkopf, Bernhard; Sachan, Mrinmaya; Jin, Zhijing
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.00072

Similar Items

Preserving Historical Truth: Detecting Historical Revisionism in Large Language Models
by: Ortu, Francesco, et al.
Published: (2026)

Quriosity: Analyzing Human Questioning Behavior and Causal Inquiry through Curiosity-Driven Queries
by: Ceraolo, Roberto, et al.
Published: (2024)

BinaryPPO: Efficient Policy Optimization for Binary Classification
by: Pandey, Punya Syon, et al.
Published: (2026)

Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders
by: Harrasse, Abir, et al.
Published: (2025)

Improving Large Language Model Safety with Contrastive Representation Learning
by: Simko, Samuel, et al.
Published: (2025)

Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification
by: He, Paul, et al.
Published: (2026)

CLT-Forge: A Scalable Library for Cross-Layer Transcoders and Attribution Graphs
by: Draye, Florent, et al.
Published: (2026)

Are Language Models Consequentialist or Deontological Moral Reasoners?
by: Samway, Keenan, et al.
Published: (2025)

Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards
by: Pandey, Punya Syon, et al.
Published: (2025)

CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures
by: Pandey, Punya Syon, et al.
Published: (2025)

Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs
by: Zhang, Terry Jingchen, et al.
Published: (2025)

Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
by: Piatti, Giorgio, et al.
Published: (2024)

Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis
by: Jenny, David F., et al.
Published: (2023)

Stargazer: A Scalable Model-Fitting Benchmark Environment for AI Agents under Astrophysical Constraints
by: Liu, Xinge, et al.
Published: (2026)

Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals
by: Ortu, Francesco, et al.
Published: (2024)

Do LLMs Think Fast and Slow? A Causal Study on Sentiment Analysis
by: Lyu, Zhiheng, et al.
Published: (2024)

When Do Language Models Endorse Limitations on Human Rights Principles?
by: Samway, Keenan, et al.
Published: (2026)

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games
by: Piedrahita, David Guzman, et al.
Published: (2025)

SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests
by: Pandey, Punya Syon, et al.
Published: (2025)

CausalCite: A Causal Formulation of Paper Citations
by: Kumar, Ishan, et al.
Published: (2023)

MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
by: Opedal, Andreas, et al.
Published: (2024)

Can Large Language Models Infer Causation from Correlation?
by: Jin, Zhijing, et al.
Published: (2023)

Implicit Personalization in Language Models: A Systematic Study
by: Jin, Zhijing, et al.
Published: (2024)

Can Theoretical Physics Research Benefit from Language Agents?
by: Lu, Sirui, et al.
Published: (2025)

Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting
by: Chowdhury, Sankalan Pal, et al.
Published: (2025)

Learning to Reason Efficiently with A* Post-Training
by: Opedal, Andreas, et al.
Published: (2026)

TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering
by: Hossain, Saad, et al.
Published: (2026)

Objective Matters: Fine-Tuning Objectives Shape Safety, Robustness, and Persona Drift
by: Vennemeyer, Daniel, et al.
Published: (2026)

Fluid Representations in Reasoning Models
by: Kharlapenko, Dmitrii, et al.
Published: (2026)

The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning
by: Cui, Shaobo, et al.
Published: (2024)

SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning
by: Xiang, Kun, et al.
Published: (2025)

CLadder: Assessing Causal Reasoning in Language Models
by: Jin, Zhijing, et al.
Published: (2023)

Language Model Alignment in Multilingual Trolley Problems
by: Jin, Zhijing, et al.
Published: (2024)

Causality can systematically address the monsters under the bench(marks)
by: Leeb, Felix, et al.
Published: (2025)

Causal Responsibility Attribution for Human-AI Collaboration
by: Qi, Yahang, et al.
Published: (2024)

Investigating the Zone of Proximal Development of Language Models for In-Context Learning
by: Cui, Peng, et al.
Published: (2025)

Autoformalizing Natural Language to First-Order Logic: A Case Study in Logical Fallacy Detection
by: Lalwani, Abhinav, et al.
Published: (2024)

Are Language Models Efficient Reasoners? A Perspective from Logic Programming
by: Opedal, Andreas, et al.
Published: (2025)

Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
by: Opedal, Andreas, et al.
Published: (2024)

How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities
by: Kassem, Aly M., et al.
Published: (2025)