:: Library Catalog

Bibliographic Details
Main Authors:	Rmus, Milena; Hardy, Mathew D.; Griffiths, Thomas L.; Agrawal, Mayank
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.06524

Similar Items

When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1
by: McCoy, R. Thomas, et al.
Published: (2024)

Can we automatize scientific discovery in the cognitive sciences?
by: Jagadish, Akshay K., et al.
Published: (2026)

How Good Are LLMs at Processing Tool Outputs?
by: Kate, Kiran, et al.
Published: (2025)

Learning Human-like Representations to Enable Learning Human Values
by: Wynn, Andrea, et al.
Published: (2023)

On Benchmarking Human-Like Intelligence in Machines
by: Ying, Lance, et al.
Published: (2025)

Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks
by: Chandra, Abhranil, et al.
Published: (2025)

Why Human Guidance Matters in Collaborative Vibe Coding
by: Hu, Haoyu, et al.
Published: (2026)

Parallelograms Strike Back: LLMs Generate Better Analogies than People
by: Liu, Qiawen Ella, et al.
Published: (2026)

"All that Glitters": Approaches to Evaluations with Unreliable Model and Human Annotations
by: Hardy, Michael
Published: (2024)

Human-Like Geometric Abstraction in Large Pre-trained Neural Networks
by: Campbell, Declan, et al.
Published: (2024)

Large Language Models Assume People are More Rational than We Really are
by: Liu, Ryan, et al.
Published: (2024)

Generating Novelty in Open-World Multi-Agent Strategic Board Games
by: Kejriwal, Mayank, et al.
Published: (2025)

Task-Specificity Score: Measuring How Much Instructions Really Matter for Supervision
by: Kadasi, Pritam, et al.
Published: (2026)

Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
by: Zhu, Jian-Qiao, et al.
Published: (2024)

Toward Efficient Exploration by Large Language Model Agents
by: Arumugam, Dilip, et al.
Published: (2025)

Mixer is more than just a model
by: Ji, Qingfeng, et al.
Published: (2024)

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games
by: Ying, Lance, et al.
Published: (2026)

Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity
by: Liu, Qiawen Ella, et al.
Published: (2026)

Reversing the Paradigm: Building AI-First Systems with Human Guidance
by: Spera, Cosimo, et al.
Published: (2025)

Conformal Prediction as Bayesian Quadrature
by: Snell, Jake C., et al.
Published: (2025)

Incoherent Probability Judgments in Large Language Models
by: Zhu, Jian-Qiao, et al.
Published: (2024)

More than Marketing? On the Information Value of AI Benchmarks for Practitioners
by: Hardy, Amelia, et al.
Published: (2024)

Using Reinforcement Learning to Train Large Language Models to Explain Human Decisions
by: Zhu, Jian-Qiao, et al.
Published: (2025)

Fine-Tuning and Serving Gemma 4 31B on Google Cloud TPU: A Technical Comparison with GPU Baselines
by: Kishnani, Jatin, et al.
Published: (2026)

Investigating Concept Alignment Using Implausible Category Members
by: Rane, Sunayana, et al.
Published: (2026)

Program-Based Strategy Induction for Reinforcement Learning
by: Correa, Carlos G., et al.
Published: (2024)

Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process
by: Watanabe, Shuhei
Published: (2025)

Recovering Event Probabilities from Large Language Model Embeddings via Axiomatic Constraints
by: Zhu, Jian-Qiao, et al.
Published: (2025)

Recovering Mental Representations from Large Language Models with Markov Chain Monte Carlo
by: Zhu, Jian-Qiao, et al.
Published: (2024)

Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments
by: Strugatski, Alona, et al.
Published: (2024)

Partner Modelling Emerges in Recurrent Agents (But Only When It Matters)
by: Mon-Williams, Ruaridh, et al.
Published: (2025)

Instruction Fine-Tuning: Does Prompt Loss Matter?
by: Huerta-Enochian, Mathew, et al.
Published: (2024)

Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss
by: Xu, Jing, et al.
Published: (2023)

Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning
by: Prabhakar, Akshara, et al.
Published: (2024)

Distilling Symbolic Priors for Concept Learning into Neural Networks
by: Marinescu, Ioana, et al.
Published: (2024)

Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
by: Liang, Kaiqu, et al.
Published: (2025)

Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations
by: Jha, Mayank
Published: (2026)

High Volatility and Action Bias Distinguish LLMs from Humans in Group Coordination
by: Maini, Sahaj Singh, et al.
Published: (2026)

CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis
by: Ravishankara, Mayank
Published: (2026)

PlotChain: Deterministic Checkpointed Evaluation of Multimodal LLMs on Engineering Plot Reading
by: Ravishankara, Mayank
Published: (2026)