| Main Authors: | Rmus, Milena, Hardy, Mathew D., Griffiths, Thomas L., Agrawal, Mayank |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.06524 |
Similar Items
When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1
by: McCoy, R. Thomas, et al.
Published: (2024)
Can we automatize scientific discovery in the cognitive sciences?
by: Jagadish, Akshay K., et al.
Published: (2026)
How Good Are LLMs at Processing Tool Outputs?
by: Kate, Kiran, et al.
Published: (2025)
Learning Human-like Representations to Enable Learning Human Values
by: Wynn, Andrea, et al.
Published: (2023)
On Benchmarking Human-Like Intelligence in Machines
by: Ying, Lance, et al.
Published: (2025)
Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks
by: Chandra, Abhranil, et al.
Published: (2025)
Why Human Guidance Matters in Collaborative Vibe Coding
by: Hu, Haoyu, et al.
Published: (2026)
Parallelograms Strike Back: LLMs Generate Better Analogies than People
by: Liu, Qiawen Ella, et al.
Published: (2026)
"All that Glitters": Approaches to Evaluations with Unreliable Model and Human Annotations
by: Hardy, Michael
Published: (2024)
Human-Like Geometric Abstraction in Large Pre-trained Neural Networks
by: Campbell, Declan, et al.
Published: (2024)
Large Language Models Assume People are More Rational than We Really are
by: Liu, Ryan, et al.
Published: (2024)
Generating Novelty in Open-World Multi-Agent Strategic Board Games
by: Kejriwal, Mayank, et al.
Published: (2025)
Task--Specificity Score: Measuring How Much Instructions Really Matter for Supervision
by: Kadasi, Pritam, et al.
Published: (2026)
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
by: Zhu, Jian-Qiao, et al.
Published: (2024)
Toward Efficient Exploration by Large Language Model Agents
by: Arumugam, Dilip, et al.
Published: (2025)
Mixer is more than just a model
by: Ji, Qingfeng, et al.
Published: (2024)
AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games
by: Ying, Lance, et al.
Published: (2026)
Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity
by: Liu, Qiawen Ella, et al.
Published: (2026)
Reversing the Paradigm: Building AI-First Systems with Human Guidance
by: Spera, Cosimo, et al.
Published: (2025)
Conformal Prediction as Bayesian Quadrature
by: Snell, Jake C., et al.
Published: (2025)
Incoherent Probability Judgments in Large Language Models
by: Zhu, Jian-Qiao, et al.
Published: (2024)
More than Marketing? On the Information Value of AI Benchmarks for Practitioners
by: Hardy, Amelia, et al.
Published: (2024)
Using Reinforcement Learning to Train Large Language Models to Explain Human Decisions
by: Zhu, Jian-Qiao, et al.
Published: (2025)
Fine-Tuning and Serving Gemma 4 31B on Google Cloud TPU: A Technical Comparison with GPU Baselines
by: Kishnani, Jatin, et al.
Published: (2026)
Investigating Concept Alignment Using Implausible Category Members
by: Rane, Sunayana, et al.
Published: (2026)
Program-Based Strategy Induction for Reinforcement Learning
by: Correa, Carlos G., et al.
Published: (2024)
Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process
by: Watanabe, Shuhei
Published: (2025)
Recovering Event Probabilities from Large Language Model Embeddings via Axiomatic Constraints
by: Zhu, Jian-Qiao, et al.
Published: (2025)
Recovering Mental Representations from Large Language Models with Markov Chain Monte Carlo
by: Zhu, Jian-Qiao, et al.
Published: (2024)
Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments
by: Strugatski, Alona, et al.
Published: (2024)
Partner Modelling Emerges in Recurrent Agents (But Only When It Matters)
by: Mon-Williams, Ruaridh, et al.
Published: (2025)
Instruction Fine-Tuning: Does Prompt Loss Matter?
by: Huerta-Enochian, Mathew, et al.
Published: (2024)
Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss
by: Xu, Jing, et al.
Published: (2023)
Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning
by: Prabhakar, Akshara, et al.
Published: (2024)
Distilling Symbolic Priors for Concept Learning into Neural Networks
by: Marinescu, Ioana, et al.
Published: (2024)
Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
by: Liang, Kaiqu, et al.
Published: (2025)
Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations
by: Jha, Mayank
Published: (2026)
High Volatility and Action Bias Distinguish LLMs from Humans in Group Coordination
by: Maini, Sahaj Singh, et al.
Published: (2026)
CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis
by: Ravishankara, Mayank
Published: (2026)
PlotChain: Deterministic Checkpointed Evaluation of Multimodal LLMs on Engineering Plot Reading
by: Ravishankara, Mayank
Published: (2026)