:: Library Catalog

Bibliographic Details
Main Authors:	Ku, Alexander Y., Griffiths, Thomas L., Chan, Stephanie C. Y.
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2505.09855

Similar Items

On the generalization of language models from in-context learning and finetuning: a controlled study
by: Lampinen, Andrew K., et al.
Published: (2025)

Uncovering Competency Gaps in Large Language Models and Their Benchmarks
by: Bohacek, Maty, et al.
Published: (2025)

Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems
by: Geng, Jiayi, et al.
Published: (2025)

Language models show human-like content effects on reasoning tasks
by: Dasgupta, Ishita, et al.
Published: (2022)

How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?
by: Liu, Ryan, et al.
Published: (2024)

Large Language Models Assume People are More Rational than We Really are
by: Liu, Ryan, et al.
Published: (2024)

Rational Metareasoning for Large Language Models
by: De Sabbata, C. Nicolò, et al.
Published: (2024)

RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
by: Liang, Kaiqu, et al.
Published: (2025)

Emergent Semantic Role Understanding in Language Models
by: Griffiths, Carla, et al.
Published: (2026)

Cognitive Architectures for Language Agents
by: Sumers, Theodore R., et al.
Published: (2023)

Are Large Language Models Sensitive to the Motives Behind Communication?
by: Wu, Addison J., et al.
Published: (2025)

A density estimation perspective on learning from pairwise human preferences
by: Dumoulin, Vincent, et al.
Published: (2023)

Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
by: Liang, Kaiqu, et al.
Published: (2025)

Diagnosing Transformers: Illuminating Feature Spaces for Clinical Decision-Making
by: Hsu, Aliyah R., et al.
Published: (2023)

Hallucination Detection in LLMs: Fast and Memory-Efficient Fine-Tuned Models
by: Arteaga, Gabriel Y., et al.
Published: (2024)

What Should Embeddings Embed? Autoregressive Models Represent Latent Generating Distributions
by: Zhang, Liyi, et al.
Published: (2024)

Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
by: Hsu, Aliyah R., et al.
Published: (2024)

Analyzing the Roles of Language and Vision in Learning from Limited Data
by: Chen, Allison, et al.
Published: (2024)

Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
by: Liu, Ryan, et al.
Published: (2024)

Bringing Up a Bilingual BabyLM: Investigating Multilingual Language Acquisition Using Small-Scale Models
by: Zeng, Linda, et al.
Published: (2026)

Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models
by: Zhang, Liyi, et al.
Published: (2025)

Transforming Agency. On the mode of existence of Large Language Models
by: Barandiaran, Xabier E., et al.
Published: (2024)

Is Child-Directed Speech Effective Training Data for Language Models?
by: Feng, Steven Y., et al.
Published: (2024)

Baby Scale: Investigating Models Trained on Individual Children's Language Input
by: Feng, Steven Y., et al.
Published: (2026)

On the Ability of Transformers to Verify Plans
by: Sarrof, Yash, et al.
Published: (2026)

Does Transformer Interpretability Transfer to RNNs?
by: Paulo, Gonçalo, et al.
Published: (2024)

STAT: Shrinking Transformers After Training
by: Flynn, Megan, et al.
Published: (2024)

Intelligent Learning Rate Distribution to reduce Catastrophic Forgetting in Transformers
by: Kenneweg, Philip, et al.
Published: (2024)

The Point of View of a Sentiment: Towards Clinician Bias Detection in Psychiatric Notes
by: Valentine, Alissa A., et al.
Published: (2024)

The Condensate Theorem: Transformers are O(n), Not $O(n^2)$
by: Williams, Jorge L. Ruiz
Published: (2026)

Low-rank finetuning for LLMs: A fairness perspective
by: Das, Saswat, et al.
Published: (2024)

Learning is Forgetting: LLM Training As Lossy Compression
by: Conklin, Henry C., et al.
Published: (2026)

Disentangled Safety Adapters Enable Efficient Guardrails and Flexible Inference-Time Alignment
by: Krishna, Kundan, et al.
Published: (2025)

Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts
by: Lee, Sang-Woo, et al.
Published: (2025)

Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting
by: Hu, Michael Y., et al.
Published: (2025)

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
by: Shao, Zhihong, et al.
Published: (2024)

Faithfulness Evaluation for Decoder-only LLM Attributions with Controlled Retained Information
by: Huang, Xin, et al.
Published: (2026)

TPTT: Transforming Pretrained Transformers into Titans
by: Furfaro, Fabien
Published: (2025)

The broader spectrum of in-context learning
by: Lampinen, Andrew Kyle, et al.
Published: (2024)

Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
by: Hu, Michael Y., et al.
Published: (2025)