| Main Authors: | Kim, Been; Hewitt, John; Nanda, Neel; Fiedel, Noah; Tafjord, Oyvind |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.12152 |
Similar Items
Neologism Learning for Controllability and Self-Verbalization
by: Hewitt, John, et al.
Published: (2025)
We Can't Understand AI Using our Existing Vocabulary
by: Hewitt, John, et al.
Published: (2025)
Digital Socrates: Evaluating LLMs through Explanation Critiques
by: Gu, Yuling, et al.
Published: (2023)
BaRDa: A Belief and Reasoning Dataset that Separates Factual Accuracy and Reasoning Ability
by: Clark, Peter, et al.
Published: (2023)
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs
by: Smit, Andries, et al.
Published: (2023)
SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
by: Gu, Yuling, et al.
Published: (2024)
Can we forget how we learned? Doxastic redundancy in iterated belief revision
by: Liberatore, Paolo
Published: (2024)
Features have life history. And we should care
by: Stecher, Philipp, et al.
Published: (2026)
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
by: Wiegreffe, Sarah, et al.
Published: (2024)
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
by: Li, Belinda Z., et al.
Published: (2025)
A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?
by: Liu, Xinyu, et al.
Published: (2024)
Can we Evaluate RAGs with Synthetic Data?
by: van Elburg, Jonas, et al.
Published: (2025)
OLMES: A Standard for Language Model Evaluations
by: Gu, Yuling, et al.
Published: (2024)
Thought Branches: Interpreting LLM Reasoning Requires Resampling
by: Macar, Uzay, et al.
Published: (2025)
Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit
by: Jiang, Nick, et al.
Published: (2025)
When AI reviews science: Can we trust the referee?
by: Wang, Jialiang, et al.
Published: (2026)
Can we trust the evaluation on ChatGPT?
by: Aiyappa, Rachith, et al.
Published: (2023)
Explorations of Self-Repair in Language Models
by: Rushing, Cody, et al.
Published: (2024)
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
by: Zhang, Fred, et al.
Published: (2023)
Can we automatize scientific discovery in the cognitive sciences?
by: Jagadish, Akshay K., et al.
Published: (2026)
Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
by: Minder, Julian, et al.
Published: (2025)
How Well Do Models Follow Their Constitutions?
by: Jakkli, Arya, et al.
Published: (2026)
LitLLMs, LLMs for Literature Review: Are we there yet?
by: Agarwal, Shubham, et al.
Published: (2024)
Can we only use guideline instead of shot in prompt?
by: Chen, Jiaxiang, et al.
Published: (2024)
Because we're here
by: Susskind, Leonard
Published: (2005)
BatchTopK Sparse Autoencoders
by: Bussmann, Bart, et al.
Published: (2024)
What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering
by: Maar, Jim, et al.
Published: (2026)
Guided Persona-based AI Surveys: Can we replicate personal mobility preferences at scale using LLMs?
by: Tzachristas, Ioannis, et al.
Published: (2025)
Can we ease the Injectivity Bottleneck on Lorentzian Manifolds for Graph Neural Networks?
by: Srinivasan, Srinitish, et al.
Published: (2025)
Good things come in small packages: Should we build AI clusters with Lite-GPUs?
by: Canakci, Burcu, et al.
Published: (2025)
Can we use LLMs to bootstrap reinforcement learning? -- A case study in digital health behavior change
by: Albers, Nele, et al.
Published: (2025)
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
by: Casademunt, Helena, et al.
Published: (2026)
Learning Multi-Level Features with Matryoshka Sparse Autoencoders
by: Bussmann, Bart, et al.
Published: (2025)
Convergent Linear Representations of Emergent Misalignment
by: Soligo, Anna, et al.
Published: (2025)
Emergent Misalignment is Easy, Narrow Misalignment is Hard
by: Soligo, Anna, et al.
Published: (2026)
Ethical Reasoning and Moral Value Alignment of LLMs Depend on the Language we Prompt them in
by: Agarwal, Utkarsh, et al.
Published: (2024)
Subliminal Learning Is Steering Vector Distillation
by: Blank, Camila, et al.
Published: (2026)
Steering Evaluation-Aware Language Models to Act Like They Are Deployed
by: Hua, Tim Tian, et al.
Published: (2025)
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents
by: Jansen, Peter, et al.
Published: (2024)
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
by: Ferrando, Javier, et al.
Published: (2024)