| Main Authors: | Bai, Xiaoyan, Pres, Itamar, Deng, Yuntian, Tan, Chenhao, Shieber, Stuart, Viégas, Fernanda, Wattenberg, Martin, Lee, Andrew |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.00184 |
Similar Items
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
by: Deng, Yuntian, et al.
Published: (2024)
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
by: Lee, Andrew, et al.
Published: (2024)
Tensor Product Representation Probes Reveal Shared Structure Across Linear Directions
by: Lee, Andrew, et al.
Published: (2026)
Relational Composition in Neural Networks: A Survey and Call to Action
by: Wattenberg, Martin, et al.
Published: (2024)
What Does it Mean for a Neural Network to Learn a "World Model"?
by: Li, Kenneth, et al.
Published: (2025)
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
by: Li, Kenneth, et al.
Published: (2024)
Decomposing Query-Key Feature Interactions Using Contrastive Covariances
by: Lee, Andrew, et al.
Published: (2026)
When Bad Data Leads to Good Models
by: Li, Kenneth, et al.
Published: (2025)
Shared Global and Local Geometry of Language Model Embeddings
by: Lee, Andrew, et al.
Published: (2025)
The Geometry of Self-Verification in a Task-Specific Reasoning Model
by: Lee, Andrew, et al.
Published: (2025)
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
by: Li, Kenneth, et al.
Published: (2023)
Chronotome: Real-Time Topic Modeling for Streaming Embedding Spaces
by: Lim, Matte, et al.
Published: (2025)
Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
by: Pres, Itamar, et al.
Published: (2024)
AbsenceBench: Language Models Can't Tell What's Missing
by: Fu, Harvey Yiyun, et al.
Published: (2025)
Competition Dynamics Shape Algorithmic Phases of In-Context Learning
by: Park, Core Francisco, et al.
Published: (2024)
Why AI Can't Simulate Extreme Decision-Making
by: Rosehill, Daniel, et al.
Published: (2026)
Why Can't I Ever Find Anything in the Library?
by: Radford, Neil, et al.
Published: (1983)
Why I Can't Create a Learning Center
by: Miller, Rosalind
Published: (1975)
A National Digital Library for Science, Mathematics, Engineering, and Technology Education.
by: Wattenberg, Frank
Published: (1998)
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
by: Li, Kenneth, et al.
Published: (2022)
"We are currently clean on OPSEC": Why JD Can't Encrypt
by: Chiodo, Maurice, et al.
Published: (2026)
Why the Center Can't Hold: A Diagnosis of Puritanized America
by: O’Neill, Tom
Published: (2019)
Why I Can't Read Wallace Stegner, and Other Essays
by: Cook-Lynn, Elizabeth
Published: (2025)
Son of Why Johnny Can't Read and What You Do About It, by Hugo Flesch, Son of Rudolf Flesch, Author of Son of Why Johnny Can't Read and...
by: Flesch, Hugo
Published: (1970)
Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
by: Li, Kenneth, et al.
Published: (2024)
Concept Incongruence: An Exploration of Time and Death in Role Playing
by: Bai, Xiaoyan, et al.
Published: (2025)
Know Thyself? On the Incapability and Implications of AI Self-Recognition
by: Bai, Xiaoyan, et al.
Published: (2025)
Time Blindness: Why Video-Language Models Can't See What Humans Can?
by: Upadhyay, Ujjwal, et al.
Published: (2025)
Can’t Touch This
Published: (2024)
Story Ribbons: Reimagining Storyline Visualizations with Large Language Models
by: Yeh, Catherine, et al.
Published: (2025)
Why AI Harms Can't Be Fixed One Identity at a Time: What 5300 Incident Reports Reveal About Intersectionality
by: Bogucka, Edyta, et al.
Published: (2026)
They Can't Hear Us Does Not Mean We Can't Serve Them.
by: McDaniel, Julie Ann
Published: (1992)
Ep. 178: The Skywave Secret: Why Aviation Can't Quit HF Radio
by: Rosehill, Daniel, et al.
Published: (2026)
Why Neural Structural Obfuscation Can't Kill White-Box Watermarks for Good!
by: Jiang, Yanna, et al.
Published: (2026)
Why We Can't Afford to Turn Our Backs on Equity, Diversity and Inclusion
by: Phyllis Richards, et al.
Published: (2024)
On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication
by: Wei, Zichao
Published: (2026)
Analysis of the Stellar Occultations During the Unprecedented Long-Duration Flare
by: Bicz, Kamil, et al.
Published: (2024)
Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?
by: Balepur, Nishant, et al.
Published: (2024)
The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research
by: Bai, Xiaoyan, et al.
Published: (2026)
Ep. 1086: Why AI Can't Stop Talking About Second Order Effects
by: Rosehill, Daniel, et al.
Published: (2026)