Bibliographic Details
Main Authors:	Ghandeharioun, Asma; Yuan, Ann; Guerard, Marius; Reif, Emily; Lepori, Michael A.; Dixon, Lucas
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2406.12094

Similar Items

Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
by: Ghandeharioun, Asma, et al.
Published: (2024)

Think Before You Lie: How Reasoning Leads to Honesty
by: Yuan, Ann, et al.
Published: (2026)

Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics
by: Blum, Carter, et al.
Published: (2025)

Racing Thoughts: Explaining Contextualization Errors in Large Language Models
by: Lepori, Michael A., et al.
Published: (2024)

Language Models Struggle to Use Representations Learned In-Context
by: Lepori, Michael A., et al.
Published: (2026)

Interpretability Illusions in the Generalization of Simplified Models
by: Friedman, Dan, et al.
Published: (2023)

When Can Transformers Count to n?
by: Yehudai, Gilad, et al.
Published: (2024)

LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
by: Kahng, Minsuk, et al.
Published: (2024)

Understanding the Dataset Practitioners Behind Large Language Model Development
by: Qian, Crystal, et al.
Published: (2024)

A Study on Hidden Layer Distillation for Large Language Model Pre-Training
by: Guigon, Maxime, et al.
Published: (2026)

Signatures of human-like processing in Transformer forward passes
by: Hu, Jennifer, et al.
Published: (2025)

Automatic Histograms: Leveraging Language Models for Text Dataset Exploration
by: Reif, Emily, et al.
Published: (2024)

Is persona enough for personality? Using ChatGPT to reconstruct an agent's latent personality from simple descriptions
by: Ji, Yongyi, et al.
Published: (2024)

Towards medical AI misalignment: a preliminary study
by: Puccio, Barbara, et al.
Published: (2025)

QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
by: Li, Belinda Z., et al.
Published: (2025)

Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
by: Lepori, Michael A., et al.
Published: (2025)

From Tokens to Words: On the Inner Lexicon of LLMs
by: Kaplan, Guy, et al.
Published: (2024)

Who Laughs with Whom? Disentangling Influential Factors in Humor Preferences across User Clusters and LLMs
by: Murakami, Soichiro, et al.
Published: (2026)

Emergent misalignment as prompt sensitivity: A research note
by: Wyse, Tim, et al.
Published: (2025)

Large Language Models can Strategically Deceive their Users when Put Under Pressure
by: Scheurer, Jérémy, et al.
Published: (2023)

Probing the Limits of Stylistic Alignment in Vision-Language Models
by: Farajidizaji, Asma, et al.
Published: (2025)

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
by: Betley, Jan, et al.
Published: (2025)

Who Reasons in the Large Language Models?
by: Shao, Jie, et al.
Published: (2025)

"Who Am I, and Who Else Is Here?" Behavioral Differentiation Without Role Assignment in Multi-Agent LLM Systems
by: Kandoussi, Houssam EL
Published: (2026)

Neural Attention Search
by: Deng, Difan, et al.
Published: (2025)

Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales
by: Reuben, Maor, et al.
Published: (2024)

Who's Who: Large Language Models Meet Knowledge Conflicts in Practice
by: Pham, Quang Hieu, et al.
Published: (2024)

Interactive Prompt Debugging with Sequence Salience
by: Tenney, Ian, et al.
Published: (2024)

VERT: Reliable LLM Judges for Radiology Report Evaluation
by: Bologna, Federica, et al.
Published: (2026)

Exploring Large Language Models for Word Games: Who is the Spy?
by: Wei, Chentian, et al.
Published: (2025)

A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment
by: Corbeil, Jean-Philippe, et al.
Published: (2025)

Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues
by: Kwon, Deuksin, et al.
Published: (2024)

Who Speaks Matters: Analysing the Influence of the Speaker's Ethnicity on Hate Classification
by: Malik, Ananya, et al.
Published: (2024)

Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation
by: Yao, Jiayu, et al.
Published: (2025)

Who Benchmarks the Benchmarks? A Case Study of LLM Evaluation in Icelandic
by: Ingimundarson, Finnur Ágúst, et al.
Published: (2026)

From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment
by: Li, Jia-Nan, et al.
Published: (2025)

OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?
by: Huang, Zhen, et al.
Published: (2024)

A Graph Talks, But Who's Listening? Rethinking Evaluations for Graph-Language Models
by: Petkar, Soham, et al.
Published: (2025)

ConstitutionalExperts: Training a Mixture of Principle-based Prompts
by: Petridis, Savvas, et al.
Published: (2024)

UserHarness: Harnessing User Minds for Stronger Agent Theory-of-Mind
by: Qian, Cheng, et al.
Published: (2026)