| Main Authors: | Ghandeharioun, Asma, Yuan, Ann, Guerard, Marius, Reif, Emily, Lepori, Michael A., Dixon, Lucas |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.12094 |
Similar Items
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
by: Ghandeharioun, Asma, et al.
Published: (2024)
Think Before You Lie: How Reasoning Leads to Honesty
by: Yuan, Ann, et al.
Published: (2026)
Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics
by: Blum, Carter, et al.
Published: (2025)
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
by: Lepori, Michael A., et al.
Published: (2024)
Language Models Struggle to Use Representations Learned In-Context
by: Lepori, Michael A., et al.
Published: (2026)
Interpretability Illusions in the Generalization of Simplified Models
by: Friedman, Dan, et al.
Published: (2023)
When Can Transformers Count to n?
by: Yehudai, Gilad, et al.
Published: (2024)
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
by: Kahng, Minsuk, et al.
Published: (2024)
Understanding the Dataset Practitioners Behind Large Language Model Development
by: Qian, Crystal, et al.
Published: (2024)
A Study on Hidden Layer Distillation for Large Language Model Pre-Training
by: Guigon, Maxime, et al.
Published: (2026)
Signatures of human-like processing in Transformer forward passes
by: Hu, Jennifer, et al.
Published: (2025)
Automatic Histograms: Leveraging Language Models for Text Dataset Exploration
by: Reif, Emily, et al.
Published: (2024)
Is persona enough for personality? Using ChatGPT to reconstruct an agent's latent personality from simple descriptions
by: Ji, Yongyi, et al.
Published: (2024)
Towards medical AI misalignment: a preliminary study
by: Puccio, Barbara, et al.
Published: (2025)
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
by: Li, Belinda Z., et al.
Published: (2025)
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
by: Lepori, Michael A., et al.
Published: (2025)
From Tokens to Words: On the Inner Lexicon of LLMs
by: Kaplan, Guy, et al.
Published: (2024)
Who Laughs with Whom? Disentangling Influential Factors in Humor Preferences across User Clusters and LLMs
by: Murakami, Soichiro, et al.
Published: (2026)
Emergent misalignment as prompt sensitivity: A research note
by: Wyse, Tim, et al.
Published: (2025)
Large Language Models can Strategically Deceive their Users when Put Under Pressure
by: Scheurer, Jérémy, et al.
Published: (2023)
Probing the Limits of Stylistic Alignment in Vision-Language Models
by: Farajidizaji, Asma, et al.
Published: (2025)
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
by: Betley, Jan, et al.
Published: (2025)
Who Reasons in the Large Language Models?
by: Shao, Jie, et al.
Published: (2025)
"Who Am I, and Who Else Is Here?" Behavioral Differentiation Without Role Assignment in Multi-Agent LLM Systems
by: Kandoussi, Houssam EL
Published: (2026)
Neural Attention Search
by: Deng, Difan, et al.
Published: (2025)
Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales
by: Reuben, Maor, et al.
Published: (2024)
Who's Who: Large Language Models Meet Knowledge Conflicts in Practice
by: Pham, Quang Hieu, et al.
Published: (2024)
Interactive Prompt Debugging with Sequence Salience
by: Tenney, Ian, et al.
Published: (2024)
VERT: Reliable LLM Judges for Radiology Report Evaluation
by: Bologna, Federica, et al.
Published: (2026)
Exploring Large Language Models for Word Games:Who is the Spy?
by: Wei, Chentian, et al.
Published: (2025)
A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment
by: Corbeil, Jean-Philippe, et al.
Published: (2025)
Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues
by: Kwon, Deuksin, et al.
Published: (2024)
Who Speaks Matters: Analysing the Influence of the Speaker's Ethnicity on Hate Classification
by: Malik, Ananya, et al.
Published: (2024)
Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation
by: Yao, Jiayu, et al.
Published: (2025)
Who Benchmarks the Benchmarks? A Case Study of LLM Evaluation in Icelandic
by: Ingimundarson, Finnur Ágúst, et al.
Published: (2026)
From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment
by: Li, Jia-Nan, et al.
Published: (2025)
OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?
by: Huang, Zhen, et al.
Published: (2024)
A Graph Talks, But Who's Listening? Rethinking Evaluations for Graph-Language Models
by: Petkar, Soham, et al.
Published: (2025)
ConstitutionalExperts: Training a Mixture of Principle-based Prompts
by: Petridis, Savvas, et al.
Published: (2024)
UserHarness: Harnessing User Minds for Stronger Agent Theory-of-Mind
by: Qian, Cheng, et al.
Published: (2026)