Library Catalog

Bibliographic Details
Main Authors:	Li, Millicent; Arroyo, Alberto Mario Ceballos; Rogers, Giordano; Saphra, Naomi; Wallace, Byron C.
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2509.13316

Similar Items

Mechanistic?
by: Saphra, Naomi, et al.
Published: (2024)

Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization
by: Qin, Tian, et al.
Published: (2024)

Function Vectors in Large Language Models
by: Todd, Eric, et al.
Published: (2023)

TRAM: Bridging Trust Regions and Sharpness Aware Minimization
by: Sherborne, Tom, et al.
Published: (2023)

Fast Forwarding Low-Rank Training
by: Rahamim, Adir, et al.
Published: (2024)

Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
by: Ahsan, Hiba, et al.
Published: (2025)

Can Interpretation Predict Behavior on Unseen Data?
by: Li, Victoria R., et al.
Published: (2025)

Universal Activation Verbalizer: A Unified Framework for Cross-Model Activation Explanation
by: Zhao, Haiyan, et al.
Published: (2026)

Fine-Tuning Improves Information Conveyance in Language Models
by: Cheng, Yuwei, et al.
Published: (2026)

Compared to What? Baselines and Metrics for Counterfactual Prompting
by: Yang, Zihao, et al.
Published: (2026)

Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank
by: Roy, Debjyoti Saha, et al.
Published: (2024)

Attribute Diversity Determines the Systematicity Gap in VQA
by: Berlot-Attwell, Ian, et al.
Published: (2023)

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
by: Wallace, Eric, et al.
Published: (2024)

PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
by: van der Wal, Oskar, et al.
Published: (2025)

Using Shapley interactions to understand how models use structure
by: Singhvi, Divyansh, et al.
Published: (2024)

Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
by: Pal, Koyena, et al.
Published: (2023)

Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains
by: Ramprasad, Sanjana, et al.
Published: (2024)

GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence
by: Krishna, Kundan, et al.
Published: (2024)

Verbal Werewolf: Engage Users with Verbalized Agentic Werewolf Game Framework
by: Fan, Qihui, et al.
Published: (2025)

Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs
by: Feucht, Sheridan, et al.
Published: (2024)

Multimodal Tabular Reasoning with Privileged Structured Information
by: Jiang, Jun-Peng, et al.
Published: (2025)

Learning Using Generated Privileged Information by Text-to-Image Diffusion Models
by: Menadil, Rafael-Edy, et al.
Published: (2023)

What Evidence Do Language Models Find Convincing?
by: Wan, Alexander, et al.
Published: (2024)

Are LLM Decisions Faithful to Verbal Confidence?
by: Wang, Jiawei, et al.
Published: (2026)

Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning
by: Sutawika, Lintang, et al.
Published: (2026)

GATES: Self-Distillation under Privileged Context with Consensus Gating
by: Stein, Alex, et al.
Published: (2026)

ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models
by: Li, Chen, et al.
Published: (2026)

Reinforcing Human Behavior Simulation via Verbal Feedback
by: Sun, Weiwei, et al.
Published: (2026)

Open (Clinical) LLMs are Sensitive to Instruction Phrasings
by: Arroyo, Alberto Mario Ceballos, et al.
Published: (2024)

First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models
by: Ma, Chi, et al.
Published: (2024)

$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data
by: Zhang, Yaocheng, et al.
Published: (2026)

How do LLMs Compute Verbal Confidence
by: Kumaran, Dharshan, et al.
Published: (2026)

Inference and Verbalization Functions During In-Context Learning
by: Tao, Junyi, et al.
Published: (2024)

Hidden Breakthroughs in Language Model Training
by: Kangaslahti, Sara, et al.
Published: (2025)

POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration
by: Qu, Yuxiao, et al.
Published: (2026)

From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsy Narratives
by: Fan, Shuxian, et al.
Published: (2024)

ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
by: Li, Victoria R., et al.
Published: (2024)

Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending
by: Sanz-Guerrero, Mario, et al.
Published: (2024)

Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation
by: Ramprasad, Sanjana, et al.
Published: (2024)

Verbal Process Supervision Elicits Better Coding Agents
by: Chen, Hao-Yuan, et al.
Published: (2025)