| Main Authors: | Todd, Eric, Li, Millicent L., Sharma, Arnab Sen, Mueller, Aaron, Wallace, Byron C., Bau, David |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2310.15213 |
Similar Items
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs
by: Feucht, Sheridan, et al.
Published: (2024)
Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
by: Pal, Koyena, et al.
Published: (2023)
Do Activation Verbalization Methods Convey Privileged Information?
by: Li, Millicent, et al.
Published: (2025)
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
by: Marks, Samuel, et al.
Published: (2024)
Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare
by: Ahsan, Hiba, et al.
Published: (2025)
Vector Arithmetic in Concept and Token Subspaces
by: Feucht, Sheridan, et al.
Published: (2025)
The Dual-Route Model of Induction
by: Feucht, Sheridan, et al.
Published: (2025)
In-Context Algebra
by: Todd, Eric, et al.
Published: (2025)
Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
by: Ahsan, Hiba, et al.
Published: (2025)
Position-aware Automatic Circuit Discovery
by: Haklay, Tal, et al.
Published: (2025)
Locating and Editing Factual Associations in Mamba
by: Sharma, Arnab Sen, et al.
Published: (2024)
The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis
by: Mueller, Aaron, et al.
Published: (2024)
Erasing Conceptual Knowledge from Language Models
by: Gandikota, Rohit, et al.
Published: (2024)
GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence
by: Krishna, Kundan, et al.
Published: (2024)
Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks
by: Mueller, Aaron
Published: (2024)
Discovering Forbidden Topics in Language Models
by: Rager, Can, et al.
Published: (2025)
Compared to What? Baselines and Metrics for Counterfactual Prompting
by: Yang, Zihao, et al.
Published: (2026)
Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank
by: Roy, Debjyoti Saha, et al.
Published: (2024)
PARAMANU-GANITA: Can Small Math Language Models Rival with Large Language Models on Mathematical Reasoning?
by: Niyogi, Mitodru, et al.
Published: (2024)
What Evidence Do Language Models Find Convincing?
by: Wan, Alexander, et al.
Published: (2024)
Adaptive Task Vectors for Large Language Models
by: Kang, Joonseong, et al.
Published: (2025)
Ayn: A Tiny yet Competitive Indian Legal Language Model Pretrained from Scratch
by: Niyogi, Mitodru, et al.
Published: (2024)
HateTinyLLM : Hate Speech Detection Using Tiny Large Language Models
by: Sen, Tanmay, et al.
Published: (2024)
Shared Lexical Task Representations Explain Behavioral Variability In LLMs
by: Yang, Zhuonan, et al.
Published: (2026)
Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains
by: Ramprasad, Sanjana, et al.
Published: (2024)
Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
by: Li, Kenneth, et al.
Published: (2024)
Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence
by: Hager, Sophia, et al.
Published: (2025)
Exploring Context Window of Large Language Models via Decomposed Positional Vectors
by: Dong, Zican, et al.
Published: (2024)
RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models
by: Xu, Zukang, et al.
Published: (2025)
Entity Matching using Large Language Models
by: Peeters, Ralph, et al.
Published: (2023)
Discovering Decoupled Functional Modules in Large Language Models
by: Yu, Yanke, et al.
Published: (2026)
Tx-LLM: A Large Language Model for Therapeutics
by: Chaves, Juan Manuel Zambrano, et al.
Published: (2024)
Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction
by: Opiełka, Gustaw, et al.
Published: (2025)
Spectral Generative Flow Models: A Physics-Inspired Replacement for Vectorized Large Language Models
by: Kiruluta, Andrew
Published: (2026)
SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models
by: Cao, Yang, et al.
Published: (2024)
A Critical Evaluation of AI Feedback for Aligning Large Language Models
by: Sharma, Archit, et al.
Published: (2024)
Confidence-Modulated Speculative Decoding for Large Language Models
by: Sen, Jaydip, et al.
Published: (2025)
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
by: Kang, Katie, et al.
Published: (2024)
Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models
by: Wallace, Tom, et al.
Published: (2025)
Confidence Elicitation: A New Attack Vector for Large Language Models
by: Formento, Brian, et al.
Published: (2025)