:: Library Catalog

Bibliographic Details
Main Authors:	Rahamim, Adir; Belinkov, Yonatan
Format:	Preprint
Published:	2023
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2303.16992

Similar Items

Fast Forwarding Low-Rank Training
by: Rahamim, Adir, et al.
Published: (2024)

Growing a Tail: Increasing Output Diversity in Large Language Models
by: Shur-Ofry, Michal, et al.
Published: (2024)

Will it Merge? On The Causes of Model Mergeability
by: Rahamim, Adir, et al.
Published: (2026)

Contrastive Similarity Learning for Market Forecasting: The ContraSim Framework
by: Vinden, Nicholas, et al.
Published: (2025)

Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
by: Iskander, Shadi, et al.
Published: (2024)

Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models
by: Yu, Zeping, et al.
Published: (2025)

Concept-Best-Matching: Evaluating Compositionality in Emergent Communication
by: Carmeli, Boaz, et al.
Published: (2024)

Are formal and functional linguistic mechanisms dissociated in language models?
by: Hanna, Michael, et al.
Published: (2025)

REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
by: Ashuach, Tomer, et al.
Published: (2024)

SAEs Are Good for Steering -- If You Select the Right Features
by: Arad, Dana, et al.
Published: (2025)

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
by: Itzhak, Itay, et al.
Published: (2025)

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
by: Hanna, Michael, et al.
Published: (2024)

Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods
by: Blau, Tsachi, et al.
Published: (2024)

ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
by: Arad, Dana, et al.
Published: (2023)

DEPTH: Discourse Education through Pre-Training Hierarchically
by: Bamberger, Zachary, et al.
Published: (2024)

Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
by: Tutek, Martin, et al.
Published: (2025)

Distinguishing Ignorance from Error in LLM Hallucinations
by: Simhi, Adi, et al.
Published: (2024)

Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
by: Simhi, Adi, et al.
Published: (2024)

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
by: Katz, Shahar, et al.
Published: (2024)

From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs
by: Itzhak, Itay, et al.
Published: (2026)

Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
by: Kaplan, Guy, et al.
Published: (2025)

Silent Tokens, Loud Effects: Padding in LLMs
by: Himelstein, Rom, et al.
Published: (2025)

Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
by: Nikankin, Yaniv, et al.
Published: (2024)

Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
by: Nikankin, Yaniv, et al.
Published: (2025)

DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
by: Ventura, Mor, et al.
Published: (2025)

Differentiable Faithfulness Alignment for Cross-Model Circuit Transfer
by: Shao, Shun, et al.
Published: (2026)

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
by: Wiegreffe, Sarah, et al.
Published: (2024)

Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer
by: Simhi, Adi, et al.
Published: (2025)

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
by: Orgad, Hadas, et al.
Published: (2024)

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
by: Prakash, Nikhil, et al.
Published: (2024)

Reasoning Models Know What's Important, and Encode It in Their Activations
by: Nikankin, Yaniv, et al.
Published: (2026)

A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry
by: Toker, Michael, et al.
Published: (2024)

Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness
by: Ashuach, Tomer, et al.
Published: (2026)

Unsupervised Translation of Emergent Communication
by: Levy, Ido, et al.
Published: (2025)

CRISP: Persistent Concept Unlearning via Sparse Autoencoders
by: Ashuach, Tomer, et al.
Published: (2025)

Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
by: Toker, Michael, et al.
Published: (2024)

Position-aware Automatic Circuit Discovery
by: Haklay, Tal, et al.
Published: (2025)

ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
by: Simhi, Adi, et al.
Published: (2025)

Old Habits Die Hard: How Conversational History Geometrically Traps LLMs
by: Simhi, Adi, et al.
Published: (2026)

Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
by: Toker, Michael, et al.
Published: (2025)