| Main Authors: | Rahamim, Adir; Belinkov, Yonatan |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2303.16992 |
Similar Items
Fast Forwarding Low-Rank Training
by: Rahamim, Adir, et al.
Published: (2024)
Growing a Tail: Increasing Output Diversity in Large Language Models
by: Shur-Ofry, Michal, et al.
Published: (2024)
Will it Merge? On The Causes of Model Mergeability
by: Rahamim, Adir, et al.
Published: (2026)
Contrastive Similarity Learning for Market Forecasting: The ContraSim Framework
by: Vinden, Nicholas, et al.
Published: (2025)
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
by: Iskander, Shadi, et al.
Published: (2024)
Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models
by: Yu, Zeping, et al.
Published: (2025)
Concept-Best-Matching: Evaluating Compositionality in Emergent Communication
by: Carmeli, Boaz, et al.
Published: (2024)
Are formal and functional linguistic mechanisms dissociated in language models?
by: Hanna, Michael, et al.
Published: (2025)
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
by: Ashuach, Tomer, et al.
Published: (2024)
SAEs Are Good for Steering -- If You Select the Right Features
by: Arad, Dana, et al.
Published: (2025)
Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
by: Itzhak, Itay, et al.
Published: (2025)
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
by: Hanna, Michael, et al.
Published: (2024)
Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods
by: Blau, Tsachi, et al.
Published: (2024)
ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
by: Arad, Dana, et al.
Published: (2023)
DEPTH: Discourse Education through Pre-Training Hierarchically
by: Bamberger, Zachary, et al.
Published: (2024)
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
by: Tutek, Martin, et al.
Published: (2025)
Distinguishing Ignorance from Error in LLM Hallucinations
by: Simhi, Adi, et al.
Published: (2024)
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
by: Simhi, Adi, et al.
Published: (2024)
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
by: Katz, Shahar, et al.
Published: (2024)
From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs
by: Itzhak, Itay, et al.
Published: (2026)
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
by: Kaplan, Guy, et al.
Published: (2025)
Silent Tokens, Loud Effects: Padding in LLMs
by: Himelstein, Rom, et al.
Published: (2025)
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
by: Nikankin, Yaniv, et al.
Published: (2024)
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
by: Nikankin, Yaniv, et al.
Published: (2025)
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
by: Ventura, Mor, et al.
Published: (2025)
Differentiable Faithfulness Alignment for Cross-Model Circuit Transfer
by: Shao, Shun, et al.
Published: (2026)
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
by: Wiegreffe, Sarah, et al.
Published: (2024)
Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer
by: Simhi, Adi, et al.
Published: (2025)
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
by: Orgad, Hadas, et al.
Published: (2024)
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
by: Prakash, Nikhil, et al.
Published: (2024)
Reasoning Models Know What's Important, and Encode It in Their Activations
by: Nikankin, Yaniv, et al.
Published: (2026)
A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry
by: Toker, Michael, et al.
Published: (2024)
Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness
by: Ashuach, Tomer, et al.
Published: (2026)
Unsupervised Translation of Emergent Communication
by: Levy, Ido, et al.
Published: (2025)
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
by: Ashuach, Tomer, et al.
Published: (2025)
Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
by: Toker, Michael, et al.
Published: (2024)
Position-aware Automatic Circuit Discovery
by: Haklay, Tal, et al.
Published: (2025)
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
by: Simhi, Adi, et al.
Published: (2025)
Old Habits Die Hard: How Conversational History Geometrically Traps LLMs
by: Simhi, Adi, et al.
Published: (2026)
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
by: Toker, Michael, et al.
Published: (2025)