:: Library Catalog

Bibliographic Details
Main Authors:	Bamberger, Zachary; Glick, Ofek; Baskin, Chaim; Belinkov, Yonatan
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2405.07788

Similar Items

Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods
by: Blau, Tsachi, et al.
Published: (2024)

ContraSim -- Analyzing Neural Representations Based on Contrastive Learning
by: Rahamim, Adir, et al.
Published: (2023)

Fast Forwarding Low-Rank Training
by: Rahamim, Adir, et al.
Published: (2024)

MATCH: Task-Driven Code Evaluation through Contrastive Learning
by: Ghoummaid, Marah, et al.
Published: (2025)

Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models
by: Yu, Zeping, et al.
Published: (2025)

Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
by: Iskander, Shadi, et al.
Published: (2024)

Inside-Out: Hidden Factual Knowledge in LLMs
by: Gekhman, Zorik, et al.
Published: (2025)

Concept-Best-Matching: Evaluating Compositionality in Emergent Communication
by: Carmeli, Boaz, et al.
Published: (2024)

Hysteresis Activation Function for Efficient Inference
by: Kimhi, Moshe, et al.
Published: (2024)

REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
by: Ashuach, Tomer, et al.
Published: (2024)

Are formal and functional linguistic mechanisms dissociated in language models?
by: Hanna, Michael, et al.
Published: (2025)

SAEs Are Good for Steering -- If You Select the Right Features
by: Arad, Dana, et al.
Published: (2025)

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
by: Itzhak, Itay, et al.
Published: (2025)

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
by: Hanna, Michael, et al.
Published: (2024)

ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
by: Arad, Dana, et al.
Published: (2023)

LLM4SFC: Sequential Function Chart Generation via Large Language Models
by: Glick, Ofek, et al.
Published: (2025)

Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
by: Tutek, Martin, et al.
Published: (2025)

Distinguishing Ignorance from Error in LLM Hallucinations
by: Simhi, Adi, et al.
Published: (2024)

Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
by: Simhi, Adi, et al.
Published: (2024)

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
by: Katz, Shahar, et al.
Published: (2024)

From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs
by: Itzhak, Itay, et al.
Published: (2026)

Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
by: Kaplan, Guy, et al.
Published: (2025)

Silent Tokens, Loud Effects: Padding in LLMs
by: Himelstein, Rom, et al.
Published: (2025)

Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
by: Nikankin, Yaniv, et al.
Published: (2024)

Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
by: Nikankin, Yaniv, et al.
Published: (2025)

DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
by: Ventura, Mor, et al.
Published: (2025)

Differentiable Faithfulness Alignment for Cross-Model Circuit Transfer
by: Shao, Shun, et al.
Published: (2026)

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
by: Wiegreffe, Sarah, et al.
Published: (2024)

You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations
by: LeVi, Amit, et al.
Published: (2025)

Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer
by: Simhi, Adi, et al.
Published: (2025)

Will it Merge? On The Causes of Model Mergeability
by: Rahamim, Adir, et al.
Published: (2026)

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
by: Prakash, Nikhil, et al.
Published: (2024)

Reasoning Models Know What's Important, and Encode It in Their Activations
by: Nikankin, Yaniv, et al.
Published: (2026)

A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry
by: Toker, Michael, et al.
Published: (2024)

Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness
by: Ashuach, Tomer, et al.
Published: (2026)

Unsupervised Translation of Emergent Communication
by: Levy, Ido, et al.
Published: (2025)

CRISP: Persistent Concept Unlearning via Sparse Autoencoders
by: Ashuach, Tomer, et al.
Published: (2025)

Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
by: Toker, Michael, et al.
Published: (2024)

Growing a Tail: Increasing Output Diversity in Large Language Models
by: Shur-Ofry, Michal, et al.
Published: (2024)

Position-aware Automatic Circuit Discovery
by: Haklay, Tal, et al.
Published: (2025)