| Main Authors: | Bamberger, Zachary; Glick, Ofek; Baskin, Chaim; Belinkov, Yonatan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.07788 |
Similar Items
Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods
by: Blau, Tsachi, et al.
Published: (2024)
ContraSim -- Analyzing Neural Representations Based on Contrastive Learning
by: Rahamim, Adir, et al.
Published: (2023)
Fast Forwarding Low-Rank Training
by: Rahamim, Adir, et al.
Published: (2024)
MATCH: Task-Driven Code Evaluation through Contrastive Learning
by: Ghoummaid, Marah, et al.
Published: (2025)
Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models
by: Yu, Zeping, et al.
Published: (2025)
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
by: Iskander, Shadi, et al.
Published: (2024)
Inside-Out: Hidden Factual Knowledge in LLMs
by: Gekhman, Zorik, et al.
Published: (2025)
Concept-Best-Matching: Evaluating Compositionality in Emergent Communication
by: Carmeli, Boaz, et al.
Published: (2024)
Hysteresis Activation Function for Efficient Inference
by: Kimhi, Moshe, et al.
Published: (2024)
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
by: Ashuach, Tomer, et al.
Published: (2024)
Are formal and functional linguistic mechanisms dissociated in language models?
by: Hanna, Michael, et al.
Published: (2025)
SAEs Are Good for Steering -- If You Select the Right Features
by: Arad, Dana, et al.
Published: (2025)
Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
by: Itzhak, Itay, et al.
Published: (2025)
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
by: Hanna, Michael, et al.
Published: (2024)
ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
by: Arad, Dana, et al.
Published: (2023)
LLM4SFC: Sequential Function Chart Generation via Large Language Models
by: Glick, Ofek, et al.
Published: (2025)
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
by: Tutek, Martin, et al.
Published: (2025)
Distinguishing Ignorance from Error in LLM Hallucinations
by: Simhi, Adi, et al.
Published: (2024)
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
by: Simhi, Adi, et al.
Published: (2024)
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
by: Katz, Shahar, et al.
Published: (2024)
From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs
by: Itzhak, Itay, et al.
Published: (2026)
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
by: Kaplan, Guy, et al.
Published: (2025)
Silent Tokens, Loud Effects: Padding in LLMs
by: Himelstein, Rom, et al.
Published: (2025)
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
by: Nikankin, Yaniv, et al.
Published: (2024)
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
by: Nikankin, Yaniv, et al.
Published: (2025)
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
by: Ventura, Mor, et al.
Published: (2025)
Differentiable Faithfulness Alignment for Cross-Model Circuit Transfer
by: Shao, Shun, et al.
Published: (2026)
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
by: Wiegreffe, Sarah, et al.
Published: (2024)
You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations
by: LeVi, Amit, et al.
Published: (2025)
Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer
by: Simhi, Adi, et al.
Published: (2025)
Will it Merge? On The Causes of Model Mergeability
by: Rahamim, Adir, et al.
Published: (2026)
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
by: Prakash, Nikhil, et al.
Published: (2024)
Reasoning Models Know What's Important, and Encode It in Their Activations
by: Nikankin, Yaniv, et al.
Published: (2026)
A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry
by: Toker, Michael, et al.
Published: (2024)
Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness
by: Ashuach, Tomer, et al.
Published: (2026)
Unsupervised Translation of Emergent Communication
by: Levy, Ido, et al.
Published: (2025)
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
by: Ashuach, Tomer, et al.
Published: (2025)
Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
by: Toker, Michael, et al.
Published: (2024)
Growing a Tail: Increasing Output Diversity in Large Language Models
by: Shur-Ofry, Michal, et al.
Published: (2024)
Position-aware Automatic Circuit Discovery
by: Haklay, Tal, et al.
Published: (2025)