| Main Authors: | Mor, Alon; Belinkov, Yonatan; Kimelfeld, Benny |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2312.07991 |
Similar Items
Database Views as Explanations for Relational Deep Learning
by: Rissaki, Agapi, et al.
Published: (2025)
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
by: Katz, Shahar, et al.
Published: (2024)
Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores
by: Badash, Zvi N., et al.
Published: (2026)
A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry
by: Toker, Michael, et al.
Published: (2024)
Measures of Information Reflect Memorization Patterns
by: Bansal, Rachit, et al.
Published: (2022)
Selecting Walk Schemes for Database Embedding
by: Lubarsky, Yuval Lev, et al.
Published: (2024)
SAEs Are Good for Steering -- If You Select the Right Features
by: Arad, Dana, et al.
Published: (2025)
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
by: Iskander, Shadi, et al.
Published: (2024)
Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
by: Itzhak, Itay, et al.
Published: (2025)
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
by: Hanna, Michael, et al.
Published: (2024)
Decomposing Query-Key Feature Interactions Using Contrastive Covariances
by: Lee, Andrew, et al.
Published: (2026)
Fast Forwarding Low-Rank Training
by: Rahamim, Adir, et al.
Published: (2024)
Incorporating Deep Learning Design in Database Queries
by: Lubarsky, Yuval Lev, et al.
Published: (2026)
Tractability Frontiers of the Shapley Value for Aggregate Conjunctive Queries
by: Standke, Christoph, et al.
Published: (2025)
Structured RAG for Answering Aggregative Questions
by: Koshorek, Omri, et al.
Published: (2025)
Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias
by: Itzhak, Itay, et al.
Published: (2023)
From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs
by: Itzhak, Itay, et al.
Published: (2026)
Silent Tokens, Loud Effects: Padding in LLMs
by: Himelstein, Rom, et al.
Published: (2025)
Unified Concept Editing in Diffusion Models
by: Gandikota, Rohit, et al.
Published: (2023)
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
by: Prakash, Nikhil, et al.
Published: (2024)
Direct Access for Answers to Conjunctive Queries with Aggregation
by: Eldar, Idan, et al.
Published: (2023)
The Complexity of Aggregates over Extractions by Regular Expressions
by: Doleschal, Johannes, et al.
Published: (2020)
Expressive Power of Deep Homomorphism Networks over Relational Databases
by: Schönherr, Moritz, et al.
Published: (2026)
Induction Meets Biology: Mechanisms of Repeat Detection in Protein Language Models
by: Pomerants, Gal, et al.
Published: (2026)
Position-aware Automatic Circuit Discovery
by: Haklay, Tal, et al.
Published: (2025)
Mechanisms of AI Protein Folding in ESMFold
by: Lu, Kevin, et al.
Published: (2026)
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
by: Marks, Samuel, et al.
Published: (2024)
Regional Explanations: Bridging Local and Global Variable Importance
by: Amoukou, Salim I., et al.
Published: (2026)
GLEAMS: Bridging the Gap Between Local and Global Explanations
by: Visani, Giorgio, et al.
Published: (2024)
Confidence Regulation Neurons in Language Models
by: Stolfo, Alessandro, et al.
Published: (2024)
Explaining Hypergraph Neural Networks: From Local Explanations to Global Concepts
by: Su, Shiye, et al.
Published: (2024)
How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers
by: Buzaglo, Gon, et al.
Published: (2024)
Alike Parts: A Feature-Informed Approach to Local and Global Prototype Explanations
by: Karolczak, Jacek, et al.
Published: (2026)
Aggregate Models, Not Explanations: Improving Feature Importance Estimation
by: Paillard, Joseph, et al.
Published: (2026)
Using Database Dependencies to Constrain Approval-Based Committee Voting in the Presence of Context
by: Yona, Roi, et al.
Published: (2025)
L2GTX: From Local to Global Time Series Explanations
by: Mekonnen, Ephrem Tibebe, et al.
Published: (2026)
Guarantee Regions for Local Explanations
by: Havasi, Marton, et al.
Published: (2024)
Auditing Local Explanations is Hard
by: Bhattacharjee, Robi, et al.
Published: (2024)
Jamba: A Hybrid Transformer-Mamba Language Model
by: Lieber, Opher, et al.
Published: (2024)
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
by: Ventura, Mor, et al.
Published: (2025)