| Main Authors: | Luo, Haoyan; Zarlenga, Mateo Espinosa; Jamnik, Mateja |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.06342 |
Similar Items
Digging Deeper: Learning Multi-Level Concept Hierarchies
by: Hill, Oscar, et al.
Published: (2026)
Hierarchical Concept-based Interpretable Models
by: Hill, Oscar, et al.
Published: (2026)
Understanding Inter-Concept Relationships in Concept-Based Models
by: Raman, Naveen, et al.
Published: (2024)
Foundations of Interpretable Models
by: Barbiero, Pietro, et al.
Published: (2025)
Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts
by: Zarlenga, Mateo Espinosa, et al.
Published: (2025)
Do Concept Bottleneck Models Respect Localities?
by: Raman, Naveen, et al.
Published: (2024)
Actionable Interpretability Must Be Defined in Terms of Symmetries
by: Barbiero, Pietro, et al.
Published: (2026)
Efficient Bias Mitigation Without Privileged Information
by: Zarlenga, Mateo Espinosa, et al.
Published: (2024)
Learning to Receive Help: Intervention-Aware Concept Embedding Models
by: Zarlenga, Mateo Espinosa, et al.
Published: (2023)
End-to-End Ontology Learning with Large Language Models
by: Lo, Andy, et al.
Published: (2024)
Don't Start Over: A Cost-Effective Framework for Migrating Personalized Prompts Between LLMs
by: Zhao, Ziyi, et al.
Published: (2026)
Interpretable Neural-Symbolic Concept Reasoning
by: Barbiero, Pietro, et al.
Published: (2023)
Don't Pay Attention
by: Hammoud, Mohammad, et al.
Published: (2025)
Don't Touch My Diacritics
by: Gorman, Kyle, et al.
Published: (2024)
Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning
by: He, Yongquan, et al.
Published: (2024)
If You Don't Understand It, Don't Use It: Eliminating Trojans with Filters Between Layers
by: Hernandez, Adriano
Published: (2024)
Don't Throw Away Your Pretrained Model
by: Feng, Shangbin, et al.
Published: (2025)
Don't Say No: Jailbreaking LLM by Suppressing Refusal
by: Zhou, Yukai, et al.
Published: (2024)
Think, But Don't Overthink: Reproducing Recursive Language Models
by: Wang, Daren
Published: (2026)
Hatevolution: What Static Benchmarks Don't Tell Us
by: Di Bonaventura, Chiara, et al.
Published: (2025)
Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank
by: Roy, Debjyoti Saha, et al.
Published: (2024)
ContextFocus: Activation Steering for Contextual Faithfulness in Large Language Models
by: Anand, Nikhil, et al.
Published: (2026)
Reasoning Models Reason Well, Until They Don't
by: Rameshkumar, Revanth, et al.
Published: (2025)
Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
by: Feng, Shangbin, et al.
Published: (2024)
Don't Throw Away Data: Better Sequence Knowledge Distillation
by: Wang, Jun, et al.
Published: (2024)
Don't Command, Cultivate: An Exploratory Study of System-2 Alignment
by: Wang, Yuhang, et al.
Published: (2024)
From Understanding to Utilization: A Survey on Explainability for Large Language Models
by: Luo, Haoyan, et al.
Published: (2024)
Tuning Language Models by Mixture-of-Depths Ensemble
by: Luo, Haoyan, et al.
Published: (2024)
Don't Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning
by: Qin, Yuehan, et al.
Published: (2025)
Language Models Don't Learn the Physical Manifestation of Language
by: Lee, Bruce W., et al.
Published: (2024)
Don't Walk the Line: Boundary Guidance for Filtered Generation
by: Ball, Sarah, et al.
Published: (2025)
Can AI Assistants Know What They Don't Know?
by: Cheng, Qinyuan, et al.
Published: (2024)
Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction
by: Zhang, Yuzhe, et al.
Published: (2026)
Frictional Agent Alignment Framework: Slow Down and Don't Break Things
by: Nath, Abhijnan, et al.
Published: (2025)
Pointer-Generator Networks for Low-Resource Machine Translation: Don't Copy That!
by: Bafna, Niyati, et al.
Published: (2024)
Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models
by: Parmar, Jupinder, et al.
Published: (2024)
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs
by: Hans, Abhimanyu, et al.
Published: (2024)
Why Don't Prompt-Based Fairness Metrics Correlate?
by: Zayed, Abdelrahman, et al.
Published: (2024)
ROAST: Rollout-based On-distribution Activation Steering Technique
by: Su, Xuanbo, et al.
Published: (2026)
Reasoning Models Don't Just Think Longer, They Move Differently
by: Gjølbye, Anders, et al.
Published: (2026)