:: Library Catalog

Bibliographic Details
Main Authors:	Luo, Haoyan, Zarlenga, Mateo Espinosa, Jamnik, Mateja
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2605.06342

Similar Items

Digging Deeper: Learning Multi-Level Concept Hierarchies
by: Hill, Oscar, et al.
Published: (2026)

Hierarchical Concept-based Interpretable Models
by: Hill, Oscar, et al.
Published: (2026)

Understanding Inter-Concept Relationships in Concept-Based Models
by: Raman, Naveen, et al.
Published: (2024)

Foundations of Interpretable Models
by: Barbiero, Pietro, et al.
Published: (2025)

Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts
by: Zarlenga, Mateo Espinosa, et al.
Published: (2025)

Do Concept Bottleneck Models Respect Localities?
by: Raman, Naveen, et al.
Published: (2024)

Actionable Interpretability Must Be Defined in Terms of Symmetries
by: Barbiero, Pietro, et al.
Published: (2026)

Efficient Bias Mitigation Without Privileged Information
by: Zarlenga, Mateo Espinosa, et al.
Published: (2024)

Learning to Receive Help: Intervention-Aware Concept Embedding Models
by: Zarlenga, Mateo Espinosa, et al.
Published: (2023)

End-to-End Ontology Learning with Large Language Models
by: Lo, Andy, et al.
Published: (2024)

Don't Start Over: A Cost-Effective Framework for Migrating Personalized Prompts Between LLMs
by: Zhao, Ziyi, et al.
Published: (2026)

Interpretable Neural-Symbolic Concept Reasoning
by: Barbiero, Pietro, et al.
Published: (2023)

Don't Pay Attention
by: Hammoud, Mohammad, et al.
Published: (2025)

Don't Touch My Diacritics
by: Gorman, Kyle, et al.
Published: (2024)

Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning
by: He, Yongquan, et al.
Published: (2024)

If You Don't Understand It, Don't Use It: Eliminating Trojans with Filters Between Layers
by: Hernandez, Adriano
Published: (2024)

Don't Throw Away Your Pretrained Model
by: Feng, Shangbin, et al.
Published: (2025)

Don't Say No: Jailbreaking LLM by Suppressing Refusal
by: Zhou, Yukai, et al.
Published: (2024)

Think, But Don't Overthink: Reproducing Recursive Language Models
by: Wang, Daren
Published: (2026)

Hatevolution: What Static Benchmarks Don't Tell Us
by: Di Bonaventura, Chiara, et al.
Published: (2025)

Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank
by: Roy, Debjyoti Saha, et al.
Published: (2024)

ContextFocus: Activation Steering for Contextual Faithfulness in Large Language Models
by: Anand, Nikhil, et al.
Published: (2026)

Reasoning Models Reason Well, Until They Don't
by: Rameshkumar, Revanth, et al.
Published: (2025)

Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
by: Feng, Shangbin, et al.
Published: (2024)

Don't Throw Away Data: Better Sequence Knowledge Distillation
by: Wang, Jun, et al.
Published: (2024)

Don't Command, Cultivate: An Exploratory Study of System-2 Alignment
by: Wang, Yuhang, et al.
Published: (2024)

From Understanding to Utilization: A Survey on Explainability for Large Language Models
by: Luo, Haoyan, et al.
Published: (2024)

Tuning Language Models by Mixture-of-Depths Ensemble
by: Luo, Haoyan, et al.
Published: (2024)

Don't Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning
by: Qin, Yuehan, et al.
Published: (2025)

Language Models Don't Learn the Physical Manifestation of Language
by: Lee, Bruce W., et al.
Published: (2024)

Don't Walk the Line: Boundary Guidance for Filtered Generation
by: Ball, Sarah, et al.
Published: (2025)

Can AI Assistants Know What They Don't Know?
by: Cheng, Qinyuan, et al.
Published: (2024)

Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction
by: Zhang, Yuzhe, et al.
Published: (2026)

Frictional Agent Alignment Framework: Slow Down and Don't Break Things
by: Nath, Abhijnan, et al.
Published: (2025)

Pointer-Generator Networks for Low-Resource Machine Translation: Don't Copy That!
by: Bafna, Niyati, et al.
Published: (2024)

Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models
by: Parmar, Jupinder, et al.
Published: (2024)

Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs
by: Hans, Abhimanyu, et al.
Published: (2024)

Why Don't Prompt-Based Fairness Metrics Correlate?
by: Zayed, Abdelrahman, et al.
Published: (2024)

ROAST: Rollout-based On-distribution Activation Steering Technique
by: Su, Xuanbo, et al.
Published: (2026)

Reasoning Models Don't Just Think Longer, They Move Differently
by: Gjølbye, Anders, et al.
Published: (2026)