| Main Authors: | Oesterling, Alex, Ren, Donghao, Assogba, Yannick, Moritz, Dominik, Kim, Sunnie S. Y., Gatys, Leon, Hohman, Fred |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.05329 |
Similar Items
Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments
by: Boggust, Angie, et al.
Published: (2024)
Evaluating Long Range Dependency Handling in Code Generation LLMs
by: Assogba, Yannick, et al.
Published: (2024)
Exploring Empty Spaces: Human-in-the-Loop Data Augmentation
by: Yeh, Catherine, et al.
Published: (2024)
Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences
by: Hohman, Fred, et al.
Published: (2023)
A Scalable Approach to Clustering Embedding Projections
by: Ren, Donghao, et al.
Published: (2025)
Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language
by: Boggust, Angie, et al.
Published: (2025)
Disentangled Safety Adapters Enable Efficient Guardrails and Flexible Inference-Time Alignment
by: Krishna, Kundan, et al.
Published: (2025)
Embedding Atlas: Low-Friction, Interactive Embedding Visualization
by: Ren, Donghao, et al.
Published: (2025)
Operationalizing the Blueprint for an AI Bill of Rights: Recommendations for Practitioners, Researchers, and Policy Makers
by: Oesterling, Alex, et al.
Published: (2024)
Policy Maps: Tools for Guiding the Unbounded Space of LLM Behaviors
by: Lam, Michelle S., et al.
Published: (2024)
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
by: Bhalla, Usha, et al.
Published: (2025)
Fair Machine Unlearning: Data Removal while Mitigating Disparities
by: Oesterling, Alex, et al.
Published: (2023)
VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
by: Palaskar, Shruti, et al.
Published: (2025)
Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference
by: Hohman, Fred, et al.
Published: (2024)
Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates
by: Prakash, Nikhil, et al.
Published: (2025)
EncQA: Benchmarking Vision-Language Models on Visual Encodings for Charts
by: Mukherjee, Kushin, et al.
Published: (2025)
PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming
by: Deng, Wesley Hanwen, et al.
Published: (2025)
PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
by: Deng, Wesley Hanwen, et al.
Published: (2026)
Reinforcement Learning with $ω$-Regular Objectives and Constraints
by: Wagner, Dominik, et al.
Published: (2025)
Interpreting and Controlling LLM Reasoning through Integrated Policy Gradient
by: Li, Changming, et al.
Published: (2026)
APRIL: Annotations for Policy evaluation with Reliable Inference from LLMs
by: Mandyam, Aishwarya, et al.
Published: (2025)
Understanding and Preserving Safety in Fine-Tuned LLMs
by: Zhang, Jiawen, et al.
Published: (2026)
Safe Transformer: An Explicit Safety Bit For Interpretable And Controllable Alignment
by: Feng, Jingyuan, et al.
Published: (2026)
Counting Still Counts: Understanding Neural Complex Query Answering Through Query Relaxation
by: Brunink, Yannick, et al.
Published: (2025)
Interpretable Policy Distillation for Power Grid Topology Control
by: Dmitruka, Aleksandra, et al.
Published: (2026)
Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning
by: Kohler, Hector, et al.
Published: (2024)
Pragmatic Policy Development via Interpretable Behavior Cloning
by: Matsson, Anton, et al.
Published: (2025)
Evaluating Interpretable Reinforcement Learning by Distilling Policies into Programs
by: Kohler, Hector, et al.
Published: (2025)
Multi-Group Proportional Representation in Retrieval
by: Oesterling, Alex, et al.
Published: (2024)
Annotation-Assisted Learning of Treatment Policies From Multimodal Electronic Health Records
by: Arno, Henri, et al.
Published: (2025)
An Analysis of Concept Bottleneck Models: Measuring, Understanding, and Mitigating the Impact of Noisy Annotations
by: Park, Seonghwan, et al.
Published: (2025)
Towards Interpretable Reinforcement Learning with Constrained Normalizing Flow Policies
by: Rietz, Finn, et al.
Published: (2024)
Machine Learning for Climate Policy: Understanding Policy Progression in the European Green Deal
by: West, Patricia, et al.
Published: (2025)
Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits
by: Li, Donghao, et al.
Published: (2026)
A Shared Low-Rank Adaptation Approach to Personalized RLHF
by: Liu, Renpu, et al.
Published: (2025)
Interpretable Predictability-Based AI Text Detection: A Replication Study
by: Skurla, Adam, et al.
Published: (2026)
Identifying Intervenable and Interpretable Features via Orthogonality Regularization
by: Miller, Moritz, et al.
Published: (2026)
Prism: Policy Reuse via Interpretable Strategy Mapping in Reinforcement Learning
by: Pravetz, Thomas
Published: (2026)
Three Pathways to Neurosymbolic Reinforcement Learning with Interpretable Model and Policy Networks
by: Graf, Peter, et al.
Published: (2024)
How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning
by: Weltevrede, Max, et al.
Published: (2025)