Library Catalog

Bibliographic Details
Main Authors:	Wei, Dennis; Padhi, Inkit; Ghosh, Soumya; Dhurandhar, Amit; Ramamurthy, Karthikeyan Natesan; Chang, Maria
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2412.03906

Similar Items

Programming Refusal with Conditional Activation Steering
by: Lee, Bruce W., et al.
Published: (2024)

Identifying Sub-networks in Neural Networks via Functionally Similar Representations
by: Gao, Tian, et al.
Published: (2024)

Trust Regions for Explanations via Black-Box Probabilistic Certification
by: Dhurandhar, Amit, et al.
Published: (2024)

Value Alignment from Unstructured Text
by: Padhi, Inkit, et al.
Published: (2024)

Ranking Large Language Models without Ground Truth
by: Dhurandhar, Amit, et al.
Published: (2024)

Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs
by: Kadhe, Swanand Ravindra, et al.
Published: (2024)

Large Language Model Confidence Estimation via Black-Box Access
by: Pedapati, Tejaswini, et al.
Published: (2024)

Active Sequential Two-Sample Testing
by: Li, Weizhi, et al.
Published: (2023)

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems
by: Asif, Sadia, et al.
Published: (2026)

When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
by: Nagireddy, Manish, et al.
Published: (2024)

Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
by: Achintalwar, Swapnaja, et al.
Published: (2024)

Multi-Level Explanations for Generative Language Models
by: Paes, Lucas Monteiro, et al.
Published: (2024)

AgentSCOPE: Evaluating Contextual Privacy Across Agentic Workflows
by: Ngong, Ivoline C., et al.
Published: (2026)

ICX360: In-Context eXplainability 360 Toolkit
by: Wei, Dennis, et al.
Published: (2025)

When Stability meets Sufficiency: Informative Explanations that do not Overwhelm
by: Luss, Ronny, et al.
Published: (2021)

Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
by: Ngong, Ivoline, et al.
Published: (2025)

CELL your Model: Contrastive Explanations for Large Language Models
by: Luss, Ronny, et al.
Published: (2024)

Reasoning about concepts with LLMs: Inconsistencies abound
by: Uceda-Sosa, Rosario, et al.
Published: (2024)

CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions
by: Puri, Isha, et al.
Published: (2025)

Data Attribution in Adaptive Learning
by: Rege, Amit Kiran
Published: (2026)

Cross-Examiner: Evaluating Consistency of Large Language Model-Generated Explanations
by: Villa, Danielle, et al.
Published: (2025)

Attribute Graphs Underlying Molecular Generative Models: Path to Learning with Limited Data
by: Hoffman, Samuel C., et al.
Published: (2022)

WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
by: Hou, Yufang, et al.
Published: (2024)

Bandits with Mean Bounds
by: Sharma, Nihal, et al.
Published: (2020)

Scalable Data Attribution via Forward-Only Test-Time Inference
by: Ma, Sibo, et al.
Published: (2025)

Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge
by: Dognin, Pierre, et al.
Published: (2020)

ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior
by: Eichin, Florian, et al.
Published: (2025)

Adversarial Sample-Based Approach for Tighter Privacy Auditing in Final Model-Only Scenarios
by: Yoon, Sangyeon, et al.
Published: (2024)

Auditing and Generating Synthetic Data with Controllable Trust Trade-offs
by: Belgodere, Brian, et al.
Published: (2023)

Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes
by: Sharma, Nihal, et al.
Published: (2021)

A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning
by: Wei, Ran, et al.
Published: (2023)

NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
by: Dhurandhar, Amit, et al.
Published: (2024)

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
by: Huang, Yue, et al.
Published: (2025)

Integrated Gradient Correlation: a Dataset-wise Attribution Method
by: Lelièvre, Pierre, et al.
Published: (2024)

Constrained Centroid Clustering: A Novel Approach for Compact and Structured Partitioning
by: Veeramachaneni, Sowmini Devi, et al.
Published: (2025)

Better Models, Faster Training: Sigmoid Attention for single-cell Foundation Models
by: Sadashivaiah, Vijay, et al.
Published: (2026)

Optimizing TinyML: The Impact of Reduced Data Acquisition Rates for Time Series Classification on Microcontrollers
by: Samanta, Riya, et al.
Published: (2024)

Imperfect Influence, Preserved Rankings: A Theory of TRAK for Data Attribution
by: Tong, Han, et al.
Published: (2026)

Gradient Flow Based Phase-Field Modeling Using Separable Neural Networks
by: Mattey, Revanth, et al.
Published: (2024)

Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution
by: Covert, Ian, et al.
Published: (2024)