| Main Authors: | Wei, Dennis, Padhi, Inkit, Ghosh, Soumya, Dhurandhar, Amit, Ramamurthy, Karthikeyan Natesan, Chang, Maria |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.03906 |
Similar Items
Programming Refusal with Conditional Activation Steering
by: Lee, Bruce W., et al.
Published: (2024)
Identifying Sub-networks in Neural Networks via Functionally Similar Representations
by: Gao, Tian, et al.
Published: (2024)
Trust Regions for Explanations via Black-Box Probabilistic Certification
by: Dhurandhar, Amit, et al.
Published: (2024)
Value Alignment from Unstructured Text
by: Padhi, Inkit, et al.
Published: (2024)
Ranking Large Language Models without Ground Truth
by: Dhurandhar, Amit, et al.
Published: (2024)
Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs
by: Kadhe, Swanand Ravindra, et al.
Published: (2024)
Large Language Model Confidence Estimation via Black-Box Access
by: Pedapati, Tejaswini, et al.
Published: (2024)
Active Sequential Two-Sample Testing
by: Li, Weizhi, et al.
Published: (2023)
LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems
by: Asif, Sadia, et al.
Published: (2026)
When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
by: Nagireddy, Manish, et al.
Published: (2024)
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
by: Achintalwar, Swapnaja, et al.
Published: (2024)
Multi-Level Explanations for Generative Language Models
by: Paes, Lucas Monteiro, et al.
Published: (2024)
AgentSCOPE: Evaluating Contextual Privacy Across Agentic Workflows
by: Ngong, Ivoline C., et al.
Published: (2026)
ICX360: In-Context eXplainability 360 Toolkit
by: Wei, Dennis, et al.
Published: (2025)
When Stability meets Sufficiency: Informative Explanations that do not Overwhelm
by: Luss, Ronny, et al.
Published: (2021)
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
by: Ngong, Ivoline, et al.
Published: (2025)
CELL your Model: Contrastive Explanations for Large Language Models
by: Luss, Ronny, et al.
Published: (2024)
Reasoning about concepts with LLMs: Inconsistencies abound
by: Uceda-Sosa, Rosario, et al.
Published: (2024)
CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions
by: Puri, Isha, et al.
Published: (2025)
Data Attribution in Adaptive Learning
by: Rege, Amit Kiran
Published: (2026)
Cross-Examiner: Evaluating Consistency of Large Language Model-Generated Explanations
by: Villa, Danielle, et al.
Published: (2025)
Attribute Graphs Underlying Molecular Generative Models: Path to Learning with Limited Data
by: Hoffman, Samuel C., et al.
Published: (2022)
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
by: Hou, Yufang, et al.
Published: (2024)
Bandits with Mean Bounds
by: Sharma, Nihal, et al.
Published: (2020)
Scalable Data Attribution via Forward-Only Test-Time Inference
by: Ma, Sibo, et al.
Published: (2025)
Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge
by: Dognin, Pierre, et al.
Published: (2020)
ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior
by: Eichin, Florian, et al.
Published: (2025)
Adversarial Sample-Based Approach for Tighter Privacy Auditing in Final Model-Only Scenarios
by: Yoon, Sangyeon, et al.
Published: (2024)
Auditing and Generating Synthetic Data with Controllable Trust Trade-offs
by: Belgodere, Brian, et al.
Published: (2023)
Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes
by: Sharma, Nihal, et al.
Published: (2021)
A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning
by: Wei, Ran, et al.
Published: (2023)
NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
by: Dhurandhar, Amit, et al.
Published: (2024)
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
by: Huang, Yue, et al.
Published: (2025)
Integrated Gradient Correlation: a Dataset-wise Attribution Method
by: Lelièvre, Pierre, et al.
Published: (2024)
Constrained Centroid Clustering: A Novel Approach for Compact and Structured Partitioning
by: Veeramachaneni, Sowmini Devi, et al.
Published: (2025)
Better Models, Faster Training: Sigmoid Attention for single-cell Foundation Models
by: Sadashivaiah, Vijay, et al.
Published: (2026)
Optimizing TinyML: The Impact of Reduced Data Acquisition Rates for Time Series Classification on Microcontrollers
by: Samanta, Riya, et al.
Published: (2024)
Imperfect Influence, Preserved Rankings: A Theory of TRAK for Data Attribution
by: Tong, Han, et al.
Published: (2026)
Gradient Flow Based Phase-Field Modeling Using Separable Neural Networks
by: Mattey, Revanth, et al.
Published: (2024)
Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution
by: Covert, Ian, et al.
Published: (2024)