| Main Authors: | Khandelwal, Vedant, Rossi, Francesca, Murugesan, Keerthiram, Miehling, Erik, Campbell, Murray, Ramamurthy, Karthikeyan Natesan, Horesh, Lior |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.17959 |
Similar Items
Cross-Examiner: Evaluating Consistency of Large Language Model-Generated Explanations
by: Villa, Danielle, et al.
Published: (2025)
AgentSCOPE: Evaluating Contextual Privacy Across Agentic Workflows
by: Ngong, Ivoline C., et al.
Published: (2026)
On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)
by: Pallagani, Vishal, et al.
Published: (2024)
Evaluating the Prompt Steerability of Large Language Models
by: Miehling, Erik, et al.
Published: (2024)
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
by: Ngong, Ivoline, et al.
Published: (2025)
Mitigating Misalignment Contagion by Steering with Implicit Traits
by: Chang, Maria, et al.
Published: (2026)
Programming Refusal with Conditional Activation Steering
by: Lee, Bruce W., et al.
Published: (2024)
Reasoning about concepts with LLMs: Inconsistencies abound
by: Uceda-Sosa, Rosario, et al.
Published: (2024)
Ranking Large Language Models without Ground Truth
by: Dhurandhar, Amit, et al.
Published: (2024)
The Effectiveness of Approximate Regularized Replay for Efficient Supervised Fine-Tuning of Large Language Models
by: Riemer, Matthew, et al.
Published: (2025)
Trust Regions for Explanations via Black-Box Probabilistic Certification
by: Dhurandhar, Amit, et al.
Published: (2024)
AI Steerability 360: A Toolkit for Steering Large Language Models
by: Miehling, Erik, et al.
Published: (2026)
EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning
by: Basu, Kinjal, et al.
Published: (2024)
STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models
by: Basavatia, Shreyas, et al.
Published: (2024)
A Neurosymbolic Fast and Slow Architecture for Graph Coloring
by: Khandelwal, Vedant, et al.
Published: (2024)
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing
by: Mukherjee, Arpan, et al.
Published: (2024)
Towards Aligning Language Models with Textual Feedback
by: Lloret, Saüc Abadal, et al.
Published: (2024)
CELL your Model: Contrastive Explanations for Large Language Models
by: Luss, Ronny, et al.
Published: (2024)
Quantifying artificial intelligence through algorithmic generalization
by: Ito, Takuya, et al.
Published: (2024)
LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems
by: Asif, Sadia, et al.
Published: (2026)
Agentic AI Needs a Systems Theory
by: Miehling, Erik, et al.
Published: (2025)
Towards Learning Foundation Models for Heuristic Functions to Solve Pathfinding Problems
by: Khandelwal, Vedant, et al.
Published: (2024)
Multi-Level Explanations for Generative Language Models
by: Paes, Lucas Monteiro, et al.
Published: (2024)
Patching LLM Like Software: A Lightweight Method for Improving Safety Policy in Large Language Models
by: Arif, Huzaifa, et al.
Published: (2025)
Interpretable Graph-Language Modeling for Detecting Youth Illicit Drug Use
by: Li, Yiyang, et al.
Published: (2025)
Context Attribution with Multi-Armed Bandit Optimization
by: Pan, Deng, et al.
Published: (2025)
PDDLFuse: A Tool for Generating Diverse Planning Domains
by: Khandelwal, Vedant, et al.
Published: (2024)
CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design
by: Neehal, Nafis, et al.
Published: (2024)
The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning
by: Shah, Raj Sanjay, et al.
Published: (2026)
Think$^{2}$: Grounded Metacognitive Reasoning in Large Language Models
by: Elenjical, Abraham Paul, et al.
Published: (2026)
Language Models in Dialogue: Conversational Maxims for Human-AI Interactions
by: Miehling, Erik, et al.
Published: (2024)
Monitor-Generate-Verify (MGV): Formalising Metacognitive Theory for Language Model Reasoning
by: Oh, Nick, et al.
Published: (2025)
Sparsity May Be All You Need: Sparse Random Parameter Adaptation
by: Rios, Jesus, et al.
Published: (2025)
CodeGolf Bench: A Multi-Language Benchmark for Evaluating Concise Code Generation Capabilities of Large Language Models
by: Padwal, Vedant
Published: (2026)
Who Sees the Risk? Stakeholder Conflicts and Explanatory Policies in LLM-based Risk Assessment
by: Yadav, Srishti, et al.
Published: (2025)
Transcendence: Generative Models Can Outperform The Experts That Train Them
by: Zhang, Edwin, et al.
Published: (2024)
ZoomR: Memory Efficient Reasoning through Multi-Granularity Key Value Retrieval
by: Yang, David H., et al.
Published: (2026)
The Need for Verification in AI-Driven Scientific Discovery
by: Cornelio, Cristina, et al.
Published: (2025)
Meta-R1: Empowering Large Reasoning Models with Metacognition
by: Dong, Haonan, et al.
Published: (2025)
Efficacy of Various Large Language Models in Generating Smart Contracts
by: Chatterjee, Siddhartha, et al.
Published: (2024)