:: Library Catalog

Bibliographic Details
Main Authors:	Khandelwal, Vedant; Rossi, Francesca; Murugesan, Keerthiram; Miehling, Erik; Campbell, Murray; Ramamurthy, Karthikeyan Natesan; Horesh, Lior
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.17959

Similar Items

Cross-Examiner: Evaluating Consistency of Large Language Model-Generated Explanations
by: Villa, Danielle, et al.
Published: (2025)

AgentSCOPE: Evaluating Contextual Privacy Across Agentic Workflows
by: Ngong, Ivoline C., et al.
Published: (2026)

On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)
by: Pallagani, Vishal, et al.
Published: (2024)

Evaluating the Prompt Steerability of Large Language Models
by: Miehling, Erik, et al.
Published: (2024)

Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
by: Ngong, Ivoline, et al.
Published: (2025)

Mitigating Misalignment Contagion by Steering with Implicit Traits
by: Chang, Maria, et al.
Published: (2026)

Programming Refusal with Conditional Activation Steering
by: Lee, Bruce W., et al.
Published: (2024)

Reasoning about concepts with LLMs: Inconsistencies abound
by: Uceda-Sosa, Rosario, et al.
Published: (2024)

Ranking Large Language Models without Ground Truth
by: Dhurandhar, Amit, et al.
Published: (2024)

The Effectiveness of Approximate Regularized Replay for Efficient Supervised Fine-Tuning of Large Language Models
by: Riemer, Matthew, et al.
Published: (2025)

Trust Regions for Explanations via Black-Box Probabilistic Certification
by: Dhurandhar, Amit, et al.
Published: (2024)

AI Steerability 360: A Toolkit for Steering Large Language Models
by: Miehling, Erik, et al.
Published: (2026)

EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning
by: Basu, Kinjal, et al.
Published: (2024)

STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models
by: Basavatia, Shreyas, et al.
Published: (2024)

A Neurosymbolic Fast and Slow Architecture for Graph Coloring
by: Khandelwal, Vedant, et al.
Published: (2024)

Combinatorial Multi-armed Bandits: Arm Selection via Group Testing
by: Mukherjee, Arpan, et al.
Published: (2024)

Towards Aligning Language Models with Textual Feedback
by: Lloret, Saüc Abadal, et al.
Published: (2024)

CELL your Model: Contrastive Explanations for Large Language Models
by: Luss, Ronny, et al.
Published: (2024)

Quantifying artificial intelligence through algorithmic generalization
by: Ito, Takuya, et al.
Published: (2024)

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems
by: Asif, Sadia, et al.
Published: (2026)

Agentic AI Needs a Systems Theory
by: Miehling, Erik, et al.
Published: (2025)

Towards Learning Foundation Models for Heuristic Functions to Solve Pathfinding Problems
by: Khandelwal, Vedant, et al.
Published: (2024)

Multi-Level Explanations for Generative Language Models
by: Paes, Lucas Monteiro, et al.
Published: (2024)

Patching LLM Like Software: A Lightweight Method for Improving Safety Policy in Large Language Models
by: Arif, Huzaifa, et al.
Published: (2025)

Interpretable Graph-Language Modeling for Detecting Youth Illicit Drug Use
by: Li, Yiyang, et al.
Published: (2025)

Context Attribution with Multi-Armed Bandit Optimization
by: Pan, Deng, et al.
Published: (2025)

PDDLFuse: A Tool for Generating Diverse Planning Domains
by: Khandelwal, Vedant, et al.
Published: (2024)

CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design
by: Neehal, Nafis, et al.
Published: (2024)

The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning
by: Shah, Raj Sanjay, et al.
Published: (2026)

Think$^{2}$: Grounded Metacognitive Reasoning in Large Language Models
by: Elenjical, Abraham Paul, et al.
Published: (2026)

Language Models in Dialogue: Conversational Maxims for Human-AI Interactions
by: Miehling, Erik, et al.
Published: (2024)

Monitor-Generate-Verify (MGV): Formalising Metacognitive Theory for Language Model Reasoning
by: Oh, Nick, et al.
Published: (2025)

Sparsity May Be All You Need: Sparse Random Parameter Adaptation
by: Rios, Jesus, et al.
Published: (2025)

CodeGolf Bench: A Multi-Language Benchmark for Evaluating Concise Code Generation Capabilities of Large Language Models
by: Padwal, Vedant
Published: (2026)

Who Sees the Risk? Stakeholder Conflicts and Explanatory Policies in LLM-based Risk Assessment
by: Yadav, Srishti, et al.
Published: (2025)

Transcendence: Generative Models Can Outperform The Experts That Train Them
by: Zhang, Edwin, et al.
Published: (2024)

ZoomR: Memory Efficient Reasoning through Multi-Granularity Key Value Retrieval
by: Yang, David H., et al.
Published: (2026)

The Need for Verification in AI-Driven Scientific Discovery
by: Cornelio, Cristina, et al.
Published: (2025)

Meta-R1: Empowering Large Reasoning Models with Metacognition
by: Dong, Haonan, et al.
Published: (2025)

Efficacy of Various Large Language Models in Generating Smart Contracts
by: Chatterjee, Siddhartha, et al.
Published: (2024)