:: Library Catalog

Bibliographic Details
Main Authors:	Anupam, Sagnik; Brown, Davis; Li, Shuo; Wong, Eric; Hassani, Hamed; Bastani, Osbert
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2510.02418

Similar Items

Effective Reinforcement Learning for Reasoning in Language Models
by: Huang, Lianghuan, et al.
Published: (2025)

One-Shot Safety Alignment for Large Language Models via Optimal Dualization
by: Huang, Xinmeng, et al.
Published: (2024)

LLM Program Optimization via Retrieval Augmented Search
by: Anupam, Sagnik, et al.
Published: (2025)

Conformal Constrained Policy Optimization for Cost-Effective LLM Agents
by: Si, Wenwen, et al.
Published: (2025)

Uncertainty in Language Models: Assessment through Rank-Calibration
by: Huang, Xinmeng, et al.
Published: (2024)

RAPID: An Efficient Reinforcement Learning Algorithm for Small Language Models
by: Huang, Lianghuan, et al.
Published: (2025)

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
by: Robey, Alexander, et al.
Published: (2023)

Detecting Safety Violations Across Many Agent Traces
by: Stein, Adam, et al.
Published: (2026)

Adaptively profiling models with task elicitation
by: Brown, Davis, et al.
Published: (2025)

Diversity By Design: Leveraging Distribution Matching for Offline Model-Based Optimization
by: Yao, Michael S., et al.
Published: (2025)

A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions
by: Mell, Stephen, et al.
Published: (2025)

Are AI Capabilities Increasing Exponentially? A Competing Hypothesis
by: Ge, Haosen, et al.
Published: (2026)

WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks
by: Miyai, Atsuyuki, et al.
Published: (2025)

Evaluating the Diversity and Quality of LLM Generated Content
by: Shypula, Alexander, et al.
Published: (2025)

Generative Adversarial Model-Based Optimization via Source Critic Regularization
by: Yao, Michael S., et al.
Published: (2024)

The BrowserGym Ecosystem for Web Agent Research
by: De Chezelles, Thibault Le Sellier, et al.
Published: (2024)

TRAQ: Trustworthy Retrieval Augmented Question Answering via Conformal Prediction
by: Li, Shuo, et al.
Published: (2023)

WebLLM: A High-Performance In-Browser LLM Inference Engine
by: Ruan, Charlie F., et al.
Published: (2024)

WebTrap: Stealthy Mid-Task Hijacking of Browser Agents During Navigation
by: Liu, Zhichao, et al.
Published: (2026)

SafeArena: Evaluating the Safety of Autonomous Web Agents
by: Tur, Ada Defne, et al.
Published: (2025)

Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning
by: Si, Wenwen, et al.
Published: (2026)

Knowledgeable Language Models as Black-Box Optimizers for Personalized Medicine
by: Yao, Michael S., et al.
Published: (2025)

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
by: Drouin, Alexandre, et al.
Published: (2024)

DrEureka: Language Model Guided Sim-To-Real Transfer
by: Ma, Yecheng Jason, et al.
Published: (2024)

Evaluating the Performance of Large Language Models via Debates
by: Moniri, Behrad, et al.
Published: (2024)

Jailbreaking Black Box Large Language Models in Twenty Queries
by: Chao, Patrick, et al.
Published: (2023)

Asymptotic Normality of Generalized Low-Rank Matrix Sensing via Riemannian Geometry
by: Bastani, Osbert
Published: (2024)

Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents
by: Kiyani, Shayan, et al.
Published: (2025)

Decaf: Improving Neural Decompilation with Automatic Feedback and Search
by: Shypula, Alexander, et al.
Published: (2026)

SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents
by: Kutasov, Jonathan, et al.
Published: (2025)

Eurekaverse: Environment Curriculum Generation via Large Language Models
by: Liang, William, et al.
Published: (2024)

WebArena: A Realistic Web Environment for Building Autonomous Agents
by: Zhou, Shuyan, et al.
Published: (2023)

Conformal Prediction with Learned Features
by: Kiyani, Shayan, et al.
Published: (2024)

Conformal Structured Prediction
by: Zhang, Botong, et al.
Published: (2024)

Benchmarking Misuse Mitigation Against Covert Adversaries
by: Brown, Davis, et al.
Published: (2025)

Length Optimization in Conformal Prediction
by: Kiyani, Shayan, et al.
Published: (2024)

Cross-Modality Investigation on WESAD Stress Classification
by: Oliver, Eric, et al.
Published: (2025)

LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks
by: Long, Xiang, et al.
Published: (2026)

WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks
by: Ramesh, Guruprasad Viswanathan, et al.
Published: (2026)

Evaluating Long-Context Reasoning in LLM-Based WebAgents
by: Chung, Andy, et al.
Published: (2025)