| Main Authors: | Anupam, Sagnik, Brown, Davis, Li, Shuo, Wong, Eric, Hassani, Hamed, Bastani, Osbert |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.02418 |
Similar Items
Effective Reinforcement Learning for Reasoning in Language Models
by: Huang, Lianghuan, et al.
Published: (2025)
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
by: Huang, Xinmeng, et al.
Published: (2024)
LLM Program Optimization via Retrieval Augmented Search
by: Anupam, Sagnik, et al.
Published: (2025)
Conformal Constrained Policy Optimization for Cost-Effective LLM Agents
by: Si, Wenwen, et al.
Published: (2025)
Uncertainty in Language Models: Assessment through Rank-Calibration
by: Huang, Xinmeng, et al.
Published: (2024)
RAPID: An Efficient Reinforcement Learning Algorithm for Small Language Models
by: Huang, Lianghuan, et al.
Published: (2025)
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
by: Robey, Alexander, et al.
Published: (2023)
Detecting Safety Violations Across Many Agent Traces
by: Stein, Adam, et al.
Published: (2026)
Adaptively profiling models with task elicitation
by: Brown, Davis, et al.
Published: (2025)
Diversity By Design: Leveraging Distribution Matching for Offline Model-Based Optimization
by: Yao, Michael S., et al.
Published: (2025)
A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions
by: Mell, Stephen, et al.
Published: (2025)
Are AI Capabilities Increasing Exponentially? A Competing Hypothesis
by: Ge, Haosen, et al.
Published: (2026)
WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks
by: Miyai, Atsuyuki, et al.
Published: (2025)
Evaluating the Diversity and Quality of LLM Generated Content
by: Shypula, Alexander, et al.
Published: (2025)
Generative Adversarial Model-Based Optimization via Source Critic Regularization
by: Yao, Michael S., et al.
Published: (2024)
The BrowserGym Ecosystem for Web Agent Research
by: De Chezelles, Thibault Le Sellier, et al.
Published: (2024)
TRAQ: Trustworthy Retrieval Augmented Question Answering via Conformal Prediction
by: Li, Shuo, et al.
Published: (2023)
WebLLM: A High-Performance In-Browser LLM Inference Engine
by: Ruan, Charlie F., et al.
Published: (2024)
WebTrap: Stealthy Mid-Task Hijacking of Browser Agents During Navigation
by: Liu, Zhichao, et al.
Published: (2026)
SafeArena: Evaluating the Safety of Autonomous Web Agents
by: Tur, Ada Defne, et al.
Published: (2025)
Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning
by: Si, Wenwen, et al.
Published: (2026)
Knowledgeable Language Models as Black-Box Optimizers for Personalized Medicine
by: Yao, Michael S., et al.
Published: (2025)
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
by: Drouin, Alexandre, et al.
Published: (2024)
DrEureka: Language Model Guided Sim-To-Real Transfer
by: Ma, Yecheng Jason, et al.
Published: (2024)
Evaluating the Performance of Large Language Models via Debates
by: Moniri, Behrad, et al.
Published: (2024)
Jailbreaking Black Box Large Language Models in Twenty Queries
by: Chao, Patrick, et al.
Published: (2023)
Asymptotic Normality of Generalized Low-Rank Matrix Sensing via Riemannian Geometry
by: Bastani, Osbert
Published: (2024)
Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents
by: Kiyani, Shayan, et al.
Published: (2025)
Decaf: Improving Neural Decompilation with Automatic Feedback and Search
by: Shypula, Alexander, et al.
Published: (2026)
SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents
by: Kutasov, Jonathan, et al.
Published: (2025)
Eurekaverse: Environment Curriculum Generation via Large Language Models
by: Liang, William, et al.
Published: (2024)
WebArena: A Realistic Web Environment for Building Autonomous Agents
by: Zhou, Shuyan, et al.
Published: (2023)
Conformal Prediction with Learned Features
by: Kiyani, Shayan, et al.
Published: (2024)
Conformal Structured Prediction
by: Zhang, Botong, et al.
Published: (2024)
Benchmarking Misuse Mitigation Against Covert Adversaries
by: Brown, Davis, et al.
Published: (2025)
Length Optimization in Conformal Prediction
by: Kiyani, Shayan, et al.
Published: (2024)
Cross-Modality Investigation on WESAD Stress Classification
by: Oliver, Eric, et al.
Published: (2025)
LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks
by: Long, Xiang, et al.
Published: (2026)
WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks
by: Ramesh, Guruprasad Viswanathan, et al.
Published: (2026)
Evaluating Long-Context Reasoning in LLM-Based WebAgents
by: Chung, Andy, et al.
Published: (2025)