| Main Authors: | Koo, Ryan; Yang, Ian; Raheja, Vipul; Hong, Mingyi; Jun, Kwang-Sung; Kang, Dongyeop |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.16272 |
Similar Items
Benchmarking Cognitive Biases in Large Language Models as Evaluators
by: Koo, Ryan, et al.
Published: (2023)
The Amazing Agent Race: Strong Tool Users, Weak Navigators
by: Kim, Zae Myung, et al.
Published: (2026)
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models
by: Kim, Zae Myung, et al.
Published: (2025)
Scaling Unverifiable Rewards: A Case Study on Visual Insights
by: Gan, Shuyu, et al.
Published: (2025)
Noise-Adaptive Confidence Sets for Linear Bandits and Application to Bayesian Optimization
by: Jun, Kwang-Sung, et al.
Published: (2024)
Are LLM Agents Behaviorally Coherent? Latent Profiles for Social Simulation
by: Mooney, James, et al.
Published: (2025)
BAID: A Benchmark for Bias Assessment of AI Detectors
by: Basu, Priyam, et al.
Published: (2025)
Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs
by: Kim, Zae Myung, et al.
Published: (2024)
Learning a High-quality Robotic Wiping Policy Using Systematic Reward Analysis and Visual-Language Model Based Curriculum
by: Liu, Yihong, et al.
Published: (2025)
Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards
by: Qin, Hao, et al.
Published: (2023)
RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models
by: Wei, Quan, et al.
Published: (2025)
Second-Order Bounds for [0,1]-Valued Regression via Betting Loss
by: Li, Yinan, et al.
Published: (2025)
Nearly Optimal Active Preference Learning and Its Application to LLM Alignment
by: Zhao, Yao, et al.
Published: (2026)
Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation
by: de Langis, Karin, et al.
Published: (2024)
Do LLMs Recognize Your Latent Preferences? A Benchmark for Latent Information Discovery in Personalized Interaction
by: Tsaknakis, Ioannis, et al.
Published: (2025)
Minimum Empirical Divergence for Sub-Gaussian Linear Bandits
by: Balagopalan, Kapilan, et al.
Published: (2024)
Attention-Based Reward Shaping for Sparse and Delayed Rewards
by: Holmes, Ian, et al.
Published: (2025)
Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion
by: Lee, Junghyun, et al.
Published: (2023)
HAVER: Instance-Dependent Error Bounds for Maximum Mean Estimation and Applications to Q-Learning and Monte Carlo Tree Search
by: Nguyen, Tuan Ngo, et al.
Published: (2024)
A Bayesian Approach to Robust Inverse Reinforcement Learning
by: Wei, Ran, et al.
Published: (2023)
Explainable Bayesian Optimization
by: Chakraborty, Tanmay, et al.
Published: (2024)
RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards
by: Zargarbashi, Fatemeh, et al.
Published: (2024)
TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
by: Liu, Yuyang, et al.
Published: (2025)
Efficient Low-Rank Matrix Estimation, Experimental Design, and Arm-Set-Dependent Low-Rank Bandits
by: Jang, Kyoungseok, et al.
Published: (2024)
$\varepsilon$-Good Action Identification in Fixed-Budget Monte Carlo Tree Search
by: Li, Yinan, et al.
Published: (2026)
Nonparametric Bayesian Optimization for General Rewards
by: Zhang, Zishi, et al.
Published: (2026)
Coverage Improvement and Fast Convergence of On-policy Preference Learning
by: Kim, Juno, et al.
Published: (2026)
Memory-Efficient LLM Pretraining via Minimalist Optimizer Design
by: Glentis, Athanasios, et al.
Published: (2025)
Bayesian Reward Models for LLM Alignment
by: Yang, Adam X., et al.
Published: (2024)
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
by: Jeong, Soyeong, et al.
Published: (2025)
From Demonstrations to Rewards: Alignment Without Explicit Human Preferences
by: Zeng, Siliang, et al.
Published: (2025)
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Reward Design
by: Wei, Quan, et al.
Published: (2025)
A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
by: Lee, Junghyun, et al.
Published: (2024)
Bayesian Preference Learning for Test-Time Steerable Reward Models
by: Hong, Jiwoo, et al.
Published: (2026)
Iterated Energy-based Flow Matching for Sampling from Boltzmann Densities
by: Woo, Dongyeop, et al.
Published: (2024)
Dense Reward for Free in Reinforcement Learning from Human Feedback
by: Chan, Alex J., et al.
Published: (2024)
Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling
by: Qin, Hao, et al.
Published: (2025)
BézierFlow: Learning Bézier Stochastic Interpolant Schedulers for Few-Step Generation
by: Min, Yunhong, et al.
Published: (2025)
Guaranteeing Control Requirements via Reward Shaping in Reinforcement Learning
by: De Lellis, Francesco, et al.
Published: (2023)
Reward Dimension Reduction for Scalable Multi-Objective Reinforcement Learning
by: Park, Giseung, et al.
Published: (2025)