| Main Authors: | Lee, Yoonho, Boen, Joseph, Finn, Chelsea |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.07919 |
Similar Items
Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning
by: Xie, Johnathan, et al.
Published: (2024)
Conservative Prediction via Data-Driven Confidence Minimization
by: Choi, Caroline, et al.
Published: (2023)
Clarify: Improving Model Robustness With Natural Language Corrections
by: Lee, Yoonho, et al.
Published: (2024)
Calibrating Language Models with Adaptive Temperature Scaling
by: Xie, Johnathan, et al.
Published: (2024)
Test-Time Alignment via Hypothesis Reweighting
by: Lee, Yoonho, et al.
Published: (2024)
Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling
by: Liu, Yuejiang, et al.
Published: (2024)
AutoFT: Learning an Objective for Robust Fine-Tuning
by: Choi, Caroline, et al.
Published: (2024)
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
by: Qu, Yuxiao, et al.
Published: (2025)
TextGrad: Automatic "Differentiation" via Text
by: Yuksekgonul, Mert, et al.
Published: (2024)
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
by: Shi, Lucy Xiaoyang, et al.
Published: (2025)
Continuum-armed Bandit Optimization with Batch Pairwise Comparison Oracles
by: Chang, Xiangyu, et al.
Published: (2025)
Open-Ended Task Discovery via Bayesian Optimization
by: Adachi, Masaki, et al.
Published: (2026)
Pairwise Comparisons without Stochastic Transitivity: Model, Theory and Applications
by: Lee, Sze Ming, et al.
Published: (2025)
Disentangling Length from Quality in Direct Preference Optimization
by: Park, Ryan, et al.
Published: (2024)
Quantum Natural Stochastic Pairwise Coordinate Descent
by: Sohail, Mohammad Aamir, et al.
Published: (2024)
Decoding Decoded: Understanding Hyperparameter Effects in Open-Ended Text Generation
by: Arias, Esteban Garces, et al.
Published: (2024)
AutoLibra: Agent Metric Induction from Open-Ended Human Feedback
by: Zhu, Hao, et al.
Published: (2025)
Minimum Weighted Feedback Arc Sets for Ranking from Pairwise Comparisons
by: Vahidi, Soroush, et al.
Published: (2024)
Stability-based Generalization Analysis of Randomized Coordinate Descent for Pairwise Learning
by: Wu, Liang, et al.
Published: (2025)
Robust Agents in Open-Ended Worlds
by: Samvelyan, Mikayel
Published: (2025)
Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework
by: Arias, Esteban Garces, et al.
Published: (2024)
A Critical Evaluation of AI Feedback for Aligning Large Language Models
by: Sharma, Archit, et al.
Published: (2024)
Affordance-Guided Reinforcement Learning via Visual Prompting
by: Lee, Olivia Y., et al.
Published: (2024)
Automated Feedback in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses
by: Baral, Sami, et al.
Published: (2024)
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
by: Kim, Moo Jin, et al.
Published: (2025)
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
by: Zhu, Banghua, et al.
Published: (2023)
Limited Memory Online Gradient Descent for Kernelized Pairwise Learning with Dynamic Averaging
by: AlQuabeh, Hilal, et al.
Published: (2024)
Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability
by: Prasad, Aaditya Vikram, et al.
Published: (2026)
Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation
by: Arias, Esteban Garces, et al.
Published: (2024)
Learning from Similarity/Dissimilarity and Pairwise Comparison
by: Tate, Tomoya, et al.
Published: (2026)
FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale
by: He, Runyuan, et al.
Published: (2026)
Learning Long-Context Diffusion Policies via Past-Token Prediction
by: Torne, Marcel, et al.
Published: (2025)
Reinforcement Learning via Implicit Imitation Guidance
by: Dong, Perry, et al.
Published: (2025)
RLVF: Learning from Verbal Feedback without Overgeneralization
by: Stephan, Moritz, et al.
Published: (2024)
Universal Neural Functionals
by: Zhou, Allan, et al.
Published: (2024)
MemER: Scaling Up Memory for Robot Control via Experience Retrieval
by: Sridhar, Ajay, et al.
Published: (2025)
Contrastive Preference Learning: Learning from Human Feedback without RL
by: Hejna, Joey, et al.
Published: (2023)
Score-Based Density Estimation from Pairwise Comparisons
by: Mikkola, Petrus, et al.
Published: (2025)
Metric Learning from Limited Pairwise Preference Comparisons
by: Wang, Zhi, et al.
Published: (2024)
Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options
by: Lee, Joongkyu, et al.
Published: (2025)