| Main Authors: | Zhang, Haoting, Chen, Haoxian, Zhan, Donglin, Zhao, Hanyang, Lam, Henry, Tang, Wenpin, Yao, David, Zheng, Zeyu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.00685 |
Similar Items
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
by: Chen, Haoxian, et al.
Published: (2024)
Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning
by: Zhao, Hanyang, et al.
Published: (2024)
Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
by: Zhao, Hanyang, et al.
Published: (2025)
Understanding Sampler Stochasticity in Training Diffusion Models for RLHF
by: Sheng, Jiayuan, et al.
Published: (2025)
OPD+: Rethinking the Advantage Design for On-Policy Distillation
by: Zhao, Hanyang, et al.
Published: (2026)
Collaborative Bayesian Optimization via Wasserstein Barycenters
by: Zhan, Donglin, et al.
Published: (2025)
Contractive Diffusion Probabilistic Models
by: Tang, Wenpin, et al.
Published: (2024)
Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial
by: Tang, Wenpin, et al.
Published: (2024)
Pseudo-Bayesian Optimization
by: Chen, Haoxian, et al.
Published: (2023)
RPO: Fine-Tuning Visual Generative Models via Rich Vision-Language Preferences
by: Zhao, Hanyang, et al.
Published: (2025)
DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning
by: Zhao, Hanyang, et al.
Published: (2025)
Daily Physical Activity Monitoring -- Adaptive Learning from Multi-source Motion Sensor Data
by: Zhang, Haoting, et al.
Published: (2024)
Language Model Prompt Selection via Simulation Optimization
by: Zhang, Haoting, et al.
Published: (2024)
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey
by: Winata, Genta Indra, et al.
Published: (2024)
Diffusion Generative Models Meet Compressed Sensing, with Applications to Imaging and Finance
by: Guo, Zhengyi, et al.
Published: (2025)
Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond
by: Tang, Wenpin, et al.
Published: (2024)
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
by: Zhao, Hanyang, et al.
Published: (2024)
Generative Replica-Exchange: A Flow-based Framework for Accelerating Replica Exchange Simulations
by: Huang, Shengjie, et al.
Published: (2026)
Regret of exploratory policy improvement and $q$-learning
by: Tang, Wenpin, et al.
Published: (2024)
The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory
by: Tang, Luoxi, et al.
Published: (2026)
Data-Efficient and Robust Task Selection for Meta-Learning
by: Zhan, Donglin, et al.
Published: (2024)
Spatial Conformal Inference through Localized Quantile Regression
by: Jiang, Hanyang, et al.
Published: (2024)
LLM-Assisted Logic Rule Learning: Scaling Human Expertise for Time Series Anomaly Detection
by: Zhang, Haoting, et al.
Published: (2026)
Polynomial Voting Rules
by: Tang, Wenpin, et al.
Published: (2022)
Conformal prediction for multi-dimensional time series by ellipsoidal sets
by: Xu, Chen, et al.
Published: (2024)
ART for Diffusion Sampling: A Reinforcement Learning Approach to Timestep Schedule
by: Huang, Yilie, et al.
Published: (2026)
Tweedie's Formulae and Diffusion Generative Models Beyond Gaussian
by: Tang, Wenpin, et al.
Published: (2026)
Estimate-Then-Optimize versus Integrated-Estimation-Optimization versus Sample Average Approximation: A Stochastic Dominance Perspective
by: Elmachtoub, Adam N., et al.
Published: (2023)
Selecting the Best Optimizing System
by: Si, Nian, et al.
Published: (2022)
Enhancing Kubernetes Automated Scheduling with Deep Learning and Reinforcement Techniques for Large-Scale Cloud Computing Optimization
by: Xu, Zheng, et al.
Published: (2024)
Bilevel Optimization of Agent Skills via Monte Carlo Tree Search
by: Huang, Chenyi, et al.
Published: (2026)
Variational Trajectory Optimization of Anisotropic Diffusion Schedules
by: Liu, Pengxi, et al.
Published: (2026)
Analysis of the Order Flow Auction under Proposer-Builder Separation on Blockchain
by: Ma, Ruofei, et al.
Published: (2025)
Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate
by: Li, Fengpei, et al.
Published: (2024)
From Invariant Representations to Invariant Data: Provable Robustness to Spurious Correlations via Noisy Counterfactual Matching
by: Bai, Ruqi, et al.
Published: (2025)
Optimizer's Information Criterion: Dissecting and Correcting Bias in Data-Driven Optimization
by: Iyengar, Garud, et al.
Published: (2023)
Spatio-Temporal Conformal Prediction for Power Outage Data
by: Jiang, Hanyang, et al.
Published: (2024)
LAS PASIONES DE SÓCRATES
by: David Morales Troncoso
Published: (2011)
Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation
by: Chen, Ke, et al.
Published: (2024)
Coreset-Based Task Selection for Sample-Efficient Meta-Reinforcement Learning
by: Zhan, Donglin, et al.
Published: (2025)