| Main authors: | Liu, Ning; Sun, Chuanneng; Klinkner, Kristina; Malmasi, Shervin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2605.08037 |
Similar items
Learning to Staff: Offline Reinforcement Learning and Fine-Tuned LLMs for Warehouse Staffing Optimization
by: Kujanpää, Kalle, et al.
Published: (2026)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
by: Rafailov, Rafael, et al.
Published: (2023)
Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model
by: Kallus, Nathan
Published: (2025)
LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions
by: Sun, Chuanneng, et al.
Published: (2024)
Large Language Model is Secretly a Protein Sequence Optimizer
by: Wang, Yinkai, et al.
Published: (2025)
Your Transformer is Secretly Linear
by: Razzhigaev, Anton, et al.
Published: (2024)
Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLMs Alignment
by: Xu, Wenzhe, et al.
Published: (2026)
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
by: Zhou, Zhanhui, et al.
Published: (2023)
Your Learned Constraint is Secretly a Backward Reachable Tube
by: Qadri, Mohamad, et al.
Published: (2025)
Your Absorbing Discrete Diffusion Secretly Models the Bayesian Posterior
by: Doyle, Cooper
Published: (2025)
Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator
by: Luo, Beier, et al.
Published: (2025)
Your Dense Retriever is Secretly an Expeditious Reasoner
by: Zhang, Yichi, et al.
Published: (2025)
Self-Play Preference Optimization for Language Model Alignment
by: Wu, Yue, et al.
Published: (2024)
The Jailbreak Tax: How Useful are Your Jailbreak Outputs?
by: Nikolić, Kristina, et al.
Published: (2025)
Aligning Diffusion Language Models via Unpaired Preference Optimization
by: Jindal, Vaibhav, et al.
Published: (2025)
Soft Preference Optimization: Aligning Language Models to Expert Distributions
by: Sharifnassab, Arsalan, et al.
Published: (2024)
DCRM: A Heuristic to Measure Response Pair Quality in Preference Optimization
by: Huang, Chengyu, et al.
Published: (2025)
Your VAR Model is Secretly an Efficient and Explainable Generative Classifier
by: Chen, Yi-Chung, et al.
Published: (2025)
LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation
by: Zhang, Xuan, et al.
Published: (2024)
VAGPO: Vision-augmented Asymmetric Group Preference Optimization for Graph Routing Problems
by: Liu, Shiyan, et al.
Published: (2025)
Towards Automated Machine Learning Research
by: Ardeshir, Shervin
Published: (2024)
MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples
by: Xie, Shuo, et al.
Published: (2024)
Graph Unlearning Meets Influence-aware Negative Preference Optimization
by: Chen, Qiang, et al.
Published: (2025)
Boost Your Human Image Generation Model via Direct Preference Optimization
by: Na, Sanghyeon, et al.
Published: (2024)
Quantifying Representation Reliability in Self-Supervised Learning Models
by: Park, Young-Jin, et al.
Published: (2023)
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
by: Cheng, Pengyu, et al.
Published: (2023)
Teaching Your Models to Understand Code via Focal Preference Alignment
by: Wu, Jie, et al.
Published: (2025)
Graph Foundation Models: Bridging Language Model Paradigms and Graph Optimization
by: Liang, Yunhao, et al.
Published: (2025)
Scaffold-Conditioned Preference Triplets for Controllable Molecular Optimization with Large Language Models
by: Xiong, Yi, et al.
Published: (2026)
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
by: Zhang, Yifan, et al.
Published: (2024)
ROPO: Robust Preference Optimization for Large Language Models
by: Liang, Xize, et al.
Published: (2024)
Accelerated Preference Optimization for Large Language Model Alignment
by: He, Jiafan, et al.
Published: (2024)
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
by: Chen, Haoxian, et al.
Published: (2024)
EnerBridge-DPO: Energy-Guided Protein Inverse Folding with Markov Bridges and Direct Preference Optimization
by: Rong, Dingyi, et al.
Published: (2025)
GRPO is Secretly a Process Reward Model
by: Sullivan, Michael, et al.
Published: (2025)
Aligning Large Vision-Language Models by Deep Reinforcement Learning and Direct Preference Optimization
by: Nguyen, Thanh Thi, et al.
Published: (2025)
Quantile Regression with Large Language Models for Price Prediction
by: Vedula, Nikhita, et al.
Published: (2025)
How to Turn Your Knowledge Graph Embeddings into Generative Models
by: Loconte, Lorenzo, et al.
Published: (2023)
Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
by: Chen, Lizhang, et al.
Published: (2023)
Adversarial Curriculum Graph Contrastive Learning with Pair-wise Augmentation
by: Zhao, Xinjian, et al.
Published: (2024)