| Main authors: | Piche, Dereck; Muqeeth, Mohammed; Aghajohari, Milad; Duque, Juan; Noukhovitch, Michael; Courville, Aaron |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2511.19405 |
Similar items
LOQA: Learning with Opponent Q-Learning Awareness
by: Aghajohari, Milad, et al.
Published: (2024)
Best Response Shaping
by: Aghajohari, Milad, et al.
Published: (2024)
Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
by: Lavoie, Samuel, et al.
Published: (2025)
Advantage Alignment Algorithms
by: Duque, Juan Agustin, et al.
Published: (2024)
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
by: Noukhovitch, Michael, et al.
Published: (2024)
The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning
by: Aghajohari, Milad, et al.
Published: (2025)
VinePPO: Refining Credit Assignment in RL Training of LLMs
by: Kazemnejad, Amirhossein, et al.
Published: (2024)
Soft Merging of Experts with Adaptive Routing
by: Muqeeth, Mohammed, et al.
Published: (2023)
Learning to Route Among Specialized Experts for Zero-Shot Generalization
by: Muqeeth, Mohammed, et al.
Published: (2024)
Towards Sustainable Investment Policies Informed by Opponent Shaping
by: Duque, Juan Agustin, et al.
Published: (2026)
Learning Multi-Agent Communication with Contrastive Learning
by: Lo, Yat Long, et al.
Published: (2023)
Versatile Energy-Based Probabilistic Models for High Energy Physics
by: Cheng, Taoli, et al.
Published: (2023)
Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)
World Modelling Improves Language Model Agents
by: Guo, Shangmin, et al.
Published: (2025)
Neuroplastic Expansion in Deep Reinforcement Learning
by: Liu, Jiashun, et al.
Published: (2024)
A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
by: Yadav, Prateek, et al.
Published: (2024)
A Mechanistic Analysis of Looped Reasoning Language Models
by: Blayney, Hugh, et al.
Published: (2026)
The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks
by: Mayor, Walter, et al.
Published: (2025)
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
by: Huang, Shengyi, et al.
Published: (2024)
The Curse of Diversity in Ensemble-Based Exploration
by: Lin, Zhixuan, et al.
Published: (2024)
The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning
by: Liu, Jiashun, et al.
Published: (2025)
In value-based deep reinforcement learning, a pruned network is a good network
by: Obando-Ceron, Johan, et al.
Published: (2024)
Not All LLM Reasoners Are Created Equal
by: Hosseini, Arian, et al.
Published: (2024)
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
by: Tang, Hongyao, et al.
Published: (2025)
Using Large Language Models to Detect Socially Shared Regulation of Collaborative Learning
by: Zhang, Jiayi, et al.
Published: (2026)
Scattered Mixture-of-Experts Implementation
by: Tan, Shawn, et al.
Published: (2024)
Active Learning Strategies for Efficient Machine-Learned Interatomic Potentials Across Diverse Material Systems
by: Khan, Mohammed Azeez, et al.
Published: (2026)
Bias Analysis in Unconditional Image Generative Models
by: Zhang, Xiaofeng, et al.
Published: (2025)
Stable Deep Reinforcement Learning via Isotropic Gaussian Representations
by: Pasand, Ali Saheb, et al.
Published: (2026)
Forgetting Transformer: Softmax Attention with a Forget Gate
by: Lin, Zhixuan, et al.
Published: (2025)
PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation
by: Piché, Alexandre, et al.
Published: (2025)
Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning
by: Castanyer, Roger Creus, et al.
Published: (2025)
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
by: Lavoie, Samuel, et al.
Published: (2024)
ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models
by: Defazio, Aaron
Published: (2026)
Evolution Strategies at the Hyperscale
by: Sarkar, Bidipta, et al.
Published: (2025)
Remote Timing Attacks on Efficient Language Model Inference
by: Carlini, Nicholas, et al.
Published: (2024)
Adaptive Computation Pruning for the Forgetting Transformer
by: Lin, Zhixuan, et al.
Published: (2025)
BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation
by: Tsirigotis, Christos, et al.
Published: (2025)
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
by: Tan, Shawn, et al.
Published: (2024)
Exploring validation metrics for offline model-based optimisation with diffusion models
by: Beckham, Christopher, et al.
Published: (2022)