Library Catalog

Bibliographic Details
Main authors:	Piche, Dereck, Muqeeth, Mohammed, Aghajohari, Milad, Duque, Juan, Noukhovitch, Michael, Courville, Aaron
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online access:	https://arxiv.org/abs/2511.19405

Similar Items

LOQA: Learning with Opponent Q-Learning Awareness
by: Aghajohari, Milad, et al.
Published: (2024)

Best Response Shaping
by: Aghajohari, Milad, et al.
Published: (2024)

Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
by: Lavoie, Samuel, et al.
Published: (2025)

Advantage Alignment Algorithms
by: Duque, Juan Agustin, et al.
Published: (2024)

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
by: Noukhovitch, Michael, et al.
Published: (2024)

The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning
by: Aghajohari, Milad, et al.
Published: (2025)

VinePPO: Refining Credit Assignment in RL Training of LLMs
by: Kazemnejad, Amirhossein, et al.
Published: (2024)

Soft Merging of Experts with Adaptive Routing
by: Muqeeth, Mohammed, et al.
Published: (2023)

Learning to Route Among Specialized Experts for Zero-Shot Generalization
by: Muqeeth, Mohammed, et al.
Published: (2024)

Towards Sustainable Investment Policies Informed by Opponent Shaping
by: Duque, Juan Agustin, et al.
Published: (2026)

Learning Multi-Agent Communication with Contrastive Learning
by: Lo, Yat Long, et al.
Published: (2023)

Versatile Energy-Based Probabilistic Models for High Energy Physics
by: Cheng, Taoli, et al.
Published: (2023)

Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)

World Modelling Improves Language Model Agents
by: Guo, Shangmin, et al.
Published: (2025)

Neuroplastic Expansion in Deep Reinforcement Learning
by: Liu, Jiashun, et al.
Published: (2024)

A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
by: Yadav, Prateek, et al.
Published: (2024)

A Mechanistic Analysis of Looped Reasoning Language Models
by: Blayney, Hugh, et al.
Published: (2026)

The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks
by: Mayor, Walter, et al.
Published: (2025)

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
by: Huang, Shengyi, et al.
Published: (2024)

The Curse of Diversity in Ensemble-Based Exploration
by: Lin, Zhixuan, et al.
Published: (2024)

The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning
by: Liu, Jiashun, et al.
Published: (2025)

In value-based deep reinforcement learning, a pruned network is a good network
by: Obando-Ceron, Johan, et al.
Published: (2024)

Not All LLM Reasoners Are Created Equal
by: Hosseini, Arian, et al.
Published: (2024)

Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
by: Tang, Hongyao, et al.
Published: (2025)

Using Large Language Models to Detect Socially Shared Regulation of Collaborative Learning
by: Zhang, Jiayi, et al.
Published: (2026)

Scattered Mixture-of-Experts Implementation
by: Tan, Shawn, et al.
Published: (2024)

Active Learning Strategies for Efficient Machine-Learned Interatomic Potentials Across Diverse Material Systems
by: Khan, Mohammed Azeez, et al.
Published: (2026)

Bias Analysis in Unconditional Image Generative Models
by: Zhang, Xiaofeng, et al.
Published: (2025)

Stable Deep Reinforcement Learning via Isotropic Gaussian Representations
by: Pasand, Ali Saheb, et al.
Published: (2026)

Forgetting Transformer: Softmax Attention with a Forget Gate
by: Lin, Zhixuan, et al.
Published: (2025)

PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation
by: Piché, Alexandre, et al.
Published: (2025)

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning
by: Castanyer, Roger Creus, et al.
Published: (2025)

Modeling Caption Diversity in Contrastive Vision-Language Pretraining
by: Lavoie, Samuel, et al.
Published: (2024)

ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models
by: Defazio, Aaron
Published: (2026)

Evolution Strategies at the Hyperscale
by: Sarkar, Bidipta, et al.
Published: (2025)

Remote Timing Attacks on Efficient Language Model Inference
by: Carlini, Nicholas, et al.
Published: (2024)

Adaptive Computation Pruning for the Forgetting Transformer
by: Lin, Zhixuan, et al.
Published: (2025)

BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation
by: Tsirigotis, Christos, et al.
Published: (2025)

Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
by: Tan, Shawn, et al.
Published: (2024)

Exploring validation metrics for offline model-based optimisation with diffusion models
by: Beckham, Christopher, et al.
Published: (2022)