| Main Authors: | An, Zhiyu, Du, Wan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.03003 |
Similar Items
Representational Homomorphism Predicts and Improves Compositional Generalization In Transformer Language Model
by: An, Zhiyu, et al.
Published: (2026)
Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring
by: Hou, Zhibo, et al.
Published: (2025)
Go Beyond Black-box Policies: Rethinking the Design of Learning Agent for Interpretable and Verifiable HVAC Control
by: An, Zhiyu, et al.
Published: (2024)
MoralReason: Generalizable Moral Decision Alignment For LLM Agents Using Reasoning-Level Reinforcement Learning
by: An, Zhiyu, et al.
Published: (2025)
The Autonomy-Alignment Problem in Open-Ended Learning Robots: Formalising the Purpose Framework
by: Baldassarre, Gianluca, et al.
Published: (2024)
DIML: Differentiable Inverse Mechanism Learning from Behaviors of Multi-Agent Learning Trajectories
by: An, Zhiyu, et al.
Published: (2026)
The Sign Estimator: LLM Alignment in the Face of Choice Heterogeneity
by: Aouad, Ali, et al.
Published: (2025)
Representative Social Choice: From Learning Theory to AI Alignment
by: Qiu, Tianyi
Published: (2024)
The Alignment Problem from a Deep Learning Perspective
by: Ngo, Richard, et al.
Published: (2022)
Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems
by: Carr, Jonathan Colaço, et al.
Published: (2026)
Solver-Free Decision-Focused Learning for Linear Optimization Problems
by: Berden, Senne, et al.
Published: (2025)
Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling
by: Tang, Yuxuan, et al.
Published: (2025)
AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning
by: Wu, Peilin, et al.
Published: (2026)
Minimizing Surrogate Losses for Decision-Focused Learning using Differentiable Optimization
by: Mandi, Jayanta, et al.
Published: (2025)
FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making
by: Wang, Yucen, et al.
Published: (2025)
An Unsupervised Learning Framework Combined with Heuristics for the Maximum Minimal Cut Problem
by: Liu, Huaiyuan, et al.
Published: (2024)
Adversarial Preference Learning for Robust LLM Alignment
by: Wang, Yuanfu, et al.
Published: (2025)
Structure in Deep Reinforcement Learning: A Survey and Open Problems
by: Mohan, Aditya, et al.
Published: (2023)
Approximation-Free Differentiable Oblique Decision Trees
by: Panda, Subrat Prasad, et al.
Published: (2026)
Generative Social Choice
by: Fish, Sara, et al.
Published: (2023)
AllMatch: Exploiting All Unlabeled Data for Semi-Supervised Learning
by: Wu, Zhiyu, et al.
Published: (2024)
Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback?
by: Kalra, Akansha, et al.
Published: (2023)
Demystifying the Physics of Deep Reinforcement Learning-Based Autonomous Vehicle Decision-Making
by: Wan, Hanxi, et al.
Published: (2024)
Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes
by: Blaser, Ethan, et al.
Published: (2026)
Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory
by: Xiao, Jiancong, et al.
Published: (2025)
Federated Graph Semantic and Structural Learning
by: Huang, Wenke, et al.
Published: (2024)
TRACE: Distilling Where It Matters via Token-Routed Self On-Policy Alignment
by: Wang, Jiaxuan, et al.
Published: (2026)
Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning
by: Muppidi, Aneesh, et al.
Published: (2024)
The Importance of Architecture Choice in Deep Learning for Climate Applications
by: Dräger, Simon, et al.
Published: (2024)
GEAR: A General Evaluation Framework for Abductive Reasoning
by: He, Kaiyu, et al.
Published: (2025)
HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
by: Wu, Peilin, et al.
Published: (2025)
The Ungrounded Alignment Problem
by: Pickett, Marc, et al.
Published: (2024)
Pattern Recognition or Medical Knowledge? The Problem with Multiple-Choice Questions in Medicine
by: Griot, Maxime, et al.
Published: (2024)
Generative Social Choice: The Next Generation
by: Boehmer, Niclas, et al.
Published: (2025)
DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning
by: Wan, Weikang, et al.
Published: (2024)
Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
by: Ayonrinde, Kola
Published: (2024)
Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning
by: Li, Yichen, et al.
Published: (2025)
Manifold Approximation leads to Robust Kernel Alignment
by: Islam, Mohammad Tariqul, et al.
Published: (2025)
SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas
by: Guo, Zihao, et al.
Published: (2025)
Cooperative Open-ended Learning Framework for Zero-shot Coordination
by: Li, Yang, et al.
Published: (2023)