| Main Authors: | Ngo, Richard, Chan, Lawrence, Mindermann, Sören |
|---|---|
| Format: | Preprint |
| Published: | 2022 |
| Online Access: | https://arxiv.org/abs/2209.00626 |
Similar Items
Probabilistic Modeling of Latent Agentic Substructures in Deep Neural Networks
by: Lee, Su Hyeong, et al.
Published: (2025)
Agentic Misalignment: How LLMs Could Be Insider Threats
by: Lynch, Aengus, et al.
Published: (2025)
Preference Learning for AI Alignment: a Causal Perspective
by: Kobalczyk, Katarzyna, et al.
Published: (2025)
Alignment faking in large language models
by: Greenblatt, Ryan, et al.
Published: (2024)
The Autonomy-Alignment Problem in Open-Ended Learning Robots: Formalising the Purpose Framework
by: Baldassarre, Gianluca, et al.
Published: (2024)
Open Problems in Machine Unlearning for AI Safety
by: Barez, Fazl, et al.
Published: (2025)
How Deep is your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation
by: Qian, Linglong, et al.
Published: (2024)
Deep Learning for Modeling and Dispatching Hybrid Wind Farm Power Generation
by: Lawrence, Zach, et al.
Published: (2025)
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
by: Bengio, Yoshua, et al.
Published: (2025)
Machine Learning Systems: A Survey from a Data-Oriented Perspective
by: Cabrera, Christian, et al.
Published: (2023)
Open Problems in Differentiable Social Choice: Learning Mechanisms, Decisions, and Alignment
by: An, Zhiyu, et al.
Published: (2026)
Impartial Games: A Challenge for Reinforcement Learning
by: Zhou, Bei, et al.
Published: (2022)
Is Exploration or Optimization the Problem for Deep Reinforcement Learning?
by: Berseth, Glen
Published: (2025)
The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
by: Krishna, Satyapriya, et al.
Published: (2022)
AISSISTANT: Human-AI Collaborative Review and Perspective Research Workflows in Data Science
by: Gaddipati, Sasi Kiran, et al.
Published: (2025)
Deep Reinforcement Learning for Picker Routing Problem in Warehousing
by: Dunn, George, et al.
Published: (2024)
Towards One Model for Classical Dimensionality Reduction: A Probabilistic Perspective on UMAP and t-SNE
by: Ravuri, Aditya, et al.
Published: (2024)
Integral Signatures of Activation Functions: A 9-Dimensional Taxonomy and Stability Theory for Deep Learning
by: Mali, Ankur, et al.
Published: (2025)
Mathematical Models of Computation in Superposition
by: Hänni, Kaarel, et al.
Published: (2024)
Spatial-Aware Deep Reinforcement Learning for the Traveling Officer Problem
by: Strauß, Niklas, et al.
Published: (2024)
Structure in Deep Reinforcement Learning: A Survey and Open Problems
by: Mohan, Aditya, et al.
Published: (2023)
Dissecting Quantization Error: A Concentration-Alignment Perspective
by: Federici, Marco, et al.
Published: (2026)
Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective
by: Kong, Deyang, et al.
Published: (2025)
Multi-objective Reinforcement Learning: A Tool for Pluralistic Alignment
by: Vamplew, Peter, et al.
Published: (2024)
Modular addition without black-boxes: Compressing explanations of MLPs that compute numerical integration
by: Yip, Chun Hei, et al.
Published: (2024)
Applying Time Series Deep Learning Models to Forecast the Growth of Perennial Ryegrass in Ireland
by: Onibonoje, Oluwadurotimi, et al.
Published: (2025)
Diversity Optimization for Travelling Salesman Problem via Deep Reinforcement Learning
by: Li, Qi, et al.
Published: (2025)
Generative Modeling for Robust Deep Reinforcement Learning on the Traveling Salesman Problem
by: Li, Michael, et al.
Published: (2025)
The Ungrounded Alignment Problem
by: Pickett, Marc, et al.
Published: (2024)
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
by: Scholten, Yan, et al.
Published: (2024)
Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment
by: Zhou, Weichao, et al.
Published: (2024)
Level Up: Defining and Exploiting Transitional Problems for Curriculum Learning
by: Tang, Zhenwei, et al.
Published: (2026)
Predicting Preschoolers' Externalizing Problems with Mother-Child Interaction Dynamics and Deep Learning
by: Chen, Xi, et al.
Published: (2024)
Deep Reinforcement Learning for Traveling Purchaser Problems
by: Yuan, Haofeng, et al.
Published: (2024)
Unifying Perplexing Behaviors in Modified BP Attributions through Alignment Perspective
by: Zheng, Guanhua, et al.
Published: (2025)
Reward Shaping for Inference-Time Alignment: A Stackelberg Game Perspective
by: Wang, Haichuan, et al.
Published: (2026)
Socially Integrated Navigation: A Social Acting Robot with Deep Reinforcement Learning
by: Flögel, Daniel, et al.
Published: (2024)
Multimodal Deep Learning for Low-Resource Settings: A Vector Embedding Alignment Approach for Healthcare Applications
by: Restrepo, David, et al.
Published: (2024)
AAC: Admissible-by-Architecture Differentiable Landmark Compression for ALT
by: Le, An T., et al.
Published: (2026)
Towards a Learning Theory of Representation Alignment
by: Insulla, Francesco, et al.
Published: (2025)