The Alignment Problem from a Deep Learning Perspective :: Library Catalog

Bibliographic Details
Main Authors:	Ngo, Richard; Chan, Lawrence; Mindermann, Sören
Format:	Preprint
Published:	2022
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2209.00626

Similar Items

Probabilistic Modeling of Latent Agentic Substructures in Deep Neural Networks
by: Lee, Su Hyeong, et al.
Published: (2025)

Agentic Misalignment: How LLMs Could Be Insider Threats
by: Lynch, Aengus, et al.
Published: (2025)

Preference Learning for AI Alignment: a Causal Perspective
by: Kobalczyk, Katarzyna, et al.
Published: (2025)

Alignment faking in large language models
by: Greenblatt, Ryan, et al.
Published: (2024)

The Autonomy-Alignment Problem in Open-Ended Learning Robots: Formalising the Purpose Framework
by: Baldassarre, Gianluca, et al.
Published: (2024)

Open Problems in Machine Unlearning for AI Safety
by: Barez, Fazl, et al.
Published: (2025)

How Deep is your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation
by: Qian, Linglong, et al.
Published: (2024)

Deep Learning for Modeling and Dispatching Hybrid Wind Farm Power Generation
by: Lawrence, Zach, et al.
Published: (2025)

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
by: Bengio, Yoshua, et al.
Published: (2025)

Machine Learning Systems: A Survey from a Data-Oriented Perspective
by: Cabrera, Christian, et al.
Published: (2023)

Open Problems in Differentiable Social Choice: Learning Mechanisms, Decisions, and Alignment
by: An, Zhiyu, et al.
Published: (2026)

Impartial Games: A Challenge for Reinforcement Learning
by: Zhou, Bei, et al.
Published: (2022)

Is Exploration or Optimization the Problem for Deep Reinforcement Learning?
by: Berseth, Glen
Published: (2025)

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
by: Krishna, Satyapriya, et al.
Published: (2022)

AISSISTANT: Human-AI Collaborative Review and Perspective Research Workflows in Data Science
by: Gaddipati, Sasi Kiran, et al.
Published: (2025)

Deep Reinforcement Learning for Picker Routing Problem in Warehousing
by: Dunn, George, et al.
Published: (2024)

Towards One Model for Classical Dimensionality Reduction: A Probabilistic Perspective on UMAP and t-SNE
by: Ravuri, Aditya, et al.
Published: (2024)

Integral Signatures of Activation Functions: A 9-Dimensional Taxonomy and Stability Theory for Deep Learning
by: Mali, Ankur, et al.
Published: (2025)

Mathematical Models of Computation in Superposition
by: Hänni, Kaarel, et al.
Published: (2024)

Spatial-Aware Deep Reinforcement Learning for the Traveling Officer Problem
by: Strauß, Niklas, et al.
Published: (2024)

Structure in Deep Reinforcement Learning: A Survey and Open Problems
by: Mohan, Aditya, et al.
Published: (2023)

Dissecting Quantization Error: A Concentration-Alignment Perspective
by: Federici, Marco, et al.
Published: (2026)

Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective
by: Kong, Deyang, et al.
Published: (2025)

Multi-objective Reinforcement Learning: A Tool for Pluralistic Alignment
by: Vamplew, Peter, et al.
Published: (2024)

Modular addition without black-boxes: Compressing explanations of MLPs that compute numerical integration
by: Yip, Chun Hei, et al.
Published: (2024)

Applying Time Series Deep Learning Models to Forecast the Growth of Perennial Ryegrass in Ireland
by: Onibonoje, Oluwadurotimi, et al.
Published: (2025)

Diversity Optimization for Travelling Salesman Problem via Deep Reinforcement Learning
by: Li, Qi, et al.
Published: (2025)

Generative Modeling for Robust Deep Reinforcement Learning on the Traveling Salesman Problem
by: Li, Michael, et al.
Published: (2025)

The Ungrounded Alignment Problem
by: Pickett, Marc, et al.
Published: (2024)

A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
by: Scholten, Yan, et al.
Published: (2024)

Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment
by: Zhou, Weichao, et al.
Published: (2024)

Level Up: Defining and Exploiting Transitional Problems for Curriculum Learning
by: Tang, Zhenwei, et al.
Published: (2026)

Predicting Preschoolers' Externalizing Problems with Mother-Child Interaction Dynamics and Deep Learning
by: Chen, Xi, et al.
Published: (2024)

Deep Reinforcement Learning for Traveling Purchaser Problems
by: Yuan, Haofeng, et al.
Published: (2024)

Unifying Perplexing Behaviors in Modified BP Attributions through Alignment Perspective
by: Zheng, Guanhua, et al.
Published: (2025)

Reward Shaping for Inference-Time Alignment: A Stackelberg Game Perspective
by: Wang, Haichuan, et al.
Published: (2026)

Socially Integrated Navigation: A Social Acting Robot with Deep Reinforcement Learning
by: Flögel, Daniel, et al.
Published: (2024)

Multimodal Deep Learning for Low-Resource Settings: A Vector Embedding Alignment Approach for Healthcare Applications
by: Restrepo, David, et al.
Published: (2024)

AAC: Admissible-by-Architecture Differentiable Landmark Compression for ALT
by: Le, An T., et al.
Published: (2026)

Towards a Learning Theory of Representation Alignment
by: Insulla, Francesco, et al.
Published: (2025)