| Main Authors: | Lucassen, James, Henry, Mark, Wright, Philippa, Yeung, Owen |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.15116 |
Similar Items
Retrying vs Resampling in AI Control
by: Lucassen, James, et al.
Published: (2026)
Unreflected Acceptance -- Investigating the Negative Consequences of ChatGPT-Assisted Problem Solving in Physics Education
by: Krupp, Lars, et al.
Published: (2023)
BashArena: A Control Setting for Highly Privileged AI Agents
by: Kaufman, Adam, et al.
Published: (2025)
Cluster-norm for Unsupervised Probing of Knowledge
by: Laurito, Walter, et al.
Published: (2024)
Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt
by: de Mijolla, Damien, et al.
Published: (2024)
Evaluating LLMs with Multiple Problems at once
by: Wang, Zhengxiang, et al.
Published: (2024)
Correctness, Artificial Intelligence, and the Epistemic Value of Mathematical Proof
by: Weatherall, James Owen, et al.
Published: (2026)
CGBench: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research
by: Queen, Owen, et al.
Published: (2025)
Beyond Static Snapshots: A Grounded Evaluation Framework for Language Models at the Agentic Frontier
by: Henry, Jazmia
Published: (2026)
Alternative Fairness and Accuracy Optimization in Criminal Justice
by: Wu, Shaolong, et al.
Published: (2025)
Interactive AI Alignment: Specification, Process, and Evaluation Alignment
by: Terry, Michael, et al.
Published: (2023)
DSGym: A Holistic Framework for Evaluating and Training Data Science Agents
by: Nie, Fan, et al.
Published: (2026)
ReasonOps: Operator Segmentation for LLM Reasoning Traces
by: Lee, Daniel, et al.
Published: (2026)
Evaluating Cognitive Age Alignment in Interactive AI Agents
by: Shen, Yifan, et al.
Published: (2026)
Pressure Reveals Character: Behavioural Alignment Evaluation at Depth
by: Petrova, Nora, et al.
Published: (2026)
Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models
by: Wu, Hui, et al.
Published: (2026)
An Evaluation of Cultural Value Alignment in LLM
by: Sukiennik, Nicholas, et al.
Published: (2025)
Pluralistic Off-policy Evaluation and Alignment
by: Huang, Chengkai, et al.
Published: (2025)
Here's Charlie! Realising the Semantic Web vision of Agents in the age of LLMs
by: Wright, Jesse
Published: (2024)
The Crutch or the Ceiling? How Different Generations of LLMs Shape EFL Student Writings
by: Susanto, Hengky, et al.
Published: (2026)
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
by: Ye, Seonghyeon, et al.
Published: (2023)
Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants
by: Galatolo, Alessio, et al.
Published: (2025)
CORE -- A Cell-Level Coarse-to-Fine Image Registration Engine for Multi-stain Image Alignment
by: Nasir, Esha Sadia, et al.
Published: (2025)
AI Alignment via Incentives and Correction
by: Agarwal, Rohit, et al.
Published: (2026)
The Goofus & Gallant Story Corpus for Practical Value Alignment
by: Nahian, Md Sultan Al, et al.
Published: (2025)
When Alignment Isn't Enough: Response-Path Attacks on LLM Agents
by: Luo, Mingyu, et al.
Published: (2026)
On Evaluating LLM Alignment by Evaluating LLMs as Judges
by: Liu, Yixin, et al.
Published: (2025)
Evaluating LLM Alignment With Human Trust Models
by: Debnath, Anushka, et al.
Published: (2026)
UK AISI Alignment Evaluation Case-Study
by: Souly, Alexandra, et al.
Published: (2026)
How predictable is language model benchmark performance?
by: Owen, David
Published: (2024)
Does the Model Say What the Data Says? A Simple Heuristic for Model Data Alignment
by: Salgado, Henry, et al.
Published: (2025)
Exploring the use of AI authors and reviewers at Agents4Science
by: Bianchi, Federico, et al.
Published: (2025)
Evaluating Risks in Weak-to-Strong Alignment: A Bias-Variance Perspective
by: Osooli, Hamid, et al.
Published: (2026)
GAC: Stabilizing Asynchronous RL Training for LLMs via Gradient Alignment Control
by: Xu, Haofeng, et al.
Published: (2026)
The Reward Model Selection Crisis in Personalized Alignment
by: Rezk, Fady, et al.
Published: (2025)
From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming
by: Sinha, Anusha, et al.
Published: (2025)
GLaM: Fine-Tuning Large Language Models for Domain Knowledge Graph Alignment via Neighborhood Partitioning and Generative Subgraph Encoding
by: Dernbach, Stefan, et al.
Published: (2024)
Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence
by: Rau, Anita, et al.
Published: (2025)
Evaluating Human Alignment and Model Faithfulness of LLM Rationale
by: Fayyaz, Mohsen, et al.
Published: (2024)
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
by: Thakur, Aman Singh, et al.
Published: (2024)