:: Library Catalog

Bibliographic Details
Main Authors:	Slama, Katarina; Souly, Alexandra; Bansal, Dishank; Davidson, Henry; Summerfield, Christopher; Luettgau, Lennart
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.18971

Similar Items

Ask don't tell: Reducing sycophancy in large language models
by: Dubois, Magda, et al.
Published: (2026)

HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
by: Luettgau, Lennart, et al.
Published: (2025)

Lessons from a Chimp: AI "Scheming" and the Quest for Ape Language
by: Summerfield, Christopher, et al.
Published: (2025)

One-shot emergency psychiatric triage across 15 frontier AI chatbots
by: Weilnhammer, Veith, et al.
Published: (2026)

TaskMet: Task-Driven Metric Learning for Model Learning
by: Bansal, Dishank, et al.
Published: (2023)

Technological folie à deux: Feedback Loops Between AI Chatbots and Mental Illness
by: Dohnány, Sebastian, et al.
Published: (2025)

Neural steering vectors reveal dose and exposure-dependent impacts of human-AI relationships
by: Kirk, Hannah Rose, et al.
Published: (2025)

Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact
by: Lee, Hyunji, et al.
Published: (2025)

Evaluating whether AI models would sabotage AI safety research
by: Kirk, Robert, et al.
Published: (2026)

Conversational AI increases political knowledge as effectively as self-directed internet search
by: Luettgau, Lennart, et al.
Published: (2025)

UK AISI Alignment Evaluation Case-Study
by: Souly, Alexandra, et al.
Published: (2026)

"I understand why I got this grade": Automatic Short Answer Grading with Feedback
by: Aggarwal, Dishank, et al.
Published: (2024)

RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
by: Malik, Sameer, et al.
Published: (2025)

Seven simple steps for log analysis in AI systems
by: Dubois, Magda, et al.
Published: (2026)

Subjective Behaviors and Preferences in LLM: Language of Browsing
by: Sundaresan, Sai, et al.
Published: (2025)

Large Language Models and Algorithm Execution: Application to an Arithmetic Function
by: Slama, Farah Ben, et al.
Published: (2026)

Philosophical Dispositions as Behavioral Constraints for AI-Assisted Code Review: An Empirical Study
by: Bansal, Kaushal
Published: (2026)

What Do LLM Agents Do When Left Alone? Evidence of Spontaneous Meta-Cognitive Patterns
by: Szeider, Stefan
Published: (2025)

Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
by: Bazinska, Julia, et al.
Published: (2025)

When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
by: Singhi, Nishad, et al.
Published: (2025)

Scaling Laws for Predicting Downstream Performance in LLMs
by: Chen, Yangyi, et al.
Published: (2024)

Vulnerability-Amplifying Interaction Loops: a systematic failure mode in AI chatbot mental-health interactions
by: Weilnhammer, Veith, et al.
Published: (2026)

Reliability Auditing for Downstream LLM tasks in Psychiatry: LLM-Generated Hospitalization Risk Scores
by: Panda, Shevya, et al.
Published: (2026)

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?
by: Zhang, Yue, et al.
Published: (2026)

When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents
by: Mehta, Aman
Published: (2026)

MallowsPO: Fine-Tune Your LLM with Preference Dispersions
by: Chen, Haoxian, et al.
Published: (2024)

Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications
by: Subramanian, Vignesh, et al.
Published: (2026)

When Does Predictive Inverse Dynamics Outperform Behavior Cloning?
by: Schäfer, Lukas, et al.
Published: (2026)

AMPO: Active Multi-Preference Optimization for Self-play Preference Selection
by: Gupta, Taneesh, et al.
Published: (2025)

Intrinsic Bias is Predicted by Pretraining Data and Correlates with Downstream Performance in Vision-Language Encoders
by: Ghate, Kshitish, et al.
Published: (2025)

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
by: Andriushchenko, Maksym, et al.
Published: (2024)

Online hand gesture recognition using Continual Graph Transformers
by: Slama, Rim, et al.
Published: (2025)

Do LLM Agents Exhibit Social Behavior?
by: Leng, Yan, et al.
Published: (2023)

D-STGCNT: A Dense Spatio-Temporal Graph Conv-GRU Network based on transformer for assessment of patient physical rehabilitation
by: Mourchid, Youssef, et al.
Published: (2023)

Preemptive Detection and Steering of LLM Misalignment via Latent Reachability
by: Karnik, Sathwik, et al.
Published: (2025)

The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models
by: Ensign, Danielle, et al.
Published: (2025)

Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
by: Bansal, Hritik, et al.
Published: (2023)

Artificial intelligence can persuade people to take political actions
by: Hackenburg, Kobi, et al.
Published: (2026)

Preference Learning Algorithms Do Not Learn Preference Rankings
by: Chen, Angelica, et al.
Published: (2024)

People readily follow personal advice from AI but it does not improve their well-being
by: Luettgau, Lennart, et al.
Published: (2025)