| Main Authors: | Slama, Katarina, Souly, Alexandra, Bansal, Dishank, Davidson, Henry, Summerfield, Christopher, Luettgau, Lennart |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.18971 |
Similar Items
Ask don't tell: Reducing sycophancy in large language models
by: Dubois, Magda, et al.
Published: (2026)
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
by: Luettgau, Lennart, et al.
Published: (2025)
Lessons from a Chimp: AI "Scheming" and the Quest for Ape Language
by: Summerfield, Christopher, et al.
Published: (2025)
One-shot emergency psychiatric triage across 15 frontier AI chatbots
by: Weilnhammer, Veith, et al.
Published: (2026)
TaskMet: Task-Driven Metric Learning for Model Learning
by: Bansal, Dishank, et al.
Published: (2023)
Technological folie à deux: Feedback Loops Between AI Chatbots and Mental Illness
by: Dohnány, Sebastian, et al.
Published: (2025)
Neural steering vectors reveal dose and exposure-dependent impacts of human-AI relationships
by: Kirk, Hannah Rose, et al.
Published: (2025)
Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact
by: Lee, Hyunji, et al.
Published: (2025)
Evaluating whether AI models would sabotage AI safety research
by: Kirk, Robert, et al.
Published: (2026)
Conversational AI increases political knowledge as effectively as self-directed internet search
by: Luettgau, Lennart, et al.
Published: (2025)
UK AISI Alignment Evaluation Case-Study
by: Souly, Alexandra, et al.
Published: (2026)
"I understand why I got this grade": Automatic Short Answer Grading with Feedback
by: Aggarwal, Dishank, et al.
Published: (2024)
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
by: Malik, Sameer, et al.
Published: (2025)
Seven simple steps for log analysis in AI systems
by: Dubois, Magda, et al.
Published: (2026)
Subjective Behaviors and Preferences in LLM: Language of Browsing
by: Sundaresan, Sai, et al.
Published: (2025)
Large Language Models and Algorithm Execution: Application to an Arithmetic Function
by: Slama, Farah Ben, et al.
Published: (2026)
Philosophical Dispositions as Behavioral Constraints for AI-Assisted Code Review: An Empirical Study
by: Bansal, Kaushal
Published: (2026)
What Do LLM Agents Do When Left Alone? Evidence of Spontaneous Meta-Cognitive Patterns
by: Szeider, Stefan
Published: (2025)
Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
by: Bazinska, Julia, et al.
Published: (2025)
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
by: Singhi, Nishad, et al.
Published: (2025)
Scaling Laws for Predicting Downstream Performance in LLMs
by: Chen, Yangyi, et al.
Published: (2024)
Vulnerability-Amplifying Interaction Loops: a systematic failure mode in AI chatbot mental-health interactions
by: Weilnhammer, Veith, et al.
Published: (2026)
Reliability Auditing for Downstream LLM tasks in Psychiatry: LLM-Generated Hospitalization Risk Scores
by: Panda, Shevya, et al.
Published: (2026)
Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?
by: Zhang, Yue, et al.
Published: (2026)
When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents
by: Mehta, Aman
Published: (2026)
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
by: Chen, Haoxian, et al.
Published: (2024)
Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications
by: Subramanian, Vignesh, et al.
Published: (2026)
When Does Predictive Inverse Dynamics Outperform Behavior Cloning?
by: Schäfer, Lukas, et al.
Published: (2026)
AMPO: Active Multi-Preference Optimization for Self-play Preference Selection
by: Gupta, Taneesh, et al.
Published: (2025)
Intrinsic Bias is Predicted by Pretraining Data and Correlates with Downstream Performance in Vision-Language Encoders
by: Ghate, Kshitish, et al.
Published: (2025)
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
by: Andriushchenko, Maksym, et al.
Published: (2024)
Online hand gesture recognition using Continual Graph Transformers
by: Slama, Rim, et al.
Published: (2025)
Do LLM Agents Exhibit Social Behavior?
by: Leng, Yan, et al.
Published: (2023)
D-STGCNT: A Dense Spatio-Temporal Graph Conv-GRU Network based on transformer for assessment of patient physical rehabilitation
by: Mourchid, Youssef, et al.
Published: (2023)
Preemptive Detection and Steering of LLM Misalignment via Latent Reachability
by: Karnik, Sathwik, et al.
Published: (2025)
The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models
by: Ensign, Danielle, et al.
Published: (2025)
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
by: Bansal, Hritik, et al.
Published: (2023)
Artificial intelligence can persuade people to take political actions
by: Hackenburg, Kobi, et al.
Published: (2026)
Preference Learning Algorithms Do Not Learn Preference Rankings
by: Chen, Angelica, et al.
Published: (2024)
People readily follow personal advice from AI but it does not improve their well-being
by: Luettgau, Lennart, et al.
Published: (2025)