| Main Authors: | Daniels, Oliver; Moodley, Perusha; Marlin, Benjamin M.; Lindner, David |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.08877 |
Similar Items
Exploration Hacking: Can LLMs Learn to Resist RL Training?
by: Jang, Eyon, et al.
Published: (2026)
Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces
by: Moodley, Perusha, et al.
Published: (2024)
ACE and Diverse Generalization via Selective Disagreement
by: Daniels, Oliver, et al.
Published: (2025)
Combining LLM decision and RL action selection to improve RL policy for adaptive interventions
by: Karine, Karine, et al.
Published: (2025)
StepCountJITAI: simulation environment for RL with application to physical activity adaptive intervention
by: Karine, Karine, et al.
Published: (2024)
Heteroscedastic Temporal Variational Autoencoder For Irregular Time Series
by: Shukla, Satya Narayan, et al.
Published: (2021)
Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States
by: Karine, Karine, et al.
Published: (2025)
Strategically Deceptive Model Deployment in Performative Prediction
by: Bautiste, Javier Sanguino, et al.
Published: (2025)
Detecting Strategic Deception Using Linear Probes
by: Goldowsky-Dill, Nicholas, et al.
Published: (2025)
To Start Up a Start-Up - Embedding Strategic Demand Development in Operational On-Demand Fulfillment via Reinforcement Learning with Information Shaping
by: Chen, Xinwei, et al.
Published: (2025)
PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection
by: Camarato, Steffen J., et al.
Published: (2026)
BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings
by: Karine, Karine, et al.
Published: (2024)
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
by: Wu, Zhaomin, et al.
Published: (2025)
REBAR: Retrieval-Based Reconstruction for Time-series Contrastive Learning
by: Xu, Maxwell A., et al.
Published: (2023)
Strategic Hypothesis Testing
by: Hossain, Safwan, et al.
Published: (2025)
Differentially Private Auditing Under Strategic Response
by: Burnat, Florian A. D.
Published: (2026)
Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation
by: Wei, Hui, et al.
Published: (2024)
Detecting Proxy Gaming in RL and LLM Alignment via Evaluator Stress Tests
by: Shihab, Ibne Farabi, et al.
Published: (2025)
When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models
by: Wang, Kai, et al.
Published: (2025)
Targeted Tests for LLM Reasoning: An Audit-Constrained Protocol
by: Li, Hongmin
Published: (2026)
An Auditing Test To Detect Behavioral Shift in Language Models
by: Richter, Leo, et al.
Published: (2024)
Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations
by: Kumar, Sachin
Published: (2026)
Evaluating LLM Safety Under Repeated Inference via Accelerated Prompt Stress Testing
by: Broadwater, Keita
Published: (2026)
Auditing Prompt Caching in Language Model APIs
by: Gu, Chenchen, et al.
Published: (2025)
CacheProbe: Auditing Prompt Cache Isolation in Gateway APIs
by: Fahey, Ryan
Published: (2026)
Deception Detection: From Static Texts to Multimodal Signals
by: Logan, Mandela
Published: (2025)
Contextual Chart Generation for Cyber Deception
by: Nguyen, David D., et al.
Published: (2024)
Gaming the Metric, Not the Harm: Certifying Safety Audits against Strategic Platform Manipulation
by: Burnat, Florian A. D., et al.
Published: (2026)
H-FLTN: A Privacy-Preserving Hierarchical Framework for Electric Vehicle Spatio-Temporal Charge Prediction
by: Marlin, Robert, et al.
Published: (2025)
Deceptive Exploration in Multi-armed Bandits
by: Vurankaya, I. Arda, et al.
Published: (2025)
To Align or Not to Align: Strategic Multimodal Representation Alignment for Optimal Performance
by: Fang, Wanlong, et al.
Published: (2025)
On the Contractivity of Stochastic Interpolation Flow
by: Daniels, Mara
Published: (2025)
A lightweight Spatial-Temporal Graph Neural Network for Long-term Time Series Forecasting
by: Moges, Henok Tenaw, et al.
Published: (2025)
CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring
by: Arnav, Benjamin, et al.
Published: (2025)
Instance-Level Data-Use Auditing of Visual ML Models
by: Huang, Zonghao, et al.
Published: (2025)
Stress-Testing Capability Elicitation With Password-Locked Models
by: Greenblatt, Ryan, et al.
Published: (2024)
Reinforcement Learning for High-Level Strategic Control in Tower Defense Games
by: Bergdahl, Joakim, et al.
Published: (2024)
A note on the VC dimension of 1-dimensional GNNs
by: Daniëls, Noah, et al.
Published: (2024)
Contextual StereoSet: Stress-Testing Bias Alignment Robustness in Large Language Models
by: Basu, Abhinaba, et al.
Published: (2026)
SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking
by: Yang, Wenyuan, et al.
Published: (2025)