:: Library Catalog

Bibliographic Details
Main Authors:	Daniels, Oliver; Moodley, Perusha; Marlin, Benjamin M.; Lindner, David
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.08877

Similar Items

Exploration Hacking: Can LLMs Learn to Resist RL Training?
by: Jang, Eyon, et al.
Published: (2026)

Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces
by: Moodley, Perusha, et al.
Published: (2024)

ACE and Diverse Generalization via Selective Disagreement
by: Daniels, Oliver, et al.
Published: (2025)

Combining LLM decision and RL action selection to improve RL policy for adaptive interventions
by: Karine, Karine, et al.
Published: (2025)

StepCountJITAI: simulation environment for RL with application to physical activity adaptive intervention
by: Karine, Karine, et al.
Published: (2024)

Heteroscedastic Temporal Variational Autoencoder For Irregular Time Series
by: Shukla, Satya Narayan, et al.
Published: (2021)

Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States
by: Karine, Karine, et al.
Published: (2025)

Strategically Deceptive Model Deployment in Performative Prediction
by: Bautiste, Javier Sanguino, et al.
Published: (2025)

Detecting Strategic Deception Using Linear Probes
by: Goldowsky-Dill, Nicholas, et al.
Published: (2025)

To Start Up a Start-Up - Embedding Strategic Demand Development in Operational On-Demand Fulfillment via Reinforcement Learning with Information Shaping
by: Chen, Xinwei, et al.
Published: (2025)

PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection
by: Camarato, Steffen J., et al.
Published: (2026)

BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings
by: Karine, Karine, et al.
Published: (2024)

Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
by: Wu, Zhaomin, et al.
Published: (2025)

REBAR: Retrieval-Based Reconstruction for Time-series Contrastive Learning
by: Xu, Maxwell A., et al.
Published: (2023)

Strategic Hypothesis Testing
by: Hossain, Safwan, et al.
Published: (2025)

Differentially Private Auditing Under Strategic Response
by: Burnat, Florian A. D.
Published: (2026)

Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation
by: Wei, Hui, et al.
Published: (2024)

Detecting Proxy Gaming in RL and LLM Alignment via Evaluator Stress Tests
by: Shihab, Ibne Farabi, et al.
Published: (2025)

When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models
by: Wang, Kai, et al.
Published: (2025)

Targeted Tests for LLM Reasoning: An Audit-Constrained Protocol
by: Li, Hongmin
Published: (2026)

An Auditing Test To Detect Behavioral Shift in Language Models
by: Richter, Leo, et al.
Published: (2024)

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations
by: Kumar, Sachin
Published: (2026)

Evaluating LLM Safety Under Repeated Inference via Accelerated Prompt Stress Testing
by: Broadwater, Keita
Published: (2026)

Auditing Prompt Caching in Language Model APIs
by: Gu, Chenchen, et al.
Published: (2025)

CacheProbe: Auditing Prompt Cache Isolation in Gateway APIs
by: Fahey, Ryan
Published: (2026)

Deception Detection: From Static Texts to Multimodal Signals
by: Logan, Mandela
Published: (2025)

Contextual Chart Generation for Cyber Deception
by: Nguyen, David D., et al.
Published: (2024)

Gaming the Metric, Not the Harm: Certifying Safety Audits against Strategic Platform Manipulation
by: Burnat, Florian A. D., et al.
Published: (2026)

H-FLTN: A Privacy-Preserving Hierarchical Framework for Electric Vehicle Spatio-Temporal Charge Prediction
by: Marlin, Robert, et al.
Published: (2025)

Deceptive Exploration in Multi-armed Bandits
by: Vurankaya, I. Arda, et al.
Published: (2025)

To Align or Not to Align: Strategic Multimodal Representation Alignment for Optimal Performance
by: Fang, Wanlong, et al.
Published: (2025)

On the Contractivity of Stochastic Interpolation Flow
by: Daniels, Mara
Published: (2025)

A lightweight Spatial-Temporal Graph Neural Network for Long-term Time Series Forecasting
by: Moges, Henok Tenaw, et al.
Published: (2025)

CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring
by: Arnav, Benjamin, et al.
Published: (2025)

Instance-Level Data-Use Auditing of Visual ML Models
by: Huang, Zonghao, et al.
Published: (2025)

Stress-Testing Capability Elicitation With Password-Locked Models
by: Greenblatt, Ryan, et al.
Published: (2024)

Reinforcement Learning for High-Level Strategic Control in Tower Defense Games
by: Bergdahl, Joakim, et al.
Published: (2024)

A note on the VC dimension of 1-dimensional GNNs
by: Daniëls, Noah, et al.
Published: (2024)

Contextual StereoSet: Stress-Testing Bias Alignment Robustness in Large Language Models
by: Basu, Abhinaba, et al.
Published: (2026)

SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking
by: Yang, Wenyuan, et al.
Published: (2025)