:: Library Catalog

Bibliographic Details
Main Authors:	Chang, Maria; Luss, Ronny; Liu, Miao; Murugesan, Keerthiram; Ramamurthy, Karthikeyan; Bouneffouf, Djallel
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2605.02751

Similar Items

Sparsity May Be All You Need: Sparse Random Parameter Adaptation
by: Rios, Jesus, et al.
Published: (2025)

Cross-Examiner: Evaluating Consistency of Large Language Model-Generated Explanations
by: Villa, Danielle, et al.
Published: (2025)

Evaluating the Prompt Steerability of Large Language Models
by: Miehling, Erik, et al.
Published: (2024)

Contextual Moral Value Alignment Through Context-Based Aggregation
by: Dognin, Pierre, et al.
Published: (2024)

Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
by: Ngong, Ivoline, et al.
Published: (2025)

STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models
by: Basavatia, Shreyas, et al.
Published: (2024)

Scopes of Alignment
by: Varshney, Kush R., et al.
Published: (2025)

CELL your Model: Contrastive Explanations for Large Language Models
by: Luss, Ronny, et al.
Published: (2024)

Multi-Level Explanations for Generative Language Models
by: Paes, Lucas Monteiro, et al.
Published: (2024)

Reasoning about concepts with LLMs: Inconsistencies abound
by: Uceda-Sosa, Rosario, et al.
Published: (2024)

Assessing AI Utility: The Random Guesser Test for Sequential Decision-Making Systems
by: Ide, Shun, et al.
Published: (2024)

Survey: Multi-Armed Bandits Meet Large Language Models
by: Bouneffouf, Djallel, et al.
Published: (2025)

Targeted Advertising on Social Networks Using Online Variational Tensor Regression
by: Idé, Tsuyoshi, et al.
Published: (2022)

Programming Refusal with Conditional Activation Steering
by: Lee, Bruce W., et al.
Published: (2024)

Towards Aligning Language Models with Textual Feedback
by: Lloret, Saüc Abadal, et al.
Published: (2024)

COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling
by: Lin, Baihan, et al.
Published: (2024)

Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
by: Achintalwar, Swapnaja, et al.
Published: (2024)

Language Models Coupled with Metacognition Can Outperform Reasoning Models
by: Khandelwal, Vedant, et al.
Published: (2025)

The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships?
by: Bouneffouf, Djallel, et al.
Published: (2025)

EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning
by: Basu, Kinjal, et al.
Published: (2024)

Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models
by: Gunal, Aylin, et al.
Published: (2024)

AgentSCOPE: Evaluating Contextual Privacy Across Agentic Workflows
by: Ngong, Ivoline C., et al.
Published: (2026)

AI Steerability 360: A Toolkit for Steering Large Language Models
by: Miehling, Erik, et al.
Published: (2026)

The Effectiveness of Approximate Regularized Replay for Efficient Supervised Fine-Tuning of Large Language Models
by: Riemer, Matthew, et al.
Published: (2025)

Exploring the Personality Traits of LLMs through Latent Features Steering
by: Yang, Shu, et al.
Published: (2024)

Ranking Large Language Models without Ground Truth
by: Dhurandhar, Amit, et al.
Published: (2024)

RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
by: Liang, Kaiqu, et al.
Published: (2025)

CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design
by: Neehal, Nafis, et al.
Published: (2024)

OjaKV: Context-Aware Online Low-Rank KV Cache Compression
by: Zhu, Yuxuan, et al.
Published: (2025)

Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion
by: Wu, Yuanhong, et al.
Published: (2026)

ZoomR: Memory Efficient Reasoning through Multi-Granularity Key Value Retrieval
by: Yang, David H., et al.
Published: (2026)

When Stability meets Sufficiency: Informative Explanations that do not Overwhelm
by: Luss, Ronny, et al.
Published: (2021)

Emergent Misalignment is Easy, Narrow Misalignment is Hard
by: Soligo, Anna, et al.
Published: (2026)

Steering Towards Fairness: Mitigating Political Bias in LLMs
by: Nadeem, Afrozah, et al.
Published: (2025)

Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation
by: Hahm, Dongyoon, et al.
Published: (2025)

Combinatorial Multi-armed Bandits: Arm Selection via Group Testing
by: Mukherjee, Arpan, et al.
Published: (2024)

Leveraging Implicit Sentiments: Enhancing Reliability and Validity in Psychological Trait Evaluation of LLMs
by: Ma, Huanhuan, et al.
Published: (2025)

Understanding and Mitigating Dataset Corruption in LLM Steering
by: Anderson, Cullen, et al.
Published: (2026)

Fusion Steering: Prompt-Specific Activation Control
by: Chang, Waldemar, et al.
Published: (2025)

Position: Theory of Mind Benchmarks are Broken for Large Language Models
by: Riemer, Matthew, et al.
Published: (2024)