Bibliographic Details
Main Author:	Mukhopadhyay, Snehasis
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.13791

Similar Items

PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI
by: Mukhopadhyay, Snehasis
Published: (2026)

Deceptive Automated Interpretability: Language Models Coordinating to Fool Oversight Systems
by: Lermen, Simon, et al.
Published: (2025)

AgentShield: Deception-based Compromise Detection for Tool-using LLM Agents
by: Rassul, Yassin H., et al.
Published: (2026)

LH-Deception: Simulating and Understanding LLM Deceptive Behaviors in Long-Horizon Interactions
by: Xu, Yang, et al.
Published: (2025)

Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning
by: Chen, Kang, et al.
Published: (2024)

What if Deception Cannot be Detected? A Cross-Linguistic Study on the Limits of Deception Detection from Text
by: Velutharambath, Aswathy, et al.
Published: (2025)

OpenDeception: Learning Deception and Trust in Human-AI Interaction via Multi-Agent Simulation
by: Wu, Yichen, et al.
Published: (2025)

Can Factual Statements be Deceptive? The DeFaBel Corpus of Belief-based Deception
by: Velutharambath, Aswathy, et al.
Published: (2024)

Dynamic Emotion and Personality Profiling for Multimodal Deception Detection
by: Zheng, Li, et al.
Published: (2026)

Probing the Limits of the Lie Detector Approach to LLM Deception
by: Berger, Tom-Felix
Published: (2026)

The Facade of Truth: Uncovering and Mitigating LLM Susceptibility to Deceptive Evidence
by: Wan, Herun, et al.
Published: (2026)

DECOR: Auditing LLM Deception via Information Manipulation Theory
by: Cai, Linyue, et al.
Published: (2026)

LieCraft: A Multi-Agent Framework for Evaluating Deceptive Capabilities in Language Models
by: Olson, Matthew Lyle, et al.
Published: (2026)

Voting-based Multimodal Automatic Deception Detection
by: Touma, Lana, et al.
Published: (2023)

DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
by: Huang, Yao, et al.
Published: (2025)

AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models
by: Mukhopadhyay, Snehasis, et al.
Published: (2025)

An Assessment of Model-On-Model Deception
by: Heitkoetter, Julius, et al.
Published: (2024)

How Entangled is Factuality and Deception in German?
by: Velutharambath, Aswathy, et al.
Published: (2024)

Detecting Deceptive Dark Patterns in E-commerce Platforms
by: Ramteke, Arya, et al.
Published: (2024)

D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models
by: Krishna, Satyapriya, et al.
Published: (2025)

Exploring the Deceptive Power of LLM-Generated Fake News: A Study of Real-World Detection Challenges
by: Sun, Yanshen, et al.
Published: (2024)

CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation
by: Sinha, Aarush, et al.
Published: (2026)

Deceptive Patterns of Intelligent and Interactive Writing Assistants
by: Benharrak, Karim, et al.
Published: (2024)

Hidden in Plain Sight: Evaluation of the Deception Detection Capabilities of LLMs in Multimodal Settings
by: Miah, Md Messal Monem, et al.
Published: (2025)

Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
by: Wongkamjan, Wichayaporn, et al.
Published: (2025)

SEPSIS: I Can Catch Your Lies -- A New Paradigm for Deception Detection
by: Rani, Anku, et al.
Published: (2023)

Effects of Soft-Domain Transfer and Named Entity Information on Deception Detection
by: Triplett, Steven, et al.
Published: (2024)

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations
by: Kumar, Sachin
Published: (2026)

To Tell The Truth: Language of Deception and Language Models
by: Hazra, Sanchaita, et al.
Published: (2023)

Lying to Win: Assessing LLM Deception through Human-AI Games and Parallel-World Probing
by: Marioriyad, Arash, et al.
Published: (2026)

MAiDE-up: Multilingual Deception Detection of GPT-generated Hotel Reviews
by: Ignat, Oana, et al.
Published: (2024)

From Deception to Detection: The Dual Roles of Large Language Models in Fake News
by: Sallami, Dorsaf, et al.
Published: (2024)

SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence
by: Bu, Yuyan, et al.
Published: (2026)

Domain-Independent Deception: A New Taxonomy and Linguistic Analysis
by: Verma, Rakesh M., et al.
Published: (2024)

Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing
by: Xie, Jiakuan, et al.
Published: (2025)

Semantic Deception: When Reasoning Models Can't Compute an Addition
by: de Leeuw, Nathaniël, et al.
Published: (2025)

Do Large Language Models Exhibit Spontaneous Rational Deception?
by: Taylor, Samuel M., et al.
Published: (2025)

Deception Abilities Emerged in Large Language Models
by: Hagendorff, Thilo
Published: (2023)

PU-Lie: Lightweight Deception Detection in Imbalanced Diplomatic Dialogues via Positive-Unlabeled Learning
by: Kuwar, Bhavinkumar Vinodbhai, et al.
Published: (2025)

Dharma, Data and Deception: An LLM-Powered Rhetorical Analysis of Cow-Urine Health Claims on YouTube
by: Munir, Sheza, et al.
Published: (2026)