| Main Author: | Mukhopadhyay, Snehasis |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.13791 |
Similar Items
PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI
by: Mukhopadhyay, Snehasis
Published: (2026)
Deceptive Automated Interpretability: Language Models Coordinating to Fool Oversight Systems
by: Lermen, Simon, et al.
Published: (2025)
AgentShield: Deception-based Compromise Detection for Tool-using LLM Agents
by: Rassul, Yassin H., et al.
Published: (2026)
LH-Deception: Simulating and Understanding LLM Deceptive Behaviors in Long-Horizon Interactions
by: Xu, Yang, et al.
Published: (2025)
Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning
by: Chen, Kang, et al.
Published: (2024)
What if Deception Cannot be Detected? A Cross-Linguistic Study on the Limits of Deception Detection from Text
by: Velutharambath, Aswathy, et al.
Published: (2025)
OpenDeception: Learning Deception and Trust in Human-AI Interaction via Multi-Agent Simulation
by: Wu, Yichen, et al.
Published: (2025)
Can Factual Statements be Deceptive? The DeFaBel Corpus of Belief-based Deception
by: Velutharambath, Aswathy, et al.
Published: (2024)
Dynamic Emotion and Personality Profiling for Multimodal Deception Detection
by: Zheng, Li, et al.
Published: (2026)
Probing the Limits of the Lie Detector Approach to LLM Deception
by: Berger, Tom-Felix
Published: (2026)
The Facade of Truth: Uncovering and Mitigating LLM Susceptibility to Deceptive Evidence
by: Wan, Herun, et al.
Published: (2026)
DECOR: Auditing LLM Deception via Information Manipulation Theory
by: Cai, Linyue, et al.
Published: (2026)
LieCraft: A Multi-Agent Framework for Evaluating Deceptive Capabilities in Language Models
by: Olson, Matthew Lyle, et al.
Published: (2026)
Voting-based Multimodal Automatic Deception Detection
by: Touma, Lana, et al.
Published: (2023)
DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
by: Huang, Yao, et al.
Published: (2025)
AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models
by: Mukhopadhyay, Snehasis, et al.
Published: (2025)
An Assessment of Model-On-Model Deception
by: Heitkoetter, Julius, et al.
Published: (2024)
How Entangled is Factuality and Deception in German?
by: Velutharambath, Aswathy, et al.
Published: (2024)
Detecting Deceptive Dark Patterns in E-commerce Platforms
by: Ramteke, Arya, et al.
Published: (2024)
D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models
by: Krishna, Satyapriya, et al.
Published: (2025)
Exploring the Deceptive Power of LLM-Generated Fake News: A Study of Real-World Detection Challenges
by: Sun, Yanshen, et al.
Published: (2024)
CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation
by: Sinha, Aarush, et al.
Published: (2026)
Deceptive Patterns of Intelligent and Interactive Writing Assistants
by: Benharrak, Karim, et al.
Published: (2024)
Hidden in Plain Sight: Evaluation of the Deception Detection Capabilities of LLMs in Multimodal Settings
by: Miah, Md Messal Monem, et al.
Published: (2025)
Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
by: Wongkamjan, Wichayaporn, et al.
Published: (2025)
SEPSIS: I Can Catch Your Lies -- A New Paradigm for Deception Detection
by: Rani, Anku, et al.
Published: (2023)
Effects of Soft-Domain Transfer and Named Entity Information on Deception Detection
by: Triplett, Steven, et al.
Published: (2024)
Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations
by: Kumar, Sachin
Published: (2026)
To Tell The Truth: Language of Deception and Language Models
by: Hazra, Sanchaita, et al.
Published: (2023)
Lying to Win: Assessing LLM Deception through Human-AI Games and Parallel-World Probing
by: Marioriyad, Arash, et al.
Published: (2026)
MAiDE-up: Multilingual Deception Detection of GPT-generated Hotel Reviews
by: Ignat, Oana, et al.
Published: (2024)
From Deception to Detection: The Dual Roles of Large Language Models in Fake News
by: Sallami, Dorsaf, et al.
Published: (2024)
SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence
by: Bu, Yuyan, et al.
Published: (2026)
Domain-Independent Deception: A New Taxonomy and Linguistic Analysis
by: Verma, Rakesh M., et al.
Published: (2024)
Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing
by: Xie, Jiakuan, et al.
Published: (2025)
Semantic Deception: When Reasoning Models Can't Compute an Addition
by: de Leeuw, Nathaniël, et al.
Published: (2025)
Do Large Language Models Exhibit Spontaneous Rational Deception?
by: Taylor, Samuel M., et al.
Published: (2025)
Deception Abilities Emerged in Large Language Models
by: Hagendorff, Thilo
Published: (2023)
PU-Lie: Lightweight Deception Detection in Imbalanced Diplomatic Dialogues via Positive-Unlabeled Learning
by: Kuwar, Bhavinkumar Vinodbhai, et al.
Published: (2025)
Dharma, Data and Deception: An LLM-Powered Rhetorical Analysis of Cow-Urine Health Claims on YouTube
by: Munir, Sheza, et al.
Published: (2026)