Bibliographic Details
Main Author:	Ravindran, Santhosh Kumar
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.09406

Similar Items

Unified Threat Detection and Mitigation Framework (UTDMF): Combating Prompt Injection, Deception, and Bias in Enterprise-Scale Transformers
by: Ravindran, Santhosh Kumar
Published: (2025)

SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
by: S, Santhosh G, et al.
Published: (2025)

AQUA: Attention via QUery mAgnitudes for Memory and Compute Efficient Inference in LLMs
by: S, Santhosh G, et al.
Published: (2025)

Moral Anchor System: A Predictive Framework for AI Value Alignment and Drift Prevention
by: Ravindran, Santhosh Kumar
Published: (2025)

PatchTrAD: A Patch-Based Transformer focusing on Patch-Wise Reconstruction Error for Time Series Anomaly Detection
by: Vilhes, Samy-Melwan, et al.
Published: (2025)

Vulnerability Mitigation for Safety-Aligned Language Models via Debiasing
by: Tran, Thien Q., et al.
Published: (2025)

SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors
by: Chaudhary, Maheep, et al.
Published: (2025)

OSIL: Learning Offline Safe Imitation Policies with Safety Inferred from Non-preferred Trajectories
by: Burnwal, Returaj, et al.
Published: (2026)

PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment
by: Verma, Richa, et al.
Published: (2026)

Stable Reasoning, Unstable Responses: Mitigating LLM Deception via Stability Asymmetry
by: Zhang, Guoxi, et al.
Published: (2026)

Adversarial Reward Auditing for Active Detection and Mitigation of Reward Hacking
by: Beigi, Mohammad, et al.
Published: (2026)

Mitigating Overconfidence in Out-of-Distribution Detection by Capturing Extreme Activations
by: Azizmalayeri, Mohammad, et al.
Published: (2024)

Among Us: A Sandbox for Measuring and Detecting Agentic Deception
by: Golechha, Satvik, et al.
Published: (2025)

Patch-Level Tokenization with CNN Encoders and Attention for Improved Transformer Time-Series Forecasting
by: Nagrath, Saurish, et al.
Published: (2026)

PatchCTG: Patch Cardiotocography Transformer for Antepartum Fetal Health Monitoring
by: Khan, M. Jaleed, et al.
Published: (2024)

CosmoCore Affective Dream-Replay Reinforcement Learning for Code Generation
by: Ravindran, Santhosh Kumar
Published: (2025)

Value of Information-based Deceptive Path Planning Under Adversarial Interventions
by: Suttle, Wesley A., et al.
Published: (2025)

CosmoCore-Evo: Evolutionary Dream-Replay Reinforcement Learning for Adaptive Code Generation
by: Ravindran, Santhosh Kumar
Published: (2025)

Mitigating Over-Refusal in Aligned Large Language Models via Inference-Time Activation Energy
by: Jiang, Eric Hanchen, et al.
Published: (2025)

TopoReformer: Mitigating Adversarial Attacks Using Topological Purification in OCR Models
by: Kumar, Bhagyesh, et al.
Published: (2025)

Activation Outliers in Transformer Quantization: Reproduction, Statistical Analysis, and Deployment Tradeoffs
by: Kaliaperumal, Pranav Kumar
Published: (2026)

Emergent Low-Rank Training Dynamics in MLPs with Smooth Activations
by: Xu, Alec S., et al.
Published: (2026)

Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance
by: Zhang, Jiawen, et al.
Published: (2026)

Portable Agent Memory: A Protocol for Cryptographically-Verified Memory Transfer Across Heterogeneous AI Agents
by: Ravindran, Santhosh Kumar
Published: (2026)

Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics
by: Acharjee, Jashaswimalya, et al.
Published: (2026)

Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training
by: Wang, Yisen, et al.
Published: (2025)

Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models
by: Chen, Kejia, et al.
Published: (2025)

Dual Turing Test: A Framework for Detecting and Mitigating Undetectable AI
by: Messina, Alberto
Published: (2025)

Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
by: Mamtani, Sumit, et al.
Published: (2025)

Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks
by: Panda, Deepak Kumar, et al.
Published: (2025)

Measuring the Depth of LLM Unlearning via Activation Patching
by: Lee, Jaeung, et al.
Published: (2026)

A Hybrid Model for Traffic Incident Detection based on Generative Adversarial Networks and Transformer Model
by: Lu, Xinying, et al.
Published: (2024)

Deceptive Exploration in Multi-armed Bandits
by: Vurankaya, I. Arda, et al.
Published: (2025)

Adversarially Robust Decision Transformer
by: Tang, Xiaohang, et al.
Published: (2024)

Localized Definitions and Distributed Reasoning: A Proof-of-Concept Mechanistic Interpretability Study via Activation Patching
by: Bahador, Nooshin
Published: (2025)

Read, Extract, Classify: A Tool for Smarter Requirements Engineering
by: Bhattacharya, Paheli, et al.
Published: (2026)

DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
by: Huang, Yao, et al.
Published: (2025)

Deceptive Diffusion: Generating Synthetic Adversarial Examples
by: Beerens, Lucas, et al.
Published: (2024)

TriP-LLM: A Tri-Branch Patch-wise Large Language Model Framework for Time-Series Anomaly Detection
by: Yu, Yuan-Cheng, et al.
Published: (2025)

Generalized Adaptive Transfer Network: Enhancing Transfer Learning in Reinforcement Learning Across Domains
by: Verma, Abhishek, et al.
Published: (2025)