Library Catalog

Bibliographic Details
Main Authors:	Ganguli, Rajesh; Moraffah, Raha
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.11663

Similar Items

A Generative Approach to Surrogate-based Black-box Attacks
by: Moraffah, Raha, et al.
Published: (2024)

Can Large Language Models Infer Causal Relationships from Real-World Text?
by: Saklad, Ryan, et al.
Published: (2025)

Adversarial Text Purification: A Large Language Model Approach for Defense
by: Moraffah, Raha, et al.
Published: (2024)

Exploiting Class Probabilities for Black-box Sentence-level Attacks
by: Moraffah, Raha, et al.
Published: (2024)

Zero-shot LLM-guided Counterfactual Generation: A Case Study on NLP Model Evaluation
by: Bhattacharjee, Amrita, et al.
Published: (2024)

EAGLE: A Domain Generalization Framework for AI-generated Text Detection
by: Bhattacharjee, Amrita, et al.
Published: (2024)

Causal Feature Selection for Responsible Machine Learning
by: Moraffah, Raha, et al.
Published: (2024)

Towards LLM-guided Causal Explainability for Black-box Text Classifiers
by: Bhattacharjee, Amrita, et al.
Published: (2023)

"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models
by: Tan, Zhen, et al.
Published: (2024)

Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content
by: Bianchi, Federico, et al.
Published: (2024)

Prefix Probing: Lightweight Harmful Content Detection for Large Language Models
by: Yang, Jirui, et al.
Published: (2025)

DAGverse: Building Document-Grounded Semantic DAGs from Scientific Papers
by: Wan, Shu, et al.
Published: (2026)

A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization
by: Kumarage, Tharindu, et al.
Published: (2024)

Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
by: Orgad, Hadas, et al.
Published: (2026)

The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative
by: Tan, Zhen, et al.
Published: (2024)

Self-HarmLLM: Can Large Language Model Harm Itself?
by: Kim, Heehwan, et al.
Published: (2025)

Advancing Harmful Content Detection in Organizational Research: Integrating Large Language Models with Elo Rating System
by: Akben, Mustafa, et al.
Published: (2025)

AI Meets the Classroom: When Do Large Language Models Harm Learning?
by: Lehmann, Matthias, et al.
Published: (2024)

Why Do Language Model Agents Whistleblow?
by: Agrawal, Kushal, et al.
Published: (2025)

Do Large Language Models Need a Content Delivery Network?
by: Cheng, Yihua, et al.
Published: (2024)

Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation
by: Huang, Tiansheng, et al.
Published: (2024)

TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale
by: Ganguli, Anurup
Published: (2026)

Engagement-Driven Content Generation with Large Language Models
by: Coppolillo, Erica, et al.
Published: (2024)

Re-ranking Using Large Language Models for Mitigating Exposure to Harmful Content on Social Media Platforms
by: Oak, Rajvardhan, et al.
Published: (2025)

HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?
by: Jiang, Yukun, et al.
Published: (2026)

When Harmful Content Gets Camouflaged: Unveiling Perception Failure of LVLMs with CamHarmTI
by: Li, Yanhui, et al.
Published: (2025)

Surgery: Mitigating Harmful Fine-Tuning for Large Language Models via Attention Sink
by: Liu, Guozhi, et al.
Published: (2026)

Guiding Large Language Models to Generate Computer-Parsable Content
by: Wang, Jiaye
Published: (2024)

Bias of AI-Generated Content: An Examination of News Produced by Large Language Models
by: Fang, Xiao, et al.
Published: (2023)

Why Larger Language Models Do In-context Learning Differently?
by: Shi, Zhenmei, et al.
Published: (2024)

Evaluating Language Models for Harmful Manipulation
by: Akbulut, Canfer, et al.
Published: (2026)

Harmful Suicide Content Detection
by: Park, Kyumin, et al.
Published: (2024)

Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks
by: Pimentel, Marco AF, et al.
Published: (2024)

Safe2Harm: Semantic Isomorphism Attacks for Jailbreaking Large Language Models
by: Yang, Fan
Published: (2025)

LLM Safety From Within: Detecting Harmful Content with Internal Representations
by: Jiao, Difan, et al.
Published: (2026)

ChatPCG: Large Language Model-Driven Reward Design for Procedural Content Generation
by: Baek, In-Chang, et al.
Published: (2024)

Why Do Vision Language Models Struggle To Recognize Human Emotions?
by: Agarwal, Madhav, et al.
Published: (2026)

Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?
by: Kang, Deokhyung, et al.
Published: (2025)

A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why?
by: Chen, QiHong, et al.
Published: (2024)

AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
by: Chen, Zixin, et al.
Published: (2025)