| Main Authors: | Ganguli, Rajesh, Moraffah, Raha |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.11663 |
Similar Items
A Generative Approach to Surrogate-based Black-box Attacks
by: Moraffah, Raha, et al.
Published: (2024)
Can Large Language Models Infer Causal Relationships from Real-World Text?
by: Saklad, Ryan, et al.
Published: (2025)
Adversarial Text Purification: A Large Language Model Approach for Defense
by: Moraffah, Raha, et al.
Published: (2024)
Exploiting Class Probabilities for Black-box Sentence-level Attacks
by: Moraffah, Raha, et al.
Published: (2024)
Zero-shot LLM-guided Counterfactual Generation: A Case Study on NLP Model Evaluation
by: Bhattacharjee, Amrita, et al.
Published: (2024)
EAGLE: A Domain Generalization Framework for AI-generated Text Detection
by: Bhattacharjee, Amrita, et al.
Published: (2024)
Causal Feature Selection for Responsible Machine Learning
by: Moraffah, Raha, et al.
Published: (2024)
Towards LLM-guided Causal Explainability for Black-box Text Classifiers
by: Bhattacharjee, Amrita, et al.
Published: (2023)
"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models
by: Tan, Zhen, et al.
Published: (2024)
Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content
by: Bianchi, Federico, et al.
Published: (2024)
Prefix Probing: Lightweight Harmful Content Detection for Large Language Models
by: Yang, Jirui, et al.
Published: (2025)
DAGverse: Building Document-Grounded Semantic DAGs from Scientific Papers
by: Wan, Shu, et al.
Published: (2026)
A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization
by: Kumarage, Tharindu, et al.
Published: (2024)
Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
by: Orgad, Hadas, et al.
Published: (2026)
The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative
by: Tan, Zhen, et al.
Published: (2024)
Self-HarmLLM: Can Large Language Model Harm Itself?
by: Kim, Heehwan, et al.
Published: (2025)
Advancing Harmful Content Detection in Organizational Research: Integrating Large Language Models with Elo Rating System
by: Akben, Mustafa, et al.
Published: (2025)
AI Meets the Classroom: When Do Large Language Models Harm Learning?
by: Lehmann, Matthias, et al.
Published: (2024)
Why Do Language Model Agents Whistleblow?
by: Agrawal, Kushal, et al.
Published: (2025)
Do Large Language Models Need a Content Delivery Network?
by: Cheng, Yihua, et al.
Published: (2024)
Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation
by: Huang, Tiansheng, et al.
Published: (2024)
TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale
by: Ganguli, Anurup
Published: (2026)
Engagement-Driven Content Generation with Large Language Models
by: Coppolillo, Erica, et al.
Published: (2024)
Re-ranking Using Large Language Models for Mitigating Exposure to Harmful Content on Social Media Platforms
by: Oak, Rajvardhan, et al.
Published: (2025)
HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?
by: Jiang, Yukun, et al.
Published: (2026)
When Harmful Content Gets Camouflaged: Unveiling Perception Failure of LVLMs with CamHarmTI
by: Li, Yanhui, et al.
Published: (2025)
Surgery: Mitigating Harmful Fine-Tuning for Large Language Models via Attention Sink
by: Liu, Guozhi, et al.
Published: (2026)
Guiding Large Language Models to Generate Computer-Parsable Content
by: Wang, Jiaye
Published: (2024)
Bias of AI-Generated Content: An Examination of News Produced by Large Language Models
by: Fang, Xiao, et al.
Published: (2023)
Why Larger Language Models Do In-context Learning Differently?
by: Shi, Zhenmei, et al.
Published: (2024)
Evaluating Language Models for Harmful Manipulation
by: Akbulut, Canfer, et al.
Published: (2026)
Harmful Suicide Content Detection
by: Park, Kyumin, et al.
Published: (2024)
Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks
by: Pimentel, Marco AF, et al.
Published: (2024)
Safe2Harm: Semantic Isomorphism Attacks for Jailbreaking Large Language Models
by: Yang, Fan
Published: (2025)
LLM Safety From Within: Detecting Harmful Content with Internal Representations
by: Jiao, Difan, et al.
Published: (2026)
ChatPCG: Large Language Model-Driven Reward Design for Procedural Content Generation
by: Baek, In-Chang, et al.
Published: (2024)
Why Do Vision Language Models Struggle To Recognize Human Emotions?
by: Agarwal, Madhav, et al.
Published: (2026)
Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?
by: Kang, Deokhyung, et al.
Published: (2025)
A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why?
by: Chen, QiHong, et al.
Published: (2024)
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
by: Chen, Zixin, et al.
Published: (2025)