:: Library Catalog

Bibliographic Details
Main Authors:	Jafari, Nazanin; Allan, James; Sarwar, Sheikh Muhammad
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2403.19836

Similar Items

Robust Claim Verification Through Fact Detection
by: Jafari, Nazanin, et al.
Published: (2024)

Beyond Precision: Importance-Aware Recall for Factuality Evaluation in Long-Form LLM Generation
by: Jafari, Nazanin, et al.
Published: (2026)

Harmful Suicide Content Detection
by: Park, Kyumin, et al.
Published: (2024)

LLM-based Semantic Augmentation for Harmful Content Detection
by: Meguellati, Elyas, et al.
Published: (2025)

Beyond Accuracy: An Explainability-Driven Analysis of Harmful Content Detection
by: Dhara, Trishita, et al.
Published: (2026)

ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
by: Liu, Kangwei, et al.
Published: (2025)

GigaCheck: Detecting LLM-generated Content via Object-Centric Span Localization
by: Tolstykh, Irina, et al.
Published: (2024)

Rewrite to Jailbreak: Discover Learnable and Transferable Implicit Harmfulness Instruction
by: Huang, Yuting, et al.
Published: (2025)

AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue
by: Park, Jihyung, et al.
Published: (2026)

Beyond One-Size-Fits-All: Personalized Harmful Content Detection with In-Context Learning
by: Zhang, Rufan, et al.
Published: (2025)

DIA-HARM: Dialectal Disparities in Harmful Content Detection Across 50 English Dialects
by: Lucas, Jason, et al.
Published: (2026)

Towards Generalizable Generic Harmful Speech Datasets for Implicit Hate Speech Detection
by: Almohaimeed, Saad, et al.
Published: (2025)

Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content
by: Bianchi, Federico, et al.
Published: (2024)

Span Modeling for Idiomaticity and Figurative Language Detection with Span Contrastive Loss
by: Matheny, Blake, et al.
Published: (2026)

The Hidden Language of Harm: Examining the Role of Emojis in Harmful Online Communication and Content Moderation
by: Zhou, Yuhang, et al.
Published: (2025)

StopHC: A Harmful Content Detection and Mitigation Architecture for Social Media Platforms
by: Truică, Ciprian-Octavian, et al.
Published: (2024)

Semantic Search as Extractive Paraphrase Span Detection
by: Kanerva, Jenna, et al.
Published: (2021)

Re-ranking Using Large Language Models for Mitigating Exposure to Harmful Content on Social Media Platforms
by: Oak, Rajvardhan, et al.
Published: (2025)

STATE ToxiCN: A Benchmark for Span-level Target-Aware Toxicity Extraction in Chinese Hate Speech Detection
by: Bai, Zewen, et al.
Published: (2025)

Advancing Harmful Content Detection in Organizational Research: Integrating Large Language Models with Elo Rating System
by: Akben, Mustafa, et al.
Published: (2025)

Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM
by: Zhang, Chi, et al.
Published: (2025)

FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
by: Sarwar, Nobin
Published: (2025)

Span-Level Hallucination Detection for LLM-Generated Answers
by: Elchafei, Passant, et al.
Published: (2025)

GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection
by: Rad, Melissa Kazemi, et al.
Published: (2025)

Are LLMs Enough for Hyperpartisan, Fake, Polarized and Harmful Content Detection? Evaluating In-Context Learning vs. Fine-Tuning
by: Maggini, Michele Joshua, et al.
Published: (2025)

Something Just Like TRuST : Toxicity Recognition of Span and Target
by: Atil, Berk, et al.
Published: (2025)

Detection and Analysis of Offensive Online Content in Hausa Language
by: Adam, Fatima Muhammad, et al.
Published: (2023)

Explainable Semantic Textual Similarity via Dissimilar Span Detection
by: Lozano, Diego Miguel, et al.
Published: (2026)

Learning to Reason for Hallucination Span Detection
by: Su, Hsuan, et al.
Published: (2025)

Don't Add, don't Miss: Effective Content Preserving Generation from Pre-Selected Text Spans
by: Slobodkin, Aviv, et al.
Published: (2023)

From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring
by: Li, Yang, et al.
Published: (2025)

Towards Comprehensive Detection of Chinese Harmful Memes
by: Lu, Junyu, et al.
Published: (2024)

An Evaluation of LLMs for Detecting Harmful Computing Terms
by: Jacas, Joshua, et al.
Published: (2025)

Natural Language Decompositions of Implicit Content Enable Better Text Representations
by: Hoyle, Alexander, et al.
Published: (2023)

Towards Low-Resource Harmful Meme Detection with LMM Agents
by: Huang, Jianzhao, et al.
Published: (2024)

Improving Harmful Text Detection with Joint Retrieval and External Knowledge
by: Yu, Zidong, et al.
Published: (2025)

Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs
by: Mendu, Sai Krishna, et al.
Published: (2025)

Misinformation Span Detection in Videos via Audio Transcripts
by: Matos, Breno, et al.
Published: (2026)

MisSpans: Fine-Grained False Span Identification in Cross-Domain Fake News
by: Liu, Zhiwei, et al.
Published: (2026)

ViToSA: Audio-Based Toxic Spans Detection on Vietnamese Speech Utterances
by: Do, Huy Ba, et al.
Published: (2025)