| Main Authors: | Jafari, Nazanin, Allan, James, Sarwar, Sheikh Muhammad |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.19836 |
Similar Items
Robust Claim Verification Through Fact Detection
by: Jafari, Nazanin, et al.
Published: (2024)
Beyond Precision: Importance-Aware Recall for Factuality Evaluation in Long-Form LLM Generation
by: Jafari, Nazanin, et al.
Published: (2026)
Harmful Suicide Content Detection
by: Park, Kyumin, et al.
Published: (2024)
LLM-based Semantic Augmentation for Harmful Content Detection
by: Meguellati, Elyas, et al.
Published: (2025)
Beyond Accuracy: An Explainability-Driven Analysis of Harmful Content Detection
by: Dhara, Trishita, et al.
Published: (2026)
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
by: Liu, Kangwei, et al.
Published: (2025)
GigaCheck: Detecting LLM-generated Content via Object-Centric Span Localization
by: Tolstykh, Irina, et al.
Published: (2024)
Rewrite to Jailbreak: Discover Learnable and Transferable Implicit Harmfulness Instruction
by: Huang, Yuting, et al.
Published: (2025)
AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue
by: Park, Jihyung, et al.
Published: (2026)
Beyond One-Size-Fits-All: Personalized Harmful Content Detection with In-Context Learning
by: Zhang, Rufan, et al.
Published: (2025)
DIA-HARM: Dialectal Disparities in Harmful Content Detection Across 50 English Dialects
by: Lucas, Jason, et al.
Published: (2026)
Towards Generalizable Generic Harmful Speech Datasets for Implicit Hate Speech Detection
by: Almohaimeed, Saad, et al.
Published: (2025)
Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content
by: Bianchi, Federico, et al.
Published: (2024)
Span Modeling for Idiomaticity and Figurative Language Detection with Span Contrastive Loss
by: Matheny, Blake, et al.
Published: (2026)
The Hidden Language of Harm: Examining the Role of Emojis in Harmful Online Communication and Content Moderation
by: Zhou, Yuhang, et al.
Published: (2025)
StopHC: A Harmful Content Detection and Mitigation Architecture for Social Media Platforms
by: Truică, Ciprian-Octavian, et al.
Published: (2024)
Semantic Search as Extractive Paraphrase Span Detection
by: Kanerva, Jenna, et al.
Published: (2021)
Re-ranking Using Large Language Models for Mitigating Exposure to Harmful Content on Social Media Platforms
by: Oak, Rajvardhan, et al.
Published: (2025)
STATE ToxiCN: A Benchmark for Span-level Target-Aware Toxicity Extraction in Chinese Hate Speech Detection
by: Bai, Zewen, et al.
Published: (2025)
Advancing Harmful Content Detection in Organizational Research: Integrating Large Language Models with Elo Rating System
by: Akben, Mustafa, et al.
Published: (2025)
Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM
by: Zhang, Chi, et al.
Published: (2025)
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
by: Sarwar, Nobin
Published: (2025)
Span-Level Hallucination Detection for LLM-Generated Answers
by: Elchafei, Passant, et al.
Published: (2025)
GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection
by: Rad, Melissa Kazemi, et al.
Published: (2025)
Are LLMs Enough for Hyperpartisan, Fake, Polarized and Harmful Content Detection? Evaluating In-Context Learning vs. Fine-Tuning
by: Maggini, Michele Joshua, et al.
Published: (2025)
Something Just Like TRuST : Toxicity Recognition of Span and Target
by: Atil, Berk, et al.
Published: (2025)
Detection and Analysis of Offensive Online Content in Hausa Language
by: Adam, Fatima Muhammad, et al.
Published: (2023)
Explainable Semantic Textual Similarity via Dissimilar Span Detection
by: Lozano, Diego Miguel, et al.
Published: (2026)
Learning to Reason for Hallucination Span Detection
by: Su, Hsuan, et al.
Published: (2025)
Don't Add, don't Miss: Effective Content Preserving Generation from Pre-Selected Text Spans
by: Slobodkin, Aviv, et al.
Published: (2023)
From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring
by: Li, Yang, et al.
Published: (2025)
Towards Comprehensive Detection of Chinese Harmful Memes
by: Lu, Junyu, et al.
Published: (2024)
An Evaluation of LLMs for Detecting Harmful Computing Terms
by: Jacas, Joshua, et al.
Published: (2025)
Natural Language Decompositions of Implicit Content Enable Better Text Representations
by: Hoyle, Alexander, et al.
Published: (2023)
Towards Low-Resource Harmful Meme Detection with LMM Agents
by: Huang, Jianzhao, et al.
Published: (2024)
Improving Harmful Text Detection with Joint Retrieval and External Knowledge
by: Yu, Zidong, et al.
Published: (2025)
Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs
by: Mendu, Sai Krishna, et al.
Published: (2025)
Misinformation Span Detection in Videos via Audio Transcripts
by: Matos, Breno, et al.
Published: (2026)
MisSpans: Fine-Grained False Span Identification in Cross-Domain Fake News
by: Liu, Zhiwei, et al.
Published: (2026)
ViToSA: Audio-Based Toxic Spans Detection on Vietnamese Speech Utterances
by: Do, Huy Ba, et al.
Published: (2025)