:: Library Catalog

Bibliographic Details
Main Authors:	Li, Yang; Sheng, Qiang; Yang, Yehan; Zhang, Xueyao; Cao, Juan
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Computers and Society
Online Access:	https://arxiv.org/abs/2506.09996

Similar Items

PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning
by: Shi, Yuhui, et al.
Published: (2025)

PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm
by: Li, Jing-Jing, et al.
Published: (2026)

LLM-based Semantic Augmentation for Harmful Content Detection
by: Meguellati, Elyas, et al.
Published: (2025)

Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM
by: Zhang, Chi, et al.
Published: (2025)

LLM-Generated Fake News Induces Truth Decay in News Ecosystem: A Case Study on Neural News Recommendation
by: Hu, Beizhe, et al.
Published: (2025)

Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection
by: Li, Yang, et al.
Published: (2026)

Exploiting User Comments for Early Detection of Fake News Prior to Users' Commenting
by: Nan, Qiong, et al.
Published: (2023)

The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas
by: Wu, Ya, et al.
Published: (2025)

The Hidden Language of Harm: Examining the Role of Emojis in Harmful Online Communication and Content Moderation
by: Zhou, Yuhang, et al.
Published: (2025)

Longitudinal Monitoring of LLM Content Moderation of Social Issues
by: Dai, Yunlang, et al.
Published: (2025)

Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems
by: Harvey, Emma, et al.
Published: (2025)

Harmful Suicide Content Detection
by: Park, Kyumin, et al.
Published: (2024)

Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection
by: Hu, Beizhe, et al.
Published: (2023)

GLARE: Agentic Reasoning for Legal Judgment Prediction
by: Yang, Xinyu, et al.
Published: (2025)

Who Decides What Is Harmful? Content Moderation Policy Through A Multi-Agent Personalised Inference Framework
by: Gajewska, Ewelina, et al.
Published: (2026)

Taxonomizing Representational Harms using Speech Act Theory
by: Corvi, Emily, et al.
Published: (2025)

AppellateGen: A Benchmark for Appellate Legal Judgment Generation
by: Yang, Hongkun, et al.
Published: (2026)

From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
by: Chehbouni, Khaoula, et al.
Published: (2024)

Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment
by: Sauter, Adrian, et al.
Published: (2026)

Task-Dependent Evaluation of LLM Output Homogenization: A Taxonomy-Guided Framework
by: Jain, Shomik, et al.
Published: (2025)

Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models
by: Nan, Qiong, et al.
Published: (2024)

Exploring news intent and its application: A theory-driven approach
by: Wang, Zhengjia, et al.
Published: (2023)

Legal Fact Prediction: The Missing Piece in Legal Judgment Prediction
by: Liu, Junkai, et al.
Published: (2024)

People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection
by: Sen, Indira, et al.
Published: (2023)

Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments
by: Mi, Hao, et al.
Published: (2026)

From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents
by: Yu, Jifan, et al.
Published: (2024)

JurisCTC: Enhancing Legal Judgment Prediction via Cross-Domain Transfer and Contrastive Learning
by: Kang, Zhaolu, et al.
Published: (2025)

Careless Whisper: Speech-to-Text Hallucination Harms
by: Koenecke, Allison, et al.
Published: (2024)

Disentangling Learning from Judgment: Representation Learning for Open Response Analytics
by: Borchers, Conrad, et al.
Published: (2025)

Language of Thought Shapes Output Diversity in Large Language Models
by: Xu, Shaoyang, et al.
Published: (2026)

On the Sensitivity of Instruction-tuned LLMs to Harmful Sentences in Long Inputs
by: Ghorbanpour, Faeze, et al.
Published: (2025)

A Capabilities Approach to Studying Bias and Harm in Language Technologies
by: Nigatu, Hellina Hailu, et al.
Published: (2024)

Multilingualism, Transnationality, and K-pop in the Online #StopAsianHate Movement
by: Masis, Tessa, et al.
Published: (2025)

Exploring Safety Alignment Evaluation of LLMs in Chinese Mental Health Dialogues via LLM-as-Judge
by: Cai, Yunna, et al.
Published: (2025)

Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs
by: Arnaiz-Rodriguez, Adrian, et al.
Published: (2025)

Do Prevalent Bias Metrics Capture Allocational Harms from LLMs?
by: Cyberey, Hannah, et al.
Published: (2024)

Magic, Madness, Heaven, Sin: LLM Output Diversity is Everything, Everywhere, All at Once
by: Dhingra, Harnoor
Published: (2026)

SESGO: Spanish Evaluation of Stereotypical Generative Outputs
by: Robles, Melissa, et al.
Published: (2025)

LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
by: Deng, Wenlong, et al.
Published: (2024)

From Hard Refusals to Safe-Completions: Toward Output-Centric Safety Training
by: Yuan, Yuan, et al.
Published: (2025)