| Main Authors: | Li, Yang; Sheng, Qiang; Yang, Yehan; Zhang, Xueyao; Cao, Juan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.09996 |
Similar Items
PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning
by: Shi, Yuhui, et al.
Published: (2025)
PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm
by: Li, Jing-Jing, et al.
Published: (2026)
LLM-based Semantic Augmentation for Harmful Content Detection
by: Meguellati, Elyas, et al.
Published: (2025)
Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM
by: Zhang, Chi, et al.
Published: (2025)
LLM-Generated Fake News Induces Truth Decay in News Ecosystem: A Case Study on Neural News Recommendation
by: Hu, Beizhe, et al.
Published: (2025)
Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection
by: Li, Yang, et al.
Published: (2026)
Exploiting User Comments for Early Detection of Fake News Prior to Users' Commenting
by: Nan, Qiong, et al.
Published: (2023)
The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas
by: Wu, Ya, et al.
Published: (2025)
The Hidden Language of Harm: Examining the Role of Emojis in Harmful Online Communication and Content Moderation
by: Zhou, Yuhang, et al.
Published: (2025)
Longitudinal Monitoring of LLM Content Moderation of Social Issues
by: Dai, Yunlang, et al.
Published: (2025)
Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems
by: Harvey, Emma, et al.
Published: (2025)
Harmful Suicide Content Detection
by: Park, Kyumin, et al.
Published: (2024)
Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection
by: Hu, Beizhe, et al.
Published: (2023)
GLARE: Agentic Reasoning for Legal Judgment Prediction
by: Yang, Xinyu, et al.
Published: (2025)
Who Decides What Is Harmful? Content Moderation Policy Through A Multi-Agent Personalised Inference Framework
by: Gajewska, Ewelina, et al.
Published: (2026)
Taxonomizing Representational Harms using Speech Act Theory
by: Corvi, Emily, et al.
Published: (2025)
AppellateGen: A Benchmark for Appellate Legal Judgment Generation
by: Yang, Hongkun, et al.
Published: (2026)
From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
by: Chehbouni, Khaoula, et al.
Published: (2024)
Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment
by: Sauter, Adrian, et al.
Published: (2026)
Task-Dependent Evaluation of LLM Output Homogenization: A Taxonomy-Guided Framework
by: Jain, Shomik, et al.
Published: (2025)
Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models
by: Nan, Qiong, et al.
Published: (2024)
Exploring news intent and its application: A theory-driven approach
by: Wang, Zhengjia, et al.
Published: (2023)
Legal Fact Prediction: The Missing Piece in Legal Judgment Prediction
by: Liu, Junkai, et al.
Published: (2024)
People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection
by: Sen, Indira, et al.
Published: (2023)
Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments
by: Mi, Hao, et al.
Published: (2026)
From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents
by: Yu, Jifan, et al.
Published: (2024)
JurisCTC: Enhancing Legal Judgment Prediction via Cross-Domain Transfer and Contrastive Learning
by: Kang, Zhaolu, et al.
Published: (2025)
Careless Whisper: Speech-to-Text Hallucination Harms
by: Koenecke, Allison, et al.
Published: (2024)
Disentangling Learning from Judgment: Representation Learning for Open Response Analytics
by: Borchers, Conrad, et al.
Published: (2025)
Language of Thought Shapes Output Diversity in Large Language Models
by: Xu, Shaoyang, et al.
Published: (2026)
On the Sensitivity of Instruction-tuned LLMs to Harmful Sentences in Long Inputs
by: Ghorbanpour, Faeze, et al.
Published: (2025)
A Capabilities Approach to Studying Bias and Harm in Language Technologies
by: Nigatu, Hellina Hailu, et al.
Published: (2024)
Multilingualism, Transnationality, and K-pop in the Online #StopAsianHate Movement
by: Masis, Tessa, et al.
Published: (2025)
Exploring Safety Alignment Evaluation of LLMs in Chinese Mental Health Dialogues via LLM-as-Judge
by: Cai, Yunna, et al.
Published: (2025)
Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs
by: Arnaiz-Rodriguez, Adrian, et al.
Published: (2025)
Do Prevalent Bias Metrics Capture Allocational Harms from LLMs?
by: Cyberey, Hannah, et al.
Published: (2024)
Magic, Madness, Heaven, Sin: LLM Output Diversity is Everything, Everywhere, All at Once
by: Dhingra, Harnoor
Published: (2026)
SESGO: Spanish Evaluation of Stereotypical Generative Outputs
by: Robles, Melissa, et al.
Published: (2025)
LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
by: Deng, Wenlong, et al.
Published: (2024)
From Hard Refusals to Safe-Completions: Toward Output-Centric Safety Training
by: Yuan, Yuan, et al.
Published: (2025)