Library Catalog

Bibliographic Details
Main Authors:	Lu, Junyu, Ma, Kai, Wang, Kaichun, Xiao, Kelaiti, Lee, Roy Ka-Wei, Xu, Bo, Yang, Liang, Lin, Hongfei
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.06207

Similar Items

Visual Puns from Idioms: An Iterative LLM-T2IM-MLLM Framework
by: Xiao, Kelaiti, et al.
Published: (2025)

Aligning LLM Uncertainty with Human Disagreement in Subjectivity Analysis
by: Lu, Junyu, et al.
Published: (2026)

VisualQuest: A Benchmark for Abstract Visual Reasoning in MLLMs
by: Xiao, Kelaiti, et al.
Published: (2025)

ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations
by: Xiao, Yunze, et al.
Published: (2024)

Multi-Agent VLMs Guided Self-Training with PNU Loss for Low-Resource Offensive Content Detection
by: Wang, Han, et al.
Published: (2025)

Harder to Defend: Towards Chinese Toxicity Attacks via Implicit Enhancement and Obfuscation Rewriting
by: Kang, Jingyi, et al.
Published: (2026)

Towards Patronizing and Condescending Language in Chinese Videos: A Multimodal Dataset and Detector
by: Wang, Hongbo, et al.
Published: (2024)

PclGPT: A Large Language Model for Patronizing and Condescending Language Detection
by: Wang, Hongbo, et al.
Published: (2024)

Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution
by: Tian, Zailong, et al.
Published: (2025)

Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect
by: Lu, Junyu, et al.
Published: (2024)

Vicarious Offense and Noise Audit of Offensive Speech Classifiers: Unifying Human and Machine Disagreement on What is Offensive
by: Weerasooriya, Tharindu Cyril, et al.
Published: (2023)

D3CODE: Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation
by: Davani, Aida Mostafazadeh, et al.
Published: (2024)

Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection
by: Zhu, Haohao, et al.
Published: (2024)

When Disagreements Elicit Robustness: Investigating Self-Repair Capabilities under LLM Multi-Agent Disagreements
by: Ju, Tianjie, et al.
Published: (2025)

From Text to Emotion: Unveiling the Emotion Annotation Capabilities of LLMs
by: Niu, Minxue, et al.
Published: (2024)

Guardians of Discourse: Evaluating LLMs on Multilingual Offensive Language Detection
by: He, Jianfei, et al.
Published: (2024)

Towards Comprehensive Detection of Chinese Harmful Memes
by: Lu, Junyu, et al.
Published: (2024)

HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video Detection
by: Wang, Han, et al.
Published: (2025)

Chinese Offensive Language Detection: Current Status and Future Directions
by: Xiao, Yunze, et al.
Published: (2024)

Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge
by: Shi, Lin, et al.
Published: (2024)

With Great Capabilities Come Great Responsibilities: Introducing the Agentic Risk & Capability Framework for Governing Agentic AI Systems
by: Khoo, Shaun, et al.
Published: (2025)

Language, Culture, and Ideology: Personalizing Offensiveness Detection in Political Tweets with Reasoning LLMs
by: Pihulski, Dzmitry, et al.
Published: (2025)

OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities
by: Kouremetis, Michael, et al.
Published: (2025)

Evaluating Annotation Consistency in Offensive Language Detection: A Data Analytics Approach on the TweetEval Dataset
by: Fabeela Ali Rawther, Abhinay A K, Anagha Tess B, Alan Joseph, Adham Saheer
Published: (2025)

Leveraging Annotator Disagreement for Text Classification
by: Xu, Jin, et al.
Published: (2024)

Systematic Capability Benchmarking of Frontier Large Language Models for Offensive Cyber Tasks
by: Merves, Tyler H., et al.
Published: (2026)

Taming Overconfidence in LLMs: Reward Calibration in RLHF
by: Leng, Jixuan, et al.
Published: (2024)

Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs
by: Xu, Chenjun, et al.
Published: (2025)

The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
by: Calderon, Nitay, et al.
Published: (2025)

Heterogeneous Judge-Aware Ranking with Sensitivity, Disagreement, and Confidence
by: Yu, Shibo, et al.
Published: (2026)

Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM as a Judge, and a Lightweight CTF Benchmark
by: Shao, Minghao, et al.
Published: (2025)

MemGuard-Alpha: Detecting and Filtering Memorization-Contaminated Signals in LLM-Based Financial Forecasting via Membership Inference and Cross-Model Disagreement
by: Roy, Anisha, et al.
Published: (2026)

Same Verdict, Different Reasons: LLM-as-a-Judge and Clinician Disagreement on Medical Chatbot Completeness
by: DeLucia, Alexandra, et al.
Published: (2026)

Calibrating Probabilistic Object Detectors with Annotator Disagreement
by: Tan, Zhi Qin, et al.
Published: (2026)

Dealing with Annotator Disagreement in Hate Speech Classification
by: Dehghan, Somaiyeh, et al.
Published: (2025)

Function-based Labels for Complementary Recommendation: Definition, Annotation, and LLM-as-a-Judge
by: Yamasaki, Chihiro, et al.
Published: (2025)

Enhancing Textual Personality Detection toward Social Media: Integrating Long-term and Short-term Perspectives
by: Zhu, Haohao, et al.
Published: (2024)

Distinguishing Right from Wrong in Debates: Attribution Analysis of Chinese Harmful Memes
by: Wang, Weiming, et al.
Published: (2026)

Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models
by: Yuxuan, Cao, et al.
Published: (2025)

Detection and Analysis of Offensive Online Content in Hausa Language
by: Adam, Fatima Muhammad, et al.
Published: (2023)