| Main Authors: | Zaghouani, Wajdi, Aldous, Kholoud K., Gao, Yicheng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.29667 |
Similar Items
AlbanianLLMSafety: A Safety Evaluation Dataset for Large Language Models in Albanian
by: Zaghouani, Wajdi, et al.
Published: (2026)
EmoHopeSpeech: An Annotated Dataset of Emotions and Hope Speech in English and Arabic
by: Zaghouani, Wajdi, et al.
Published: (2025)
Toward Responsible and Epistemically Grounded Multilingual LLMs for Computational Social Science and Humanities
by: Zaghouani, Wajdi
Published: (2026)
Building Arabic NLP from the Ground Up: Twenty Years of Lessons, Failures, and Open Problems
by: Zaghouani, Wajdi
Published: (2026)
Cultural Adaptation in Large Language Models for Political Discourse
by: Zaghouani, Wajdi
Published: (2026)
An Annotated Corpus of Arabic Tweets for Hate Speech Analysis
by: Zaghouani, Wajdi, et al.
Published: (2025)
AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse
by: Sharqawi, Esra'a, et al.
Published: (2026)
Chinese Offensive Language Detection: Current Status and Future Directions
by: Xiao, Yunze, et al.
Published: (2024)
KZ-SafetyPrompts: A Kazakh Safety Evaluation Prompt Dataset for Large Language Models
by: Zaghouani, Wajdi, et al.
Published: (2026)
MARSAD: A Multi-Functional Tool for Real-Time Social Media Analysis
by: Biswas, Md. Rafiul, et al.
Published: (2025)
ArPoMeme: An Annotated Arabic Multimodal Dataset for Political Ideology and Polarization
by: Zaghouani, Wajdi, et al.
Published: (2026)
Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse
by: Al-Athba, Aisha Ali, et al.
Published: (2026)
MemeMind at ArAIEval Shared Task: Spotting Persuasive Spans in Arabic Text with Persuasion Techniques Identification
by: Biswas, Md Rafiul, et al.
Published: (2024)
Nullpointer at CheckThat! 2024: Identifying Subjectivity from Multilingual Text Sequence
by: Biswas, Md. Rafiul, et al.
Published: (2024)
ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination
by: Zaghouani, Wajdi, et al.
Published: (2026)
JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media
by: Zaghouani, Wajdi, et al.
Published: (2026)
Audience Engagement with Arabic Women's Social Empowerment and Wellbeing: A Decadal Corpus
by: Zaghouani, Wajdi, et al.
Published: (2026)
Propaganda to Hate: A Multimodal Analysis of Arabic Memes with Multi-Agent LLMs
by: Alam, Firoj, et al.
Published: (2024)
Transformers and Ensemble methods: A solution for Hate Speech Detection in Arabic languages
by: de Paula, Angel Felipe Magnossão, et al.
Published: (2023)
ThatiAR: Subjectivity Detection in Arabic News Sentences
by: Suwaileh, Reem, et al.
Published: (2024)
ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models
by: Zhang, Hengxiang, et al.
Published: (2024)
Evaluating Differentially Private Synthetic Data Generation in High-Stakes Domains
by: Ramesh, Krithika, et al.
Published: (2024)
MultiBreak: A Scalable and Diverse Multi-turn Jailbreak Benchmark for Evaluating LLM Safety
by: Song, Jialin, et al.
Published: (2026)
Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains
by: Chu, Xu, et al.
Published: (2025)
How Reliable are Confidence Estimators for Large Reasoning Models? A Systematic Benchmark on High-Stakes Domains
by: Khanmohammadi, Reza, et al.
Published: (2026)
SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains
by: Ramesh, Krithika, et al.
Published: (2025)
Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression
by: Xu, Zhichao, et al.
Published: (2024)
The FIGNEWS Shared Task on News Media Narratives
by: Zaghouani, Wajdi, et al.
Published: (2024)
CORTEX: Collaborative LLM Agents for High-Stakes Alert Triage
by: Wei, Bowen, et al.
Published: (2025)
MulCogBench: A Multi-modal Cognitive Benchmark Dataset for Evaluating Chinese and English Computational Language Models
by: Zhang, Yunhao, et al.
Published: (2024)
ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain
by: Zhao, Haochen, et al.
Published: (2024)
A Novel Evaluation Benchmark for Medical LLMs: Illuminating Safety and Effectiveness in Clinical Domains
by: Wang, Shirui, et al.
Published: (2025)
DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation
by: Wang, Shuting, et al.
Published: (2024)
McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models
by: Lan, Tian, et al.
Published: (2025)
ArAIEval Shared Task: Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content
by: Hasanain, Maram, et al.
Published: (2024)
RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios
by: Zhao, Fei, et al.
Published: (2025)
Hidden Measurement Error in LLM Pipelines Distorts Annotation, Evaluation, and Benchmarking
by: Messing, Solomon
Published: (2026)
EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A
by: Ma, Shijian, et al.
Published: (2026)
Bayesian Calibration of Win Rate Estimation with LLM Evaluators
by: Gao, Yicheng, et al.
Published: (2024)
PROFASR-BENCH: A Benchmark for Context-Conditioned ASR in High-Stakes Professional Speech
by: Piskala, Deepak Babu
Published: (2025)