| Main author: | Ajayi, Edward |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2601.15297 |
Similar items
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset
by: Olatunji, Tobi, et al.
Published: (2024)
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning
by: Quan, Yinzhu, et al.
Published: (2024)
AfriMTEB and AfriE5: Benchmarking and Adapting Text Embedding Models for African Languages
by: Uemura, Kosei, et al.
Published: (2025)
AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
by: Wang, Jiayi, et al.
Published: (2023)
EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments
by: Liu, Zefang, et al.
Published: (2025)
EconCausal: A Context-Aware Economic Reasoning Benchmark for Large Language Models
by: Lee, Donggyu, et al.
Published: (2025)
EconEvals: Benchmarks and Litmus Tests for Economic Decision-Making by LLM Agents
by: Fish, Sara, et al.
Published: (2025)
AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
by: Muhammad, Shamsuddeen Hassan, et al.
Published: (2025)
AfriNLLB: Efficient Translation Models for African Languages
by: Moslem, Yasmin, et al.
Published: (2026)
AfriHG: News headline generation for African Languages
by: Ogunremi, Toyib, et al.
Published: (2024)
EconNLI: Evaluating Large Language Models on Economics Reasoning
by: Guo, Yue, et al.
Published: (2024)
AfriVox-v2: A Domain-Verticalized Benchmark for In-the-Wild African Speech Recognition
by: Awobade, Busayo, et al.
Published: (2026)
AfriVoices-KE: A Multilingual Speech Dataset for Kenyan Languages
by: Wanzare, Lilian, et al.
Published: (2026)
Afri-MCQA: Multimodal Cultural Question Answering for African Languages
by: Tonja, Atnafu Lambebo, et al.
Published: (2026)
AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR
by: Ashungafac, Gabrial Zencha, et al.
Published: (2025)
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
by: Li, Mukai, et al.
Published: (2025)
HumorRank: A Tournament-Based Leaderboard for Evaluating Humor Generation in Large Language Models
by: Ajayi, Edward, et al.
Published: (2026)
AfriHuBERT: A self-supervised speech representation model for African languages
by: Alabi, Jesujoba O., et al.
Published: (2024)
HumorGen: Cognitive Synergy for Humor Generation in Large Language Models via Persona-Based Distillation
by: Ajayi, Edward, et al.
Published: (2026)
HumMusQA: A Human-written Music Understanding QA Benchmark Dataset
by: Weck, Benno, et al.
Published: (2026)
EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries
by: Kweon, Sunjun, et al.
Published: (2024)
AfriStereo: A Culturally Grounded Dataset for Evaluating Stereotypical Bias in Large Language Models
by: Beux, Yann Le, et al.
Published: (2025)
Language Diversity: Evaluating Language Usage and AI Performance on African Languages in Digital Spaces
by: Ajayi, Edward, et al.
Published: (2025)
RJUA-QA: A Comprehensive QA Dataset for Urology
by: Lyu, Shiwei, et al.
Published: (2023)
CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering
by: Li, Zongxi, et al.
Published: (2025)
K-QA: A Real-World Medical Q&A Benchmark
by: Manes, Itay, et al.
Published: (2024)
SwaQuAD-24: QA Benchmark Dataset in Swahili
by: Kondoro, Alfred Malengo
Published: (2024)
Investigating the Impact of Language-Adaptive Fine-Tuning on Sentiment Analysis in Hausa Language Using AfriBERTa
by: Sani, Sani Abdullahi, et al.
Published: (2025)
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs
by: Uluoglakci, Cem, et al.
Published: (2024)
AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation
by: Abdulmumin, Idris, et al.
Published: (2026)
Benchmarking On-Device Machine Learning on Apple Silicon with MLX
by: Ajayi, Oluwaseun A., et al.
Published: (2025)
BioGraphletQA: Knowledge-Anchored Generation of Complex QA Datasets
by: Jonker, Richard A. A., et al.
Published: (2026)
KoSimpleQA: A Korean Factuality Benchmark with an Analysis of Reasoning LLMs
by: Ko, Donghyeon, et al.
Published: (2025)
MobQA: A Benchmark Dataset for Semantic Understanding of Human Mobility Data through Question Answering
by: Asano, Hikaru, et al.
Published: (2025)
HistoryBankQA: Multilingual Temporal Question Answering on Historical Events
by: Mandal, Biswadip, et al.
Published: (2025)
A Machine Learning Approach for Detection of Mental Health Conditions and Cyberbullying from Social Media
by: Ajayi, Edward, et al.
Published: (2025)
HausaMovieReview: A Benchmark Dataset for Sentiment Analysis in Low-Resource African Language
by: Zanga, Asiya Ibrahim, et al.
Published: (2025)
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
by: Monteiro, Joao, et al.
Published: (2024)
JBE-QA: Japanese Bar Exam QA Dataset for Assessing Legal Domain Knowledge
by: Cao, Zhihan, et al.
Published: (2025)
ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios
by: Pan, Changzai, et al.
Published: (2026)