| Main Authors: | Clarke, Christopher, Daynauth, Roland, Wilkinson, Charlene, Devonish, Hubert, Mars, Jason |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.03832 |
Similar Items
Aligning Model Evaluations with Human Preferences: Mitigating Token Count Bias in Language Model Assessments
by: Daynauth, Roland, et al.
Published: (2024)
Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat
by: Daynauth, Roland, et al.
Published: (2024)
SLMEval: Entropy-Based Calibration for Human-Aligned Evaluation of Large Language Models
by: Daynauth, Roland, et al.
Published: (2025)
PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization
by: Clarke, Christopher, et al.
Published: (2024)
CreoleVal: Multilingual Multitask Benchmarks for Creoles
by: Lent, Heather, et al.
Published: (2023)
AI Brown and AI Koditex: LLM-Generated Corpora Comparable to Traditional Corpora of English and Czech Texts
by: Milička, Jiří, et al.
Published: (2025)
Connecting Ideas in 'Lower-Resource' Scenarios: NLP for National Varieties, Creoles and Other Low-resource Scenarios
by: Joshi, Aditya, et al.
Published: (2024)
Multilingual and Explainable Text Detoxification with Parallel Corpora
by: Dementieva, Daryna, et al.
Published: (2024)
Attributing Culture-Conditioned Generations to Pretraining Corpora
by: Li, Huihan, et al.
Published: (2024)
Bottom-Up and Top-Down Analysis of Values, Agendas, and Observations in Corpora and LLMs
by: Friedman, Scott E., et al.
Published: (2024)
Beyond Line-Level Filtering for the Pretraining Corpora of LLMs
by: Park, Chanwoo, et al.
Published: (2025)
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
by: Basu, Kinjal, et al.
Published: (2024)
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
by: Xu, Yuemei, et al.
Published: (2024)
AncientBench: Towards Comprehensive Evaluation on Excavated and Transmitted Chinese Corpora
by: Zhou, Zhihan, et al.
Published: (2025)
Obscuring Data Contamination Through Translation: Evidence from Arabic Corpora
by: Abbas, Chaymaa, et al.
Published: (2026)
Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora
by: Majurski, Michael, et al.
Published: (2025)
A First Context-Free Grammar Applied to Nawatl Corpora Augmentation
by: Guzmán-Landa, Juan-José, et al.
Published: (2025)
Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora
by: Hennara, Khalil, et al.
Published: (2025)
Low-Resource, High-Impact: Building Corpora for Inclusive Language Technologies
by: Artemova, Ekaterina, et al.
Published: (2025)
SaudiBERT: A Large Language Model Pretrained on Saudi Dialect Corpora
by: Qarah, Faisal
Published: (2024)
From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora
by: Shen, Yingli, et al.
Published: (2025)
Mitigating Stylistic Biases of Machine Translation Systems via Monolingual Corpora Only
by: Gao, Xuanqi, et al.
Published: (2025)
Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement
by: Kersting, Nicholas S., et al.
Published: (2026)
MegaMath: Pushing the Limits of Open Math Corpora
by: Zhou, Fan, et al.
Published: (2025)
Hope Speech Detection in Social Media English Corpora: Performance of Traditional and Transformer Models
by: Ramos, Luis, et al.
Published: (2025)
GhanaNLP Parallel Corpora: Comprehensive Multilingual Resources for Low-Resource Ghanaian Languages
by: Gyamfi, Lawrence Adu, et al.
Published: (2026)
Preference Consistency Matters: Enhancing Preference Learning in Language Models with Automated Self-Curation of Training Corpora
by: Lee, JoonHo, et al.
Published: (2024)
CorIL: Towards Enriching Indian Language to Indian Language Parallel Corpora and Machine Translation Systems
by: Bhattacharjee, Soham, et al.
Published: (2025)
Discovering Multi-Scale Semantic Structure in Text Corpora Using Density-Based Trees and LLM Embeddings
by: Haschka, Thomas, et al.
Published: (2025)
Enhancing Document-Level Machine Translation via Filtered Synthetic Corpora and Two-Stage LLM Adaptation
by: Kim, Ireh, et al.
Published: (2026)
MTP: A Meaning-Typed Language Abstraction for AI-Integrated Programming
by: Dantanarayana, Jayanaka L., et al.
Published: (2024)
Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora
by: Kim, Yungi, et al.
Published: (2024)
EmbGen: Teaching with Reassembled Corpora
by: Lenin, Arun K, et al.
Published: (2026)
AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora
by: Bai, Jiaxin, et al.
Published: (2025)
The Cross-Lingual Cost: Retrieval Biases in RAG over Arabic-English Corpora
by: Amiraz, Chen, et al.
Published: (2025)
OrgForge: A Multi-Agent Simulation Framework for Verifiable Synthetic Corporate Corpora
by: Flynt, Jeffrey
Published: (2026)
Semi-automated Fact-checking in Portuguese: Corpora Enrichment using Retrieval with Claim extraction
by: Gomes, Juliana Resplande Sant'anna, et al.
Published: (2025)
Building a Chinese Medical Dialogue System: Integrating Large-scale Corpora and Novel Models
by: Wang, Xinyuan, et al.
Published: (2024)
What Makes a Reward Model a Good Teacher? An Optimization Perspective
by: Razin, Noam, et al.
Published: (2025)
Social Meaning in Large Language Models: Structure, Magnitude, and Pragmatic Prompting
by: Mühlenbernd, Roland
Published: (2026)