| Main authors: | Toutou, Ammar; Harb, Abdelrahman; Basta, Christine |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2605.07453 |
Similar items
Neural Style Transfer for Synthesising a Dataset of Ancient Egyptian Hieroglyphs
by: Creed, Lewis Matheson
Published: (2025)
HieroLM: Egyptian Hieroglyph Recovery with Next Word Prediction Language Model
by: Cai, Xuheng, et al.
Published: (2025)
DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following
by: Basta, Nardine, et al.
Published: (2026)
Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors
by: Luo, Fuwen, et al.
Published: (2026)
Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation
by: Kocyigit, Muhammed Yusuf, et al.
Published: (2025)
Obscuring Data Contamination Through Translation: Evidence from Arabic Corpora
by: Abbas, Chaymaa, et al.
Published: (2026)
Bot Wars Evolved: Orchestrating Competing LLMs in a Counterstrike Against Phone Scams
by: Basta, Nardine, et al.
Published: (2025)
Investigating the Impact of Data Contamination of Large Language Models in Text-to-SQL Translation
by: Ranaldi, Federico, et al.
Published: (2024)
HieroGlyphTranslator: Automatic Recognition and Translation of Egyptian Hieroglyphs to English
by: Nasser, Ahmed, et al.
Published: (2025)
When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation
by: Tan, David, et al.
Published: (2026)
A Reproducibility Study of PLAID
by: MacAvaney, Sean, et al.
Published: (2024)
A Taxonomy for Data Contamination in Large Language Models
by: Palavalli, Medha, et al.
Published: (2024)
A Survey on Data Contamination for Large Language Models
by: Cheng, Yuxing, et al.
Published: (2025)
Deterministic Reversible Data Augmentation for Neural Machine Translation
by: Yao, Jiashu, et al.
Published: (2024)
DCR: Quantifying Data Contamination in LLMs Evaluation
by: Xu, Cheng, et al.
Published: (2025)
Benchmark Data Contamination of Large Language Models: A Survey
by: Xu, Cheng, et al.
Published: (2024)
Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law
by: Kordjamshidi, Parisa, et al.
Published: (2026)
SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System
by: Sibaee, Serry, et al.
Published: (2025)
Promoting Target Data in Context-aware Neural Machine Translation
by: Gete, Harritxu, et al.
Published: (2024)
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
by: Golchin, Shahriar, et al.
Published: (2023)
CAP: Data Contamination Detection via Consistency Amplification
by: Zhao, Yi, et al.
Published: (2024)
An Empirical Study on the Robustness of Massively Multilingual Neural Machine Translation
by: Supryadi, et al.
Published: (2024)
Importance-Aware Data Augmentation for Document-Level Neural Machine Translation
by: Wu, Minghao, et al.
Published: (2024)
Quantifying Data Contamination in Psychometric Evaluations of LLMs
by: Han, Jongwook, et al.
Published: (2025)
Data Contamination Can Cross Language Barriers
by: Yao, Feng, et al.
Published: (2024)
ConvoCache: Smart Re-Use of Chatbot Responses
by: Atkins, Conor, et al.
Published: (2024)
Reproducibility Study of Large Language Model Bayesian Optimization
by: Rychert, Adam, et al.
Published: (2025)
A Reproducibility Study of LLM-Based Query Reformulation
by: Bigdeli, Amin, et al.
Published: (2026)
MSA at ImageCLEF 2025 Multimodal Reasoning: Multilingual Multimodal Reasoning With Ensemble Vision Language Models
by: Ahmed, Seif, et al.
Published: (2025)
A Case Study on Context-Aware Neural Machine Translation with Multi-Task Learning
by: Appicharla, Ramakrishna, et al.
Published: (2024)
A Reproducible Framework for Neural Topic Modeling in Focus Group Analysis
by: Arfaoui, Heger, et al.
Published: (2025)
Non-Fluent Synthetic Target-Language Data Improve Neural Machine Translation
by: Sánchez-Cartagena, Víctor M., et al.
Published: (2024)
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
by: Patel, Ajay, et al.
Published: (2024)
Generator-Retriever-Generator Approach for Open-Domain Question Answering
by: Abdallah, Abdelrahman, et al.
Published: (2023)
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
by: Deng, Chunyuan, et al.
Published: (2024)
EgMM-Corpus: A Multimodal Vision-Language Dataset for Egyptian Culture
by: Gamil, Mohamed, et al.
Published: (2025)
Evaluating Robustness of LLMs in Question Answering on Multilingual Noisy OCR Data
by: Piryani, Bhawna, et al.
Published: (2025)
Improving Retrieval-Augmented Neural Machine Translation with Monolingual Data
by: Bouthors, Maxime, et al.
Published: (2025)
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
by: Fu, Yujuan, et al.
Published: (2024)
An Empirical Study on Chinese Character Decomposition in Multiword Expression-Aware Neural Machine Translation
by: Han, Lifeng, et al.
Published: (2025)