:: Library Catalog

Bibliographic Details
Main authors:	Toutou, Ammar; Harb, Abdelrahman; Basta, Christine
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online access:	https://arxiv.org/abs/2605.07453

Similar Items

Neural Style Transfer for Synthesising a Dataset of Ancient Egyptian Hieroglyphs
by: Creed, Lewis Matheson
Published: (2025)

HieroLM: Egyptian Hieroglyph Recovery with Next Word Prediction Language Model
by: Cai, Xuheng, et al.
Published: (2025)

DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following
by: Basta, Nardine, et al.
Published: (2026)

Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors
by: Luo, Fuwen, et al.
Published: (2026)

Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation
by: Kocyigit, Muhammed Yusuf, et al.
Published: (2025)

Obscuring Data Contamination Through Translation: Evidence from Arabic Corpora
by: Abbas, Chaymaa, et al.
Published: (2026)

Bot Wars Evolved: Orchestrating Competing LLMs in a Counterstrike Against Phone Scams
by: Basta, Nardine, et al.
Published: (2025)

Investigating the Impact of Data Contamination of Large Language Models in Text-to-SQL Translation
by: Ranaldi, Federico, et al.
Published: (2024)

HieroGlyphTranslator: Automatic Recognition and Translation of Egyptian Hieroglyphs to English
by: Nasser, Ahmed, et al.
Published: (2025)

When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation
by: Tan, David, et al.
Published: (2026)

A Reproducibility Study of PLAID
by: MacAvaney, Sean, et al.
Published: (2024)

A Taxonomy for Data Contamination in Large Language Models
by: Palavalli, Medha, et al.
Published: (2024)

A Survey on Data Contamination for Large Language Models
by: Cheng, Yuxing, et al.
Published: (2025)

Deterministic Reversible Data Augmentation for Neural Machine Translation
by: Yao, Jiashu, et al.
Published: (2024)

DCR: Quantifying Data Contamination in LLMs Evaluation
by: Xu, Cheng, et al.
Published: (2025)

Benchmark Data Contamination of Large Language Models: A Survey
by: Xu, Cheng, et al.
Published: (2024)

Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law
by: Kordjamshidi, Parisa, et al.
Published: (2026)

SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System
by: Sibaee, Serry, et al.
Published: (2025)

Promoting Target Data in Context-aware Neural Machine Translation
by: Gete, Harritxu, et al.
Published: (2024)

Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
by: Golchin, Shahriar, et al.
Published: (2023)

CAP: Data Contamination Detection via Consistency Amplification
by: Zhao, Yi, et al.
Published: (2024)

An Empirical Study on the Robustness of Massively Multilingual Neural Machine Translation
by: Supryadi, et al.
Published: (2024)

Importance-Aware Data Augmentation for Document-Level Neural Machine Translation
by: Wu, Minghao, et al.
Published: (2024)

Quantifying Data Contamination in Psychometric Evaluations of LLMs
by: Han, Jongwook, et al.
Published: (2025)

Data Contamination Can Cross Language Barriers
by: Yao, Feng, et al.
Published: (2024)

ConvoCache: Smart Re-Use of Chatbot Responses
by: Atkins, Conor, et al.
Published: (2024)

Reproducibility Study of Large Language Model Bayesian Optimization
by: Rychert, Adam, et al.
Published: (2025)

A Reproducibility Study of LLM-Based Query Reformulation
by: Bigdeli, Amin, et al.
Published: (2026)

MSA at ImageCLEF 2025 Multimodal Reasoning: Multilingual Multimodal Reasoning With Ensemble Vision Language Models
by: Ahmed, Seif, et al.
Published: (2025)

A Case Study on Context-Aware Neural Machine Translation with Multi-Task Learning
by: Appicharla, Ramakrishna, et al.
Published: (2024)

A Reproducible Framework for Neural Topic Modeling in Focus Group Analysis
by: Arfaoui, Heger, et al.
Published: (2025)

Non-Fluent Synthetic Target-Language Data Improve Neural Machine Translation
by: Sánchez-Cartagena, Víctor M., et al.
Published: (2024)

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
by: Patel, Ajay, et al.
Published: (2024)

Generator-Retriever-Generator Approach for Open-Domain Question Answering
by: Abdallah, Abdelrahman, et al.
Published: (2023)

Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
by: Deng, Chunyuan, et al.
Published: (2024)

EgMM-Corpus: A Multimodal Vision-Language Dataset for Egyptian Culture
by: Gamil, Mohamed, et al.
Published: (2025)

Evaluating Robustness of LLMs in Question Answering on Multilingual Noisy OCR Data
by: Piryani, Bhawna, et al.
Published: (2025)

Improving Retrieval-Augmented Neural Machine Translation with Monolingual Data
by: Bouthors, Maxime, et al.
Published: (2025)

Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions
by: Fu, Yujuan, et al.
Published: (2024)

An Empirical Study on Chinese Character Decomposition in Multiword Expression-Aware Neural Machine Translation
by: Han, Lifeng, et al.
Published: (2025)