Library Catalog

Bibliographic Details
Main Authors:	Kocyigit, Muhammed Yusuf; Yildirim, Caglar
Format:	Preprint
Published:	2026
Subjects:	Machine Learning, Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.06103

Similar Items

Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation
by: Kocyigit, Muhammed Yusuf, et al.
Published: (2025)

Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
by: Tao, Yongding, et al.
Published: (2025)

Investigating Data Contamination for Pre-training Language Models
by: Jiang, Minhao, et al.
Published: (2024)

LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training
by: Gwak, Minju, et al.
Published: (2026)

Search-Time Data Contamination
by: Han, Ziwen, et al.
Published: (2025)

In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning
by: Terekhov, Mikhail, et al.
Published: (2024)

Impact of Inaccurate Contamination Ratio on Robust Unsupervised Anomaly Detection
by: Masakuna, Jordan F., et al.
Published: (2024)

Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure
by: Yildirim, Caglar
Published: (2026)

Deep Positive-Unlabeled Anomaly Detection for Contaminated Unlabeled Data
by: Takahashi, Hiroshi, et al.
Published: (2024)

RADAR: Mechanistic Pathways for Detecting Data Contamination in LLM Evaluation
by: Kattamuri, Ashish, et al.
Published: (2025)

Anomaly Detection with Adaptive and Aggressive Rejection for Contaminated Training Data
by: Lee, Jungi, et al.
Published: (2025)

Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
by: Golchin, Shahriar, et al.
Published: (2023)

The Role of Deep Learning Regularizations on Actors in Offline RL
by: Tarasov, Denis, et al.
Published: (2024)

TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models
by: Li, Hongkai, et al.
Published: (2026)

Prioritized Replay for RL Post-training
by: Fatemi, Mehdi
Published: (2026)

Efficient Learning of Fuzzy Logic Systems for Large-Scale Data Using Deep Learning
by: Koklu, Ata, et al.
Published: (2024)

Can Generative Artificial Intelligence Survive Data Contamination? Theoretical Guarantees under Contaminated Recursive Training
by: Wang, Kevin, et al.
Published: (2026)

Generative Modeling of Networked Time-Series via Transformer Architectures
by: Elnady, Yusuf
Published: (2025)

AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems
by: Motwani, Sumeet Ramesh, et al.
Published: (2026)

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders
by: Jing, Yi, et al.
Published: (2026)

BoA: Attention-aware Post-training Quantization without Backpropagation
by: Kim, Junhan, et al.
Published: (2024)

Bayesian Kolmogorov Arnold Networks (Bayesian_KANs): A Probabilistic Approach to Enhance Accuracy and Interpretability
by: Hassan, Masoud Muhammed
Published: (2024)

Automatic Pair Construction for Contrastive Post-training
by: Xu, Canwen, et al.
Published: (2023)

The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation
by: Lan, Yifan, et al.
Published: (2026)

A Generic Machine Learning Framework for Fully-Unsupervised Anomaly Detection with Contaminated Data
by: Ulmer, Markus, et al.
Published: (2023)

Beneficial Reasoning Behaviors in Agentic Search and Effective Post-training to Obtain Them
by: Jin, Jiahe, et al.
Published: (2025)

AdamS: Momentum Itself Can Be A Normalizer for LLM Pretraining and Post-training
by: Zhang, Huishuai, et al.
Published: (2025)

Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena
by: Luo, Haipeng, et al.
Published: (2024)

Towards Effective Theory of LLMs: A Representation Learning Approach
by: Ustaomeroglu, Muhammed, et al.
Published: (2026)

RoCA: Robust Contrastive One-class Time Series Anomaly Detection with Contaminated Data
by: Mou, Xudong, et al.
Published: (2025)

Beyond Surface-Level Similarity: Hierarchical Contamination Detection for Synthetic Training Data in Foundation Models
by: Mehta, Sushant
Published: (2025)

From Simulation to Enaction: Post-trained language models recognize and react to their own generations
by: G., Asvin, et al.
Published: (2026)

PBP: Post-training Backdoor Purification for Malware Classifiers
by: Nguyen, Dung Thuy, et al.
Published: (2024)

Post-training for Efficient Communication via Convention Formation
by: Hua, Yilun, et al.
Published: (2025)

Efficient Post-training Quantization with FP8 Formats
by: Shen, Haihao, et al.
Published: (2023)

How Much Can We Forget about Data Contamination?
by: Bordt, Sebastian, et al.
Published: (2024)

On the dimension of pullback attractors in recurrent neural networks
by: Fadera, Muhammed
Published: (2025)

State Contamination in Memory-Augmented LLM Agents
by: Wang, Yian, et al.
Published: (2026)

Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training
by: Hu, Pingbang, et al.
Published: (2026)

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
by: Wu, Mingqi, et al.
Published: (2025)