| Main Authors: | Kocyigit, Muhammed Yusuf, Yildirim, Caglar |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.06103 |
Similar Items
Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation
by: Kocyigit, Muhammed Yusuf, et al.
Published: (2025)
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
by: Tao, Yongding, et al.
Published: (2025)
Investigating Data Contamination for Pre-training Language Models
by: Jiang, Minhao, et al.
Published: (2024)
LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training
by: Gwak, Minju, et al.
Published: (2026)
Search-Time Data Contamination
by: Han, Ziwen, et al.
Published: (2025)
In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning
by: Terekhov, Mikhail, et al.
Published: (2024)
Impact of Inaccurate Contamination Ratio on Robust Unsupervised Anomaly Detection
by: Masakuna, Jordan F., et al.
Published: (2024)
Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure
by: Yildirim, Caglar
Published: (2026)
Deep Positive-Unlabeled Anomaly Detection for Contaminated Unlabeled Data
by: Takahashi, Hiroshi, et al.
Published: (2024)
RADAR: Mechanistic Pathways for Detecting Data Contamination in LLM Evaluation
by: Kattamuri, Ashish, et al.
Published: (2025)
Anomaly Detection with Adaptive and Aggressive Rejection for Contaminated Training Data
by: Lee, Jungi, et al.
Published: (2025)
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
by: Golchin, Shahriar, et al.
Published: (2023)
The Role of Deep Learning Regularizations on Actors in Offline RL
by: Tarasov, Denis, et al.
Published: (2024)
TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models
by: Li, Hongkai, et al.
Published: (2026)
Prioritized Replay for RL Post-training
by: Fatemi, Mehdi
Published: (2026)
Efficient Learning of Fuzzy Logic Systems for Large-Scale Data Using Deep Learning
by: Koklu, Ata, et al.
Published: (2024)
Can Generative Artificial Intelligence Survive Data Contamination? Theoretical Guarantees under Contaminated Recursive Training
by: Wang, Kevin, et al.
Published: (2026)
Generative Modeling of Networked Time-Series via Transformer Architectures
by: Elnady, Yusuf
Published: (2025)
AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems
by: Motwani, Sumeet Ramesh, et al.
Published: (2026)
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders
by: Jing, Yi, et al.
Published: (2026)
BoA: Attention-aware Post-training Quantization without Backpropagation
by: Kim, Junhan, et al.
Published: (2024)
Bayesian Kolmogorov Arnold Networks (Bayesian_KANs): A Probabilistic Approach to Enhance Accuracy and Interpretability
by: Hassan, Masoud Muhammed
Published: (2024)
Automatic Pair Construction for Contrastive Post-training
by: Xu, Canwen, et al.
Published: (2023)
The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation
by: Lan, Yifan, et al.
Published: (2026)
A Generic Machine Learning Framework for Fully-Unsupervised Anomaly Detection with Contaminated Data
by: Ulmer, Markus, et al.
Published: (2023)
Beneficial Reasoning Behaviors in Agentic Search and Effective Post-training to Obtain Them
by: Jin, Jiahe, et al.
Published: (2025)
AdamS: Momentum Itself Can Be A Normalizer for LLM Pretraining and Post-training
by: Zhang, Huishuai, et al.
Published: (2025)
Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena
by: Luo, Haipeng, et al.
Published: (2024)
Towards Effective Theory of LLMs: A Representation Learning Approach
by: Ustaomeroglu, Muhammed, et al.
Published: (2026)
RoCA: Robust Contrastive One-class Time Series Anomaly Detection with Contaminated Data
by: Mou, Xudong, et al.
Published: (2025)
Beyond Surface-Level Similarity: Hierarchical Contamination Detection for Synthetic Training Data in Foundation Models
by: Mehta, Sushant
Published: (2025)
From Simulation to Enaction: Post-trained language models recognize and react to their own generations
by: G., Asvin, et al.
Published: (2026)
PBP: Post-training Backdoor Purification for Malware Classifiers
by: Nguyen, Dung Thuy, et al.
Published: (2024)
Post-training for Efficient Communication via Convention Formation
by: Hua, Yilun, et al.
Published: (2025)
Efficient Post-training Quantization with FP8 Formats
by: Shen, Haihao, et al.
Published: (2023)
How Much Can We Forget about Data Contamination?
by: Bordt, Sebastian, et al.
Published: (2024)
On the dimension of pullback attractors in recurrent neural networks
by: Fadera, Muhammed
Published: (2025)
State Contamination in Memory-Augmented LLM Agents
by: Wang, Yian, et al.
Published: (2026)
Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training
by: Hu, Pingbang, et al.
Published: (2026)
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
by: Wu, Mingqi, et al.
Published: (2025)