| Main Authors: | Li, Dawei, Sun, Renliang, Huang, Yue, Zhong, Ming, Jiang, Bohan, Han, Jiawei, Zhang, Xiangliang, Wang, Wei, Liu, Huan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.01534 |
Similar Items
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
by: Li, Dawei, et al.
Published: (2024)
Investigating Data Contamination for Pre-training Language Models
by: Jiang, Minhao, et al.
Published: (2024)
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
by: Sun, Yifan, et al.
Published: (2025)
Generative Models for Synthetic Data: Transforming Data Mining in the GenAI Era
by: Li, Dawei, et al.
Published: (2025)
From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks
by: Stephan, Andreas, et al.
Published: (2024)
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
by: Zhao, Chengshuai, et al.
Published: (2025)
How Contaminated Is Your Benchmark? Quantifying Dataset Leakage in Large Language Models with Kernel Divergence
by: Choi, Hyeong Kyu, et al.
Published: (2025)
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
by: Liao, Zeyi, et al.
Published: (2024)
Vibe Checker: Aligning Code Evaluation with Human Preference
by: Zhong, Ming, et al.
Published: (2025)
Less is More: Improving LLM Alignment via Preference Data Selection
by: Deng, Xun, et al.
Published: (2025)
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
by: Zhong, Ming, et al.
Published: (2023)
Are Today's LLMs Ready to Explain Well-Being Concepts?
by: Jiang, Bohan, et al.
Published: (2025)
Causally-Enhanced Reinforcement Policy Optimization
by: Wang, Xiangqi, et al.
Published: (2025)
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive
by: Guo, Taicheng, et al.
Published: (2026)
Silicon Bureaucracy and AI Test-Oriented Education: Contamination Sensitivity and Score Confidence in LLM Benchmarks
by: Song, Yiliang, et al.
Published: (2026)
Dual Optimal: Make Your LLM Peer-like with Dignity
by: Wang, Xiangqi, et al.
Published: (2026)
Advancing LLM Reasoning Generalists with Preference Trees
by: Yuan, Lifan, et al.
Published: (2024)
Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation
by: Mahdavi, Sadegh, et al.
Published: (2025)
From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization
by: Zhou, Chenxi, et al.
Published: (2026)
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
by: Chen, Guoxuan, et al.
Published: (2024)
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
by: White, Colin, et al.
Published: (2024)
Steer2Adapt: Dynamically Composing Steering Vectors Elicits Efficient Adaptation of LLMs
by: Han, Pengrui, et al.
Published: (2026)
Max It or Miss It: Benchmarking LLM On Solving Extremal Problems
by: Gao, Binxin, et al.
Published: (2025)
LLMAP: LLM-Assisted Multi-Objective Route Planning with User Preferences
by: Yuan, Liangqi, et al.
Published: (2025)
When Wording Steers the Evaluation: Framing Bias in LLM judges
by: Hwang, Yerin, et al.
Published: (2026)
On the Role of Preference Variance in Preference Optimization
by: Guo, Jiacheng, et al.
Published: (2025)
AD-Bench: A Real-World, Trajectory-Aware Advertising Analytics Benchmark for LLM Agents
by: Hu, Lingxiang, et al.
Published: (2026)
RouteLLM: Learning to Route LLMs with Preference Data
by: Ong, Isaac, et al.
Published: (2024)
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
by: Fan, Chongyu, et al.
Published: (2024)
Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment
by: Kim, Dongyoung, et al.
Published: (2024)
Catching Chameleons: Detecting Evolving Disinformation Generated using Large Language Models
by: Jiang, Bohan, et al.
Published: (2024)
Benchmarking Benchmark Leakage in Large Language Models
by: Xu, Ruijie, et al.
Published: (2024)
Self-Play Preference Optimization for Language Model Alignment
by: Wu, Yue, et al.
Published: (2024)
AttributionBench: How Hard is Automatic Attribution Evaluation?
by: Li, Yifei, et al.
Published: (2024)
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark
by: Zhao, Qihao, et al.
Published: (2024)
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
by: Zhang, Yuheng, et al.
Published: (2025)
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
by: Liu, Jie, et al.
Published: (2024)
Joint Detection of Fraud and Concept Drift in Online Conversations with LLM-Assisted Judgment
by: Senol, Ali, et al.
Published: (2025)
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
by: Wu, Mingqi, et al.
Published: (2025)
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
by: Wang, Haoxiang, et al.
Published: (2024)