| Main Authors: | Shen, Yiyang; Tu, Lifu; Wang, Weiran |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.02621 |
Similar Items
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
by: Whitehouse, Chenxi, et al.
Published: (2025)
JudgeBench: A Benchmark for Evaluating LLM-based Judges
by: Tan, Sijun, et al.
Published: (2024)
Self-Distilled Agentic Reinforcement Learning
by: Lu, Zhengxi, et al.
Published: (2026)
Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
by: Xu, Ran, et al.
Published: (2025)
Quantitative LLM Judges
by: Sahoo, Aishwarya, et al.
Published: (2025)
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
by: Zhou, Yilun, et al.
Published: (2025)
KDRL: Post-Training Reasoning LLMs via Unified Knowledge Distillation and Reinforcement Learning
by: Xu, Hongling, et al.
Published: (2025)
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
by: Shi, Taiwei, et al.
Published: (2025)
One Token to Fool LLM-as-a-Judge
by: Zhao, Yulai, et al.
Published: (2025)
Can I understand what I create? Self-Knowledge Evaluation of Large Language Models
by: Tan, Zhiquan, et al.
Published: (2024)
Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation
by: Nguyen, Hieu, et al.
Published: (2025)
MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
by: Qi, Jingyuan, et al.
Published: (2023)
Silence the Judge: Reinforcement Learning with Self-Verifier via Latent Geometric Clustering
by: Zhang, Nonghai, et al.
Published: (2026)
StagePilot: A Deep Reinforcement Learning Agent for Stage-Controlled Cybergrooming Simulation
by: An, Heajun, et al.
Published: (2026)
Deep Learning-based Method for Expressing Knowledge Boundary of Black-Box LLM
by: Sheng, Haotian, et al.
Published: (2026)
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
by: Xu, Austin, et al.
Published: (2025)
Brewing Knowledge in Context: Distillation Perspectives on In-Context Learning
by: Li, Chengye, et al.
Published: (2025)
Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning
by: Yang, Xuewei, et al.
Published: (2026)
Self-Supervised Learning for Neural Topic Models with Variance-Invariance-Covariance Regularization
by: Xu, Weiran, et al.
Published: (2025)
LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models
by: Yang, Runming, et al.
Published: (2024)
DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
by: Chen, Jennifer, et al.
Published: (2025)
Knowledge Distillation in Automated Annotation: Supervised Text Classification with LLM-Generated Training Labels
by: Pangakis, Nicholas, et al.
Published: (2024)
Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
by: Ren, Yuxin, et al.
Published: (2023)
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
by: Li, Zhuochun, et al.
Published: (2026)
Knowledge Distillation with Training Wheels
by: Liu, Guanlin, et al.
Published: (2025)
Enhancing LLM Knowledge Learning through Generalization
by: Zhu, Mingkang, et al.
Published: (2025)
How to Correctly Report LLM-as-a-Judge Evaluations
by: Lee, Chungpa, et al.
Published: (2025)
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
by: Liang, Xiao, et al.
Published: (2025)
Investigating Non-Transitivity in LLM-as-a-Judge
by: Xu, Yi, et al.
Published: (2025)
DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation
by: Li, Pingzhi, et al.
Published: (2025)
Sinkhorn Distance Minimization for Knowledge Distillation
by: Cui, Xiao, et al.
Published: (2024)
From Rubrics to Reliable Scores: Evidence-Grounded Text Evaluation with LLM Judges
by: Hong, Yihan, et al.
Published: (2026)
J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge
by: Chan, Chi-Min, et al.
Published: (2025)
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
by: Jung, Jaehun, et al.
Published: (2024)
daDPO: Distribution-Aware DPO for Distilling Conversational Abilities
by: Zhang, Zhengze, et al.
Published: (2025)
All You Need is One: Capsule Prompt Tuning with a Single Vector
by: Liu, Yiyang, et al.
Published: (2025)
A Semi-supervised Generative Model for Incomplete Multi-view Data Integration with Missing Labels
by: Shen, Yiyang, et al.
Published: (2025)
Knowledge Editing on Black-box Large Language Models
by: Song, Xiaoshuai, et al.
Published: (2024)
AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens
by: Li, Tung-Ling, et al.
Published: (2025)
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
by: Wei, Lai, et al.
Published: (2025)