Library Catalog

Bibliographic Details
Main Authors:	Shen, Yiyang, Tu, Lifu, Wang, Weiran
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2604.02621

Similar Items

J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
by: Whitehouse, Chenxi, et al.
Published: (2025)

JudgeBench: A Benchmark for Evaluating LLM-based Judges
by: Tan, Sijun, et al.
Published: (2024)

Self-Distilled Agentic Reinforcement Learning
by: Lu, Zhengxi, et al.
Published: (2026)

Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
by: Xu, Ran, et al.
Published: (2025)

Quantitative LLM Judges
by: Sahoo, Aishwarya, et al.
Published: (2025)

Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
by: Zhou, Yilun, et al.
Published: (2025)

KDRL: Post-Training Reasoning LLMs via Unified Knowledge Distillation and Reinforcement Learning
by: Xu, Hongling, et al.
Published: (2025)

Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
by: Shi, Taiwei, et al.
Published: (2025)

One Token to Fool LLM-as-a-Judge
by: Zhao, Yulai, et al.
Published: (2025)

Can I understand what I create? Self-Knowledge Evaluation of Large Language Models
by: Tan, Zhiquan, et al.
Published: (2024)

Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation
by: Nguyen, Hieu, et al.
Published: (2025)

MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
by: Qi, Jingyuan, et al.
Published: (2023)

Silence the Judge: Reinforcement Learning with Self-Verifier via Latent Geometric Clustering
by: Zhang, Nonghai, et al.
Published: (2026)

StagePilot: A Deep Reinforcement Learning Agent for Stage-Controlled Cybergrooming Simulation
by: An, Heajun, et al.
Published: (2026)

Deep Learning-based Method for Expressing Knowledge Boundary of Black-Box LLM
by: Sheng, Haotian, et al.
Published: (2026)

Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
by: Xu, Austin, et al.
Published: (2025)

Brewing Knowledge in Context: Distillation Perspectives on In-Context Learning
by: Li, Chengye, et al.
Published: (2025)

Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning
by: Yang, Xuewei, et al.
Published: (2026)

Self-Supervised Learning for Neural Topic Models with Variance-Invariance-Covariance Regularization
by: Xu, Weiran, et al.
Published: (2025)

LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models
by: Yang, Runming, et al.
Published: (2024)

DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
by: Chen, Jennifer, et al.
Published: (2025)

Knowledge Distillation in Automated Annotation: Supervised Text Classification with LLM-Generated Training Labels
by: Pangakis, Nicholas, et al.
Published: (2024)

Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
by: Ren, Yuxin, et al.
Published: (2023)

Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
by: Li, Zhuochun, et al.
Published: (2026)

Knowledge Distillation with Training Wheels
by: Liu, Guanlin, et al.
Published: (2025)

Enhancing LLM Knowledge Learning through Generalization
by: Zhu, Mingkang, et al.
Published: (2025)

How to Correctly Report LLM-as-a-Judge Evaluations
by: Lee, Chungpa, et al.
Published: (2025)

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
by: Liang, Xiao, et al.
Published: (2025)

Investigating Non-Transitivity in LLM-as-a-Judge
by: Xu, Yi, et al.
Published: (2025)

DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation
by: Li, Pingzhi, et al.
Published: (2025)

Sinkhorn Distance Minimization for Knowledge Distillation
by: Cui, Xiao, et al.
Published: (2024)

From Rubrics to Reliable Scores: Evidence-Grounded Text Evaluation with LLM Judges
by: Hong, Yihan, et al.
Published: (2026)

J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge
by: Chan, Chi-Min, et al.
Published: (2025)

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
by: Jung, Jaehun, et al.
Published: (2024)

daDPO: Distribution-Aware DPO for Distilling Conversational Abilities
by: Zhang, Zhengze, et al.
Published: (2025)

All You Need is One: Capsule Prompt Tuning with a Single Vector
by: Liu, Yiyang, et al.
Published: (2025)

A Semi-supervised Generative Model for Incomplete Multi-view Data Integration with Missing Labels
by: Shen, Yiyang, et al.
Published: (2025)

Knowledge Editing on Black-box Large Language Models
by: Song, Xiaoshuai, et al.
Published: (2024)

AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens
by: Li, Tung-Ling, et al.
Published: (2025)

Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
by: Wei, Lai, et al.
Published: (2025)