:: Library Catalog

Bibliographic Details
Main Authors:	Miao, Yongliang, Liang, Yangyang, Du, Mengnan
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2601.08097

Similar Items

NeuronScope: A Multi-Agent Framework for Explaining Polysemantic Neurons in Language Models
by: Liu, Weiqi, et al.
Published: (2026)

Debate Helps Weak Judges Reward Stronger Models
by: Elasky, Ethan, et al.
Published: (2026)

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
by: Duo, Jiangshan, et al.
Published: (2026)

AutoJudge: Judge Decoding Without Manual Annotation
by: Garipov, Roman, et al.
Published: (2025)

Judge Circuits
by: Feldhus, Nils, et al.
Published: (2026)

Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
by: Ye, Ziyi, et al.
Published: (2024)

Quantitative LLM Judges
by: Sahoo, Aishwarya, et al.
Published: (2025)

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
by: Chen, Zhaorun, et al.
Published: (2024)

Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
by: Zhou, Yilun, et al.
Published: (2025)

JudgeBench: A Benchmark for Evaluating LLM-based Judges
by: Tan, Sijun, et al.
Published: (2024)

Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
by: Bachmann, Gregor, et al.
Published: (2025)

Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
by: Li, Zhuochun, et al.
Published: (2026)

One Token to Fool LLM-as-a-Judge
by: Zhao, Yulai, et al.
Published: (2025)

JAF: Judge Agent Forest
by: Garg, Sahil, et al.
Published: (2026)

Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
by: Xu, Austin, et al.
Published: (2025)

AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining
by: Dong, Hongyuan, et al.
Published: (2025)

SAE-FiRE: Enhancing Earnings Surprise Predictions Through Sparse Autoencoder Feature Selection
by: Zhang, Huopu, et al.
Published: (2025)

The Judge Variable: Challenging Judge-Agnostic Legal Judgment Prediction
by: Zambrano, Guillaume
Published: (2025)

Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
by: Zhang, Wenbo, et al.
Published: (2026)

AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens
by: Li, Tung-Ling, et al.
Published: (2025)

Learning to Judge: LLMs Designing and Applying Evaluation Rubrics
by: Siro, Clemencia, et al.
Published: (2026)

Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning
by: Shu, Dong, et al.
Published: (2024)

AdaSplash: Adaptive Sparse Flash Attention
by: Gonçalves, Nuno, et al.
Published: (2025)

Approximating Human Preferences Using a Multi-Judge Learned System
by: Sprejer, Eitán, et al.
Published: (2025)

Benchmarks Saturate When The Model Gets Smarter Than The Judge
by: Ballon, Marthe, et al.
Published: (2026)

CodeJudge: Evaluating Code Generation with Large Language Models
by: Tong, Weixi, et al.
Published: (2024)

Reinforcement Learning-based Knowledge Distillation with LLM-as-a-Judge
by: Shen, Yiyang, et al.
Published: (2026)

Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning
by: Sutawika, Lintang, et al.
Published: (2026)

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
by: Jung, Jaehun, et al.
Published: (2024)

How to Correctly Report LLM-as-a-Judge Evaluations
by: Lee, Chungpa, et al.
Published: (2025)

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
by: Wang, Zhilin, et al.
Published: (2025)

On Evaluating LLM Alignment by Evaluating LLMs as Judges
by: Liu, Yixin, et al.
Published: (2025)

The Perfect Blend: Redefining RLHF with Mixture of Judges
by: Xu, Tengyu, et al.
Published: (2024)

Investigating Non-Transitivity in LLM-as-a-Judge
by: Xu, Yi, et al.
Published: (2025)

Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning
by: Hu, Jingyu, et al.
Published: (2024)

Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
by: Wei, Zhipeng, et al.
Published: (2024)

Training an LLM-as-a-Judge Model: Pipeline, Insights, and Practical Lessons
by: Hu, Renjun, et al.
Published: (2025)

AdaLomo: Low-memory Optimization with Adaptive Learning Rate
by: Lv, Kai, et al.
Published: (2023)

Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants
by: Herrera, Alejandro Breen, et al.
Published: (2026)

Quantifying and Mitigating Self-Preference Bias of LLM Judges
by: Yang, Jinming, et al.
Published: (2026)